Quantum Tensor Product Decomposition from Choi State Tomography

The Schmidt decomposition is the go-to tool for measuring bipartite entanglement of pure quantum states. Similarly, it is possible to study the entangling features of a quantum operation using its operator-Schmidt, or tensor product decomposition. While quantum technological implementations of the former are thoroughly studied, entangling properties on the operator level are harder to extract in the quantum computational framework because of the exponential nature of sample complexity. Here we present an algorithm for unbalanced partitions into a small subsystem and a large one (the environment) to compute the tensor product decomposition of a unitary whose effect on the small subsystem is captured in classical memory while the effect on the environment is accessible as a quantum resource. This quantum algorithm may be used to make predictions about operator non-locality, effective open quantum dynamics on a subsystem, as well as for finding low-rank approximations and low-depth compilations of quantum circuit unitaries. We demonstrate the method and its applications on a time-evolution unitary of an isotropic Heisenberg model in two dimensions.


I. INTRODUCTION
Entanglement is a defining feature of quantum theory [1].The powerful capability of sharing information in a superposition of coupled states within a composite system still fascinates and puzzles physicists even a hundred years after the advent of quantum physics.Entanglement is used as a fundamental resource for quantum computing and has launched an entirely new paradigm for information processing [2].
For a fixed Hilbert space partition, H ∼ = H A ⊗ H B , the Schmidt decomposition of a pure state, |ψ⟩, into tensor products, |ψ⟩ = r k=1 σ k |a k ⟩ ⊗ |b k ⟩, reveals features about the shared entanglement between the two subsystems, H A and H B .A disentangled state, or tensor product state, will consist of a single non-zero term, while an entangled state will have a Schmidt rank r > 1. Analogously, we can define the tensor product decomposition (TPD) [3] or operator-Schmidt decomposition [4] of a unitary operator, U , via Here, A k ∈ L(H A ), B k ∈ L(H B ) are linear operators acting on the subsystems H A/B and the rank, R, is the minimal number of non-zero terms in the TPD.Without loss of generality, we can impose the A k and B k to be orthogonal with respect to the Hilbert-Schmidt inner product, and normalized to ∥A k ∥ 2 = d A := dim(H A ) and ∥B k ∥ 2 = d B := dim(H B ) using the 2-norm ∥.∥.Note that * Refik.Mansuroglu@fau.de A k and B k are not unitary, in general.Furthermore, the s k are non-negative, real numbers which are constrained to sum up to one, by unitarity of U , i.e.R k s 2 k = 1 (see App.A for details).
As a theoretical tool, TPD has previously been used to classify non-local and entangling content of unitaries [5][6][7], and also for the analysis of time evolution of quantum many body systems [8] and quantification of quantum chaos [9,10].Recently, TPD has been used to construct entanglement witnesses [11].While these advances motivate a systematic method to obtain the TPD of a quantum operator, current approaches are limited to classical resources.
Van Loan and Pitsianis [3,12] developed a classical algorithm to find the TPD of an operator T , not necessarily unitary, using the singular value decomposition of a reordered version of T .In quantum information processing, the operator of interest will typically require classical memory that grows exponentially in the number of qubits, making such classical methods inaccessible.While the measurement of Schmidt decompositions on quantum states has already been thoroughly studied [13][14][15], works on the operator level remain limited to specific problems that can be treated analytically [5,7,16].
Here we bring the tensor product decomposition into a quantum algorithmic framework.In particular, we present a hybrid quantum-classical algorithm that performs the quantum tensor product decomposition (QTPD) described in Eq. (1) for a unitary matrix U with a known quantum circuit representation.If we assume an asymmetric split for which d A ≪ d B , QTPD provides the operators A k in classical memory, whereas B k are accessed as a quantum resource distilled out of U .The complete algorithm is visualized in Fig. 1.
We discuss a number of immediate applications and

Classical Processing
Figure 1.Summary of QTPD.Given a unitary operation U as a quantum resource, it is decomposed into the form of Eq. ( 1) in two steps.First, the Choi-state of U is prepared and tomography is performed on the subsystem HA.The classical snapshot ρT of the state ρ of subsystem HA is then classically diagonalized to obtain the tensors A k and the Schmidt values s k (cf.Eq. ( 3)).With this classical information, the non-locality measure SA(U ) introduced in Eq. ( 17) can be calculated.The action of U on the environment via B k is consequently obtained by a measurement of the observable P k from Eq. ( 4) on the subsystem HA.The green boxes above denote fully classical steps, and the blue boxes denote steps where a quantum computer is used.
new directions for future research that are enabled with QTPD.Alongside low-rank approximations, QTPD provides a tool for studying entanglement, with application to entanglement witnesses and measures of entanglement generation [6,11], classically assisted simulation of (open) quantum dynamics [17][18][19] and low-depth compilation techniques [20].The fact that the A k are stored classically goes hand in hand with the philosophy of hybrid quantum computing, which is to use quantum resources as little as possible, but in the most crucial step.We note, however, that QTPD collects all necessary data from the quantum computer at the start of the algorithm, and does not require a hybrid quantum-classical optimization loop [21].We demonstrate QTPD and its applications on the time evolution operator of an isotropic Heisenberg model.

II. QUANTUM TENSOR PRODUCT DECOMPOSITION
We introduce QTPD in two steps.First, a matrix representation of the action on H A is captured via tomography and second, we discuss a distillation technique to subsequently capture the action on H B as a quantum channel.We finally compare required resources for QTPD against its classical competitor, discuss possible adaptations for the near term and comment on error propagation.

A. The Algorithm
We start with a unitary operator U that is accessible as a quantum resource (i.e. it is accessible as an oracle or its circuit representation is known).Our aim is to get a classical snapshot of the reduced action of U on H A , which is implicit in the operators A k in Eq. (1).Consider the action of U on the two generalized Bell states that are states on two copies of the subsystems H A and H B , respectively (cf.first circuit in Fig. 1).
which can be derived using orthonormality of the A k and B k (see App. B 1). Unvectorizing the vec(A k ) is exponentially hard, in general [22].Since d A is assumed to be much smaller than d B , a tomography of the state ρ A (U ) can be taken and stored as a classical snapshot.Diagonalizing ρ A (U ) finally yields the eigenvectors vec(A k ) with corresponding eigenvalues s 2 k .Note that this is mathematically equivalent to finding one half (on H A ) of the Schmidt decomposition of the Choi state of U .
The above algorithm not only yields information about s k and A k , but we can also find B k as a quantum resource.Once the A k are found, one can distill out the individual B k .That is, given a state |ψ⟩ one can prepare B k |ψ⟩.This is done via a partial measurement of the projector on the state U |Φ + A ⟩ ⊗ |ψ⟩ (cf.second circuit in Fig. 1 and see App.B 2 for a derivation).The projectors, P k , are orthogonal, i.e.P k P l = δ kl P k , which is a direct consequence of the orthonormality of the A k .As a result, they can be simultaneously measured, such that every shot of the distillation circuit (cf.Fig. 1) yields the normalized output state Note that R k=1 p k = 1 from unitarity of U , see App.B 2 for details.
If one is interested in the action of a specific B k only, the distillation process involves post-selection on the outcome of the partial measurement.This comes with a sample overhead of O 1 which is never a serious issue.The overhead becomes large when either ∥B k |ψ⟩∥ or s k become small.In the first case, the output state is close to the zero vector and in the second case, the sampled tensor component is a small contribution in a low-rank approximation of U .
QTPD can be used to determine such approximations to U of a specified rank.A set of operators {C k } and {D k }, such that the 2-norm is minimal, is called a rank r approximation.We introduced the positive, real-valued scalars t k following the same convention as in the tensor product decomposition of U .The special case r = 1 from Eq. ( 5) corresponds to the well-known nearest Kronecker problem [3,12].The solution to minimize Eq. ( 5) is the sum of product operators, As the s k and the A k are classically stored, but the B k are not, we cannot classically store a low rank approximation.A low rank approximation can be used to suppress the sample complexity of QTPD whenever the sample budget is limited.This allows us to resolve just the singular values, s k , that are sufficiently large and still provide a good approximation of U .In particular, if ε is the tolerable error of resolving the largest eigenvalues of ρ A (U ), then the sample complexity scales as O R . Hence, we achieve an ε-close approximation to ρ A (U ) in the operator norm by dropping every s 2 k < ε.A low-rank approximation, in which the B k are accessible on a quantum computer, as described above, can be seen as an application of QTPD.It differs from the applications that are discussed in section III, as it is always applied along with tomography.

B. Resources for QTPD
QTPD has the obvious advantage over classical methods that the unitary U can be loaded as a quantum circuit.Since the Bell pair creation can be executed in depth 2, the depth of QTPD is primarily given by the depth of the circuit U .The runtime of QTPD is hence dominated by the sample complexity of tomography that is in this case Õ R In memory, QTPD comes with a linear overhead of n ancilla qubits and requires classical memory to store O Rd 2 A complex numbers in the worst case.
The B-distillation step is similar in depth and admits a smaller sample complexity O s −2 k0 with s k0 the smallest coefficient to be resolved.It also only needs n A ancilla qubits and no additional classical memory.
A classical method to find the TPD of U goes back to van Loan and Pitsianis [3], who showed that the operator encodes the tensor factors in its singular value decomposition.The tensor factors A k and B k can be derived as the unvectorized left-and right-eigenvectors of R(U ) and the s k are its singular values.If we neglect the runtime for reshaping U into R(U ), the bottleneck of van Loan and Pitsianis' algorithm comes from the numerical solution for the singular value decomposition, which is 24,25].The factor R can be further weakened by randomization techniques [25].
This together with the required classical memory to store the d 2 complex elements of U make the classical algorithm uncompetitive for large dimensions d.To be precise, QTPD achieves a superpolynomial speedup of a factor Õ d 2 B ε 2 and memory savings by a factor of R , which both grow exponentially in system size.If U is given as a sparse matrix, Krylov subspace methods can be employed, which lower the required memory, but not the runtime [25].

C. Circumvention of Doubling the System Size
Next to the inevitable scaling in subspace dimension [22], the largest challenge within QTPD in the near term will be to keep the entangled Bell pairs coherent until measurement.Further, small quantum processors with fast read-out and long coherence times are most efficient when memory requirements are traded off against runtime.
To this end, the effect of the Bell pairs |Φ + A/B ⟩ is reduced to an average over a basis of H A and H B , respectively.In every run, a random initial state drawn from a basis of choice (for instance the computational basis) is fixed.The output state for a fixed basis state Averaging over the basis in H B accounts for an effective partial trace, using where the expectation value is taken over the discrete set of basis states in H B .After tracing out H B , we are thus left with If we keep the input-output relation within H A , the vectorization of the A k can be reconstructed using tomography and summing over all basis states in Recall the definition of the normalized vectorization above Eq.( 3).As a result, the same state as in Eq. ( 3), coming from the parallelized version using the Choi-state, can be reconstructed in classical post-processing.As opposed to the Choi-state based version, however, the runtime of this sequential version of QTPD is increased by the repeated tomography for a fixed basis state |j⟩ ∈ H A , yielding an overhead factor of d A in runtime.Convergence to the mean value from averaging over H B , on the other hand, does not introduce an additional sample overhead, as it is equivalent to tracing out H B as part of the Choi-state-based approach.

D. Error Analysis
In general, the error from shot noise in tomography will be operator-valued and can be viewed as the difference between the correct state, ρ A (U ), and the classical snapshot, ρ A (U ) (T ) , i.e. ε (T ) = ρ A (U ) − ρ A (U ) (T ) .The shot noise error, ε (T ) , propagates through QTPD and thus introduces errors to the s k , A k and B k .We discuss the technical details in App.D 1 and briefly present the results here.In order to understand how errors propagate, we consider the errors of the eigenvalues and eigenvectors of ρ A (U ) (T ) separately Our aim is to express the errors of s k and A k via ε = max ε (D) , ε (V ) .To this end, we define the error measures ε k .With these measures, we can trivially relate where the factor d A can be removed by appropriate normalization of the 2-norm in order to represent an average case error.Less trivially, but after a straightforward calculation (see App. D 1), we can also relate and with this finally We can show that ε (T ) ≤ 3ε, which completes the propagation from tomography error to the tensor factors A k .Applying the faulty A (T ) k for distillation of the effective action B (T ) k as a quantum resource imposes further error propagation on the projector P k , defined in Eq. ( 4).Taking into account appropriate (faulty) normalization factors, we can also bound the error of B (T ) k in the following way Altogether we show that the error contributions in s k , A k and B k are linear in the 2-norm ε (T ) .

III. APPLICATIONS
The quantum tensor product decomposition allows us to find and store the tensor components A k in classical resources and B k in a quantum resource.With this, we can solve a number tasks of interest.

A. Non-locality
One application of QTPD lies in measuring the nonlocality of the action of U , also called operator entanglement entropy [8][9][10], which further bounds how much entanglement U generates.The vectorization of U allows for a mapping of operators to quantum states.On this space, we can employ entanglement entropy measures that are defined for states.Consider the vectorized operator If we trace out the B-subsystem, we get exactly the mixed state of Eq. ( 3).The von Neumann entanglement entropy of this state reads and is a measure of the non-locality of the action of U .The non-locality S A (U ), sometimes referred to as the Schmidt strength [7,16], can be determined classically after a successful QTPD and admits a linear contribution from the tomography error to leading order (cf.App.D 3).Note that, although the non-locality of product operations vanishes, S A (A ⊗ B) = 0, Eq. ( 17) alone is not a good measure of entanglement generation.For instance the swap gate, which maps product states to product states, reaches the maximal value for Eq.(17).To measure entanglement generation, one considers the entangling power of a circuit [6,26,27].Several measures for entangling power have been proposed, of which we discuss two in App.E.

B. Mereology
If U is generated by a physical Hamiltonian, one might be interested in searching for a bipartite factorization (sometimes referred to as the tensor product structure) of the global Hilbert space such that the two subsytems are decoupled, i.e.U = U A ⊗ U B .Concretely, consider a Hamiltonian describing two interacting subsystems, , where H A i and H B i are operators acting on states describing subsystems A and B in a Hilbert space factorized as H = H A ⊗ H B .Since this factorization is essentially a particular choice of a global basis, it can be related to another one by a unitary [28], where V is a (non-local) unitary and states in A ′ and B ′ describe physically different subsystems than A and B. In particular, it may be possible to find a factorization of H such that the Hamiltonian is decoupled, i.e.V HV † = H A ′ + H B ′ .Some have used this approach, with the goal of minimizing the interaction Hamiltonian between two subsystems, to understand the emergence of classicality [29].Practically, this approach appears in cases where taking a certain transformation can lead to analytically tractable forms of the Hamiltonian, e.g. in the case of the Jordan-Wigner transformation that transforms certain interacting qubit Hamiltonians to a set of free fermionic operators.While QTPD does not itself find the optimal basis V that will lead to approximately decoupled dynamics, it can be used to evaluate the cost function as part of another algorithm (such as the one proposed in [30]).Two candidates to minimize are R k=2 s k or 1 − s 2 1 where the s k are the singular values in the tensor product decomposition of the time propagator The existence of such a decoherence-free split [31][32][33] is tightly connected to spectral properties of U , see App.F 1 for an example and App.F 2 for a necessary and sufficient condition.

C. Fast Quantum Transform and Classical Simulability
A mereology algorithm can be further utilized to find an efficient compilation of a target unitary U .If there exists a basis in which U decouples, U = V † (U A ⊗ U B )V can be implemented with a single layer after rotating into the basis V .A divide-and-conquer approach successively reduces the action of U into a tensor product of M local gates, i.e.
. Such a fast quantum transform requires a rotation into the basis V , which is entangling, in general.
More generally, for an arbitrary U , the closest fast quantum transform can be found via iterative QTPD, which can be performed efficiently if there is a single dominant coefficient s 1...1 in the multi-partite factorization We used the letter A for all operators to emphasize that the local dimensions are small enough to be classically simulated.
We can use the nearest unitary representations U m of the tensor factors A (m) 1 of the rank one approximation of U to construct a fast quantum transform approximation to U .We show that the error is O( √ 1 − s 1...1 ).For this, we separate the error of the fast quantum transform into two terms.One representing the error from truncating all terms from U except s 1...1 and one capturing the error by nearest unitary approximation jm allows for estimating the necessary bond dimensions Dm, which can be deduced by a low rank approximation of the multipartite decomposition (cf.Eq. ( 18)).
see App.C 2 for details.If there is not only one, but a polynomial number of dominant coefficients, a better approximation to the action of U can be achieved through a rank r approximation.In this case, the probability to sample from the dominant B k is suppressed by a polynomial factor.The simulation of a fast quantum transform, , is not only classically efficient, it is also made up of low-entangling transformations.Instead of achieving a close unitary approximation, the goal here is to bound the bond dimensions necessary for a faithful tensor network representation.
Since the non-locality measure S A (U ) bounds the entanglement generation, it can be used as a witness to scan for clusterings of the Hilbert space with low entanglement between the clusters.If a cluster H A (1) , on which the action of U is low-entangling, is found, the entanglement generation between a second cluster H A (2) and its environment H A (3) ⊗ ...H A (M ) will be bounded (cf.Fig. 2), as well.If the non-locality of the full unitary U is bounded, it can thus be written as a matrix product operator [34], of which the fast quantum transform is an extremal case.This allows for an efficient classical representation of the output of U , for instance via matrix product states [35] or projected entangled pair states [36].
Fast quantum transforms have conceptual similarities to entanglement forging [37], which is used to simulate a larger system by simulating the subsystems separately on a smaller quantum chip, if there are only few connecting gates in the compilation of U .As opposed to QTPD, these methods are typically concerned with symmetric splittings and aim for a reduction of quantum resources in the simulation of U instead of finding classically simulable subsystems.

D. Open Quantum dynamics
QTPD is applicable to studying entangling dynamics, or the decoherence of subsystem A into subsystem B. If we start with a product state |ψ A ⟩ ⊗ |ψ B ⟩, the evolved state within subsystem A will be mixed.The effective open quantum dynamics can be written in the form While the operators A k and the state |ψ A ⟩ can be stored on a classical machine, the λ kl are not accessible, a priori.Instead, the overlaps ⟨ψ B |B † l B k |ψ B ⟩ have to be determined using modified Hadamard tests or swap tests [4,38] with different outputs of the B k distillation via projective measurement of Eq. ( 4).
Once the λ kl and A k are stored classically, it is possible to simulate the open dynamics via Eq.20 for any initial state |ψ A ⟩. Eq. ( 20) can be transformed into its Kraus representation, for instance by diagonalizing the Choi matrix.In this manner, QTPD can be used as a quantum-enhanced classical simulation algorithm [21] for open system simulation.That is, quantum hardware is crucial to obtaining the A k but then Eq. 20 acts as a classical surrogate to simulate the dynamics of any initial state and observable.Note the parallels to process learning [39,40], which aim for learning a quantum channel from measurements of local observables using classical resources, such as neural networks.
The error in predicting observables on σ A can be bounded by the trace norm to the faulty prediction σ (T ) A from tomography, which scales linearly with the tomography error ε (T ) as we show in App.D 3. A naïve quantum simulation with fixed initial state and fixed observable suffers from shot noise that has the same scaling in samples as ε (T ) .

IV. NUMERICAL EXPERIMENT
We demonstrate QTPD on a Hamiltonian simulation problem for the isotropic Heisenberg model.To this end, we numerically solve the tensor decomposition by exact diagonalization of the unitary time evolution generated

by the Hamiltonian
with the Pauli matrices {X, Y, Z} and the sum over nearest neighbors denoted by ⟨i, j⟩.We discuss one system of toy size whose dynamics we can analytically solve and a separate system on a two-dimensional grid that is small enough to be checked by exact diagonalization.
Let us consider a model of two qubits first.We show in App.H that the time evolution operator generated by the Heisenberg Hamiltonian incorporates an oscillation between the identity and the swap operator for certain times.In between those times, entanglement is alternately generated and reduced.It is thus natural to view each qubit as a subsystem and study the open dynamics on one of the two qubits.Larger systems can be split in two and the swapping of excitations from H A to H B and vice versa can be studied, as well.
When more than two qubits are exchanging excitations, the overall dynamics are an ensemble of interfering oscillations, which depend on the initial state and the geometry of the interaction graph.To study both the distribution of excitations, and the entanglement between the split into subsystems H A and H B , we stroboscopically measure the total magnetization M (σ A (t)) of subsystem H A , as well as the von-Neumann entanglement entropy of the output state S(σ A (t)) with I A denoting the index set for qubits in subsystem H A .For the sake of a clear comparison, we normalize all observables to take values between 0 and 1.This means we divide state entanglement entropies by log(d) and operator non-localities by log(d 2 ) with d being the dimension of the Hilbert space.The magnetization is transformed into an occupation number M A → 1 2 − M A 2n A , with n A denoting the number of qubits in subsystem H A .
Fig. 3(a) shows results for the two qubit system.The non-locality reaches its maximal value twice during the time interval t ∈ 0, π J at which point the time evolution operator oscillates between the swap and the identity operator.In between, U generates entanglement on the trial state which is shown in green.We use QTPD to classically simulate the open quantum dynamics following Eq.( 20).Starting with the initial state |1⟩ ⊗ |0⟩, we simulate the time evolution of the density matrix describing qubit 1 and measure magnetization M A and entanglement entropy S from Eq. ( 22) and (23).The excitation transfer between the two qubits is reflected in the oscillation of the magnetization between the extremal values 0 and 1.At these points, the entanglement entropy reaches zero indicating an oscillatory swap between |10⟩ and |01⟩.The data from QTPD is in exact agreement with the analytical expressions derived in App.H.
On a 3 × 2 qubit grid on which excitations can swap between neighboring qubits, the overall dynamics is more complicated.While the non-locality of the time evolution operator quickly rises close to the maximum value, it no longer returns to zero within the time interval t ∈ [0, 4π J ].The trial initial state, |110000⟩, that is evolved in time, does not return to a product state in the considered time interval, which shows that part of the non-zero nonlocality is in fact entanglement generated by U .We discuss two measures of entangling power (see App. E) on the example of the Heisenberg model in App.H in order to specify the relation between non-locality and the entangling properties of U .

V. CONCLUSION
In typical problems of quantum information processing, we are given a quantum circuit unitary U as a quantum resource.A quantum tensor product decomposition enables the separation of a small subsystem H A from its environment and captures the effective action of U on H A .This allows one to classically predict entanglement features of U and post-process mereology and short depth compilation algorithms.Furthermore, it enables the study of open quantum dynamics interpreting the two subsystems, H A and H B , as system and environment, respectively.A generalization to decompose arbitrary tensors does not seem straight-forward (see Appendix G).Using block-encoding techniques, QTPD could be applied to arbitrary tensors.A hurdle that arises in that case is taming the sample complexity when sampling from the ancillary qubits needed for the block-encoding of matrix product operators [41].We remain curious about extending QTPD to include quantum states (possibly via block-encoding) where the sum of the QTPD coefficients, k s k , may be used as a criterion to detect bipartite entanglement (see "realignment criterion" in e.g.[1,42,43]) or verify matrix product operator structures of density operators [44].
QTPD paves the way for entanglement investigations at the operator level.A natural next step is to combine QTPD with an iterative search for quasi-classicality emerging in quantum systems.The non-locality that upper bounds entangling power can be used as a cost function to minimize the growth of entanglement with the environment.Also in reverse, QTPD can be used to verify decoherence free structures [31].
Depending on available resources in memory and connectivity, either the preparation of Bell pairs in depth 2, or the sequential approach can be used in a near-term application including QTPD.For medium-term quantum computing, a doubling in qubit number is within the scaling plans of modern quantum computing architectures, which typically strive for exponential growth.We can thus foresee an application in the near future that, for instance, utilizes 120 qubits (or 60 qubits in sequence) to solve the open dynamics of a 15+45 qubit system, a simulation problem that is no longer accessible with classical methods.One such application is the integration of QTPD into dynamical mean field methods, for instance for the simulation of impurity models.We leave these directions for future research.
Alternative process learning methods [39,40,45] make use of quantum machine learning-assisted postprocessing.While a comparison of the efficiency is not immediate, QTPD provides a stable solution using tomography instead of quantum optimization.
Similar to process learning approaches, QTPD can be generalized to capture the action of an arbitrary quantum channel E using its Choi state.Quantum circuits on near-term hardware will inevitably suffer from noise and therefore a QTPD on a noisy circuit is a natural next step.If hardware allows for the intended simulation of a specified quantum channel E, QTPD along with its applications can be straightforwardly generalized substituting the unitary U by E.

Classical snapshot state
The first step of QTPD involves a tomography of the density matrix ρ A (U ) of the Choi-state of U reduced to subsystem A, which we calculate in the following.First, define the full Choi state After tracing out H B , we get Executing the trace and using the orthonormality of the B k yields Finally, we identify vec which yields the result.Since the Hilbert-Schmidt product and the Euclidean inner product of vectorized operators are the same, we can read off the eigenvectors and eigenvalues of ρ A (U )

B-Distillation
To distill the action of U on the larger subsystem H B , we need to measure the projector The projective property of the P k directly follows from P † k = P k and P 2 k = P k .The output state of of the distillation circuit can be straight-forwardly derived in graphical notation where we used the orthonormality of the A k .The factor s m arises as we are not dealing with a unital channel.
Measuring P m involves post-selection on one of the generalized Bell states |Φ + A ⟩. We can also use an algebraic formula to prove the above statement.The partial measurement of the A-subsystem yields the (unnormalized) state where we used ∥A k ∥ 2 = d A .We arrive at the same result as in Eq. (B5) modulo tracing out H A .In practice, we want to construct a single measurement that outputs the states of Eq. (B6) for all m ∈ {1, ..., R}.This measurement can be described by the quantum channel E [ρ] = R m P m ρP m + QρQ, where Q = 1 − m P m is necessary to make E trace preserving, however the action of Q on the input state of Eq. (B5), ρ = U |Ψ + A ⟩⟨Φ + A | ⊗ |ψ⟩⟨ψ| U † , vanishes.The output state is thus the following mixed state describing a statistical mixture of the states of Eq. (B6) with probabilities Appendix C: Operator approximations using the singular value decomposition To show that a collection of the A k and B k also yield an optimal low rank approximation, we need to show that they form a global minimum of Eq. ( 5).The approximation of U by a rank k tensor decomposition is equivalent to finding a rank k approximation of R(U ) from van Loan and Pitsianis' algorithm, i.e.
We will show that t k = s k , C k = A k and D k = B k ∀k minimizes Eq. (C1).This statement has been proven in [46] using a distance induced by the spectral norm ∥.∥ ∞ .The generalization to the Frobenius distance is well known, but proofs are often omitted in the literature.We present one here.Proof.The second equality directly follows from the definition of the Frobenius norm.To show the first equality, consider an arbitrary rank r operator, S, and calculate the Frobenius distance where we denoted the i th singular value of matrix A by σ i (A) and dropped the r smallest singular values to estimate a lower bound.Let T r+i−2 be the rank r + i − 2 approximation as defined above.We have Here, we denoted the rank k approximation of T by T k .The inequality follows from the fact that the rank of (T − S) i−1 + S r is smaller or equal to the rank of T r+i−1 .Next, consider the inequality which is a direct consequence of the triangular inequality of the spectral norm.With this, we have Since S is rank r, we know that σ r+1 (S) = 0 and we are left with the overall inequality Returning to Eq. (C3), we can estimate Taking the square root of both sides shows the statement.

Nearest Unitary Approximation
In the extreme case where U is close to a rank 1 operator, i.e. s 1 ≈ 1 and s k ≈ 0 for k ̸ = 1, we expect the dominant product operator A 1 ⊗ B 1 to be close to a unitary.This means that A 1 and B 1 have to be close to unitaries U A and U B , individually.In the following, we find the closest unitaries U A , U B to A 1 and B 1 .This is a well-known problem that is solved by setting all singular values to 1.The following proposition holds for arbitrary unitary equivalent norms [47].For pedagogical reasons, let us review the proof for the 2-norm.
Proposition 2. Let T ∈ L(H) with a singular value decomposition T = U ΣV † , then Proof.From unitary invariance of the 2-norm, we can reformulate the minimization problem as follows min where we redefined the unitary W in the last step using the closedness of the unitary group.The 2-norm can be calculated explicitly as where we denoted the singular values by Σ kl = σ k δ kl .From unitarity of W , we know that ℜ [W kk ] ≤ 1 and thus Therefore, the closest unitary to T is U V † .
The nearest unitary approximation can be used to find a fast quantum transform U A (1) ⊗ U A (2) ⊗ ... ⊗ U A (M ) that approximates the action of U .For a Fast Quantum Transform, we need to iterate the unitary approximation for M many tensor product factors.Doing so, we introduce two types of errors, one by a rank one approximation (cf.Proposition 1) and one by the nearest unitary approximation of the dominant components A 1 (cf.Proposition 2).
Proof.The backwards direction "⇐" becomes trivial as soon as we write down U in diagonal form.Let T denote the eigenbasis of U , i.e.For the forward direction "⇒" consider again the eigenbasis V of U .As we know that the eigenvalues are related by θ i = ϕ µ + ψ m , we can put V into an order such that D U = D A ⊗ D B decouples with . . .
phases between s k and B k , which is technically against our convention introduced in App.A assuming the s k to be real.With Eq. (H2), we can read off the Schmidt coefficients g i and deduce that U has maximal Schmidt rank 4 except for when one of the terms vanishes.In the isotropic case, this happens for t = π 2J .If we require J z = 0 and fix J x = J y = J, we recover the swap gate S at t = π 4J , i.e.
Hence, we get the tensor decomposition of the swap gate for free from Eq. (H2).We can use this to write down the entangling power of U on qubit 1 One can see directly that for U = S, the entangling power vanishes while the non-locality measure S 1 (U ) is maximal.
In order to get large entangling power, both S 1 (U ) and S 1 (U S) have to be large.The mean entanglement generation, that is derived in App.E, behaves similarly in the 2-qubit case.The last term in Eq. (E7) is the only non-trivial term to evaluate.The only non-vanishing trace terms yield k,l,m,n  H1)) can be expressed as a quantum channel E t that evolves a density matrix describing the quantum state of qubit 1 = cos 2 ((J x + J y )t)|1⟩⟨1| + sin 2 ((J x + J y )t)|0⟩⟨0|.(H8) We can read off the Schmidt values of this state |λ 0 | 2 = sin 2 ((J x + J y )t) and |λ 1 | 2 = cos 2 ((J x + J y )t).In order to learn properties of the output state, we measure the magnetization ⟨Z⟩ ρ1(t) and the entanglement entropy S(ρ 1 (t)) of the time-evolved state in the numerical experiment.In the two-qubit case, we can write down the two observables as functions of time ⟨Z⟩ ρ1(t) = sin 2 ((J x + J y )t) − cos 2 ((J x + J y )t) (H9) S(ρ 1 (t)) = sin 2 ((J x + J y )t) log sin 2 ((J x + J y )t) + cos 2 ((J x + J y )t) log cos 2 ((J x + J y )t) . (H10) The above example starts with a product state |10⟩ and does not show any quantum coherence after time evolution.as opposed to Eq. (H8).Fig. 4 repeats the numerical experiment shown in the main text, but also includes the entangling power measures e A (U ) and e m (U ) introduced in App.E. We compare the numerical results with the analytical derivation above and find an exact match.As we discuss parts of Fig. 4 already in the main text.we focus on features of the entangling powers here.
In the two-qubit case (cf.Fig. 4(a)), the non-locality undergoes four oscillation periods, representing the oscillation of U between identity and swap operator.In accordance to this, the entangling powers both show an oscillation and vanish at the extreme points of the non-locality function S A (U ).As they reach zero when U equals the swap operator, while the non-locality stays maximal, they thus both undergo twice as many oscillations.
On a 3 × 2 qubit grid (cf.Fig. 4(b)), the entangling power measures no longer follow defined oscillations, but start off at zero, quickly rise and then stay at a non-zero value most of the time.e A (U ), since it singles out swap operations, does become close to zero for certain simulation times.However, the trial state shows non-zero entanglement entropy at those times.As a consequence, e A (U ) does not seem to be a good measure of entangling power when the dimensions d A and d B of subsystems H A and H B no longer match.The mean entanglement generation e m (U ), on the other hand, is not in disagreement with this.Similar to the non-locality, it rises quickly, but stays around approximately 65% of its maximal value admitting dips at the same points as the non-locality S A (U ), thus allowing for non-zero entanglement throughout the considered time interval.

Figure 2 .
Figure 2. Low-entanglement clustering for a matrix product state representation.(a) Dividing into subsystems A and B, a matrix product operator representing U requires a certain bond dimension D ≤ d 2 A , which is upper bounded by the lower dimension d 2 A ≤ d 2 B and dependent on the entanglement between A and B. (b) Multi-partite decomposition of U into A (m)

Figure 3 .
Figure 3. Open quantum dynamics for the isotropic Heisenberg model.We study a subsystem of one of two neighboring qubits (a) and two qubits of a 2D grid of 3×2 qubits (b).The blue points show the non-locality measure SA(U ) for the respective time evolution operator, while yellow and green contain the total magnetization MA(σA) and the entanglement entropy S(σA) of the time evolved state.The initial state of the total system reads |1⟩ ⊗ |0⟩ in (a) and |11⟩ ⊗ |0000⟩ in (b).In (a), solid curves represent analytical predictions derived in App.H.

1. Optimal Low Rank Approximation Proposition 1 .
Let T ∈ L(H) with a singular value decomposition T = R i=1 σ i u i v † i .Let T r be the truncation of T to its r largest singular values, i.e.T r = r i=1 σ i u i v † i .Then min rank(S)=r ∥T − S∥ = ∥T − T r ∥ =

with d = 2 .
The three terms arise from different combinations of inserting the Pauli operators A k , B k ∈ {1, X, Y, Z}.Since Pauli operators are traceless, the products A k A † l A m A † n have to result into 1, such that the trace yields a factor of dimension d.The first term comes from traces of the form Tr(A 4 k ) 2 , the second from Tr(A 2 k A 2 l ) 2 , as well as Tr(A k A l A k A l ) 2 with k ̸ = l and adequate combinatorial coefficients.Finally the third term represents traces in which all operators are different, i.e.Tr(Ak A l A m A n ) Tr(A k A n A m A l ) = − Tr(A k A l A m A n ) 2 with (k, l, m, n) = (σ(0), σ(x), σ(y), σ(z)), σ ∈ S 4 .Now let us fix the initial state to be |10⟩.The effective open time evolution of the first qubit under the Hamiltonian H (Eq. (

Figure 4 .
Figure 4. Open quantum dynamics for the isotropic Heisenberg model of a subsystem of one of two neighboring qubits (left) and two qubits of a 2D grid of 3 × 2 qubits (right).The upper panel shows non-locality measure SA(U ) and entangling power measures eA(U ) and em(U ) (see App. E) for the respective time evolution operator, while the lower panel contains the total magnetization MA(σA) and the entanglement entropy S(σA) of the time evolved state.The initial state of the total system reads |1⟩ ⊗ |0⟩ in (a) and |11⟩ ⊗ |0000⟩ in (b).In (a), solid curves represent analytical predictions derived in Eqs.(H4 -H10).