Learning a quantum channel from its steady-state

We present a scalable method for learning local quantum channels using local expectation values measured on a single state—their steady state. Our method is inspired by the algorithms for learning local Hamiltonians from their ground states. For it to succeed, the steady state must be non-trivial, and therefore the channel needs to be non-unital. Such non-unital channels are readily implementable on present day quantum computers using mid-circuit measurements or RESET gates. We demonstrate that the full structure of such channels is encoded in their steady states, and can be learned efficiently using only the expectation values of local observables on these states. We emphasize two immediate applications to illustrate our approach: (i) Using engineered dissipative dynamics, we offer a straightforward way to assess the accuracy of a given noise model in a regime where all qubits are actively utilized for a significant duration. (ii) Given a parameterized noise model for the entire system, our method can learn its underlying parameters. We demonstrate both applications using numerical simulations and experimental trials conducted on an IBMQ machine.

Among these methods, a promising line of research tries to learn the underlying generators of dynamics using local measurements on a steady state of the system [21][22][23][24][25][26][27].It can be shown that if the underlying Hamiltonians or Lindbladians are sufficiently generic, they can be uniquely determined by the expectation values of local observables in these steady states.Specifically, Ref. [25] showed that for a system governed by a local Hamiltonian, there exists a set of linear constraints between expectation values of local observables in a steady state of the system, from which an unknown local Hamiltonian can be uniquely determined.Following that work, Ref. [26] showed that a similar set of constraints can be obtained for the case of open quantum systems, allowing reconstruction of unknown local Lindbladians by performing local measurements on a steady state.Together, Refs.[25,26] imply that by measuring local expectation values from a single state within a given patch of the system, we can infer the underlying Hamiltonian or Lindbladian terms in that region [28].
In some respect, at least in the case of local Hamiltonians, the method relies on the non-commutativity of the local Hamiltonian terms, and is therefore a 'purely quantum' method, with no classical equivalent.It gives a * yigal.ilin@gmail.com;yigali@campus.technion.ac.il surprisingly simple and efficient method for learning the generators of the local dynamics, and can also be used to test how well a particular model of the Hamiltonian or Lindbladian agrees with the empirical measurements.All this is done without having to classically calculate local expectation values, which is crucial for the scalablity of the approach.
The simplicity and efficiency of the methods developed in Refs.[25,26] raises the question of whether and how they can be applied to discrete quantum channels, particularly those describing (or engineered on) quantum computers.Specifically, we consider an engineered dissipative quantum channel, realized on a quantum computer.It consists of a series of known quantum gates that we apply, together with an unavoidable (and unknown) noise channel.Can we learn the unknown noise parameters solely by measuring local expectation values on its steady state?Can we do all that without classically simulating the system?
To generalize the aforementioned techniques to the domain of quantum computers, several points have to be addressed.At the abstract level, the evolution of local Hamiltonians and Lindbladians is continuous, while quantum computers evolve discretely through gate-based operations.Furthermore, the introduction of noise disrupts locality, as the noise channel can act simultaniously on all qubits.
In this work we propose a method that generalizes the approach of Refs.[25,26] to discrete channels that describe gate-based quantum computers.Our main idea is to engineer a dissipative channel that can be implemented on a quantum computer using non-unitary operations, such as the RESET gates, available on current day machines.These non-unitary operations lead to a dynamic that is described by a non-unital channel, i.e., channels E(•) for which E(I) ̸ = I.By definition, the steady state of these channels is different from the maximally mixed state, and may therefore contain fingerprints of the underlying dynamic.By iteratively applying this channel, we converge to a steady state, on which we measure local expectation values.These expectation values satisfy an extensive set of local linear constraints among themselves, which can be efficiently verified.Efficiency in this context denotes that our method does not require classical estimation of these expectation values.Furthermore, all classical post-and pre-processing procedures exhibit polynomial complexity with respect to the system size.Notably, the number of local linear constraints scales linearly with the system size.Therefore, these constraints can be used to efficiently learn the underlying channel by measuring only a single state -the steady state of the channel.
Like the methods in Refs.[24][25][26][27], our method is highly scalable, and can be used on systems with a large number of qubits.To demonstrate it, we introduce two generic ways to construct a dissipative quantum channel on a quantum computer.The first way constructs a stochastic evolution that can be described as a local Kraus map (at least in the absence of noise).It can be implemented on a quantum computer by randomly applying a gate from a set of gates with a prescribed set of probabilities.Consequently, the steady state is probed by running many different random circuits, which realize different trajectories of the dynamics.The second way applies a deterministic circuit that is based on a composed 2-local gate, which is both non-unitary and entangling.Implementing this map on a quantum computer requires only one circuit, which makes it easier to execute on currently available hardware such as IBMQ.To test our method we used numerical simulations of both methods on 5-11 qubits, and in addition applied the deterministic method to an IBMQ machine using 5 qubits.
The structure of this paper is as follows.In Sec.II we provide brief definitions and basic facts on quantum channels and gates, and the notation we use throughout the paper.Then in Sec.III we introduce the theoretical framework of learning a local quantum channel from its steady state, and explain how such framework can be used, for example, to learn and validate the noisy dynamics of a quantum computer.In Sec.IV we focus on the learning and validation task by defining the noise model we will be using to characterize and validate (benchmark) the noisy dynamics of IBMQ machines.The optimization method used in the variational learning part is also described.In Sec.V we describe the results of our numerical simulations, performed on the two types of engineered dissipative dynamics.In Sec.VI we present the results of applying our method on an actual IBMQ machine using 5 qubits, which include benchmarking different noise models, as well as using our method to learn its noise.Finally, in Sec.VII we present our conclusions and discuss possible future research directions.

II. PRELIMINARIES
Throughout this work we consider a quantum computer (QC) made of n qubits with Hilbert space (C 2 ) ⊗n .Quantum states will be generally denoted by Greek letters ρ, σ, etc.The expectation value of an observable A with respect to the quantum state ρ will be generally denoted by ⟨A⟩ ρ = Tr(ρA).
Unitary gates will be denoted by X, Y, Z, CX, R x , etc.We use CX(i → j) to denote the CNOT (Controlled-NOT) gate between the control qubit i and the target qubit j.We use R x (θ) to denote a Bloch sphere rotation of angle θ around the axis x.Quantum channels, i.e., completely positive, trace-preserving (CPTP) maps will usually be denoted by calligraphic letters such as E(•), N (•), etc.Given a unitary gate U , we will denote its corresponding channel by U, i.e., U(ρ) def = U ρU † .For any quantum channel E, its adjoint is the unique superoperator E * that satisfies (E * (A), B) = (A, E(B)) for any operators A, B, where (A, B) In other words, the maximally mixed state is a fixed point of the channel.
An important example of a non-unital channel that can be applied on a QC is formed by the RESET gate, which performs an active mid-circuit reset of a qubit to the |0⟩ state.Ideally, it corresponds to the channel It can also be realized by measuring the qubit in the standard basis, and applying X gate whenever the result is |1⟩.

III. LEARNING QUANTUM CHANNELS FROM THEIR STEADY STATE
In this section we introduce our method for learning quantum channels using local measurements on their steady state.We shall consider channels that can be realized on a quantum computer.A summary of the main steps of our method is given in Fig. 1.Two examples to keep in mind are (i) the application of quantum gate drawn from a set of gates according to a fixed probability distribution, and (ii) a low-depth quantum circuit.
In both examples, we assume that the overall channel is non-unital because of the noise and/or because of actively using the RESET gate.Unlike unital channels, whose fixed point is typically the maximally mixed state, the fixed points of non-unital channels are typically nontrivial in the sense that two different channels from a relevant set of channels will have different fixed points.We shall further assume that E(•) has a unique fixed point E(ρ ∞ ) = ρ ∞ , which we call the steady state.We shall also assume that ρ ∞ can be approximately reached by the QC using few applications of E, starting from an initial state ρ 0 = |0⟩⟨0| ⊗n .As we shall see, our protocol uses expectation values of local Pauli operators in the steady state.Therefore, a practical way to verify that we are close enough to this state would be to demand that these local expectation values converge, i.e., that the difference between the expectation values we get in T steps and in T + 1 steps is of the order of the statistical error.For the channels that we consider in this work, convergence was always attained after few tens of steps.For brevity, we will denote expectation values with respect to ρ ∞ by Given a channel E, assume that the system is in its steady state ρ ∞ .Then the expectation value of any observable A must satisfy The above equation is the basis of our method.For the channels that will be discussed below, when A is a local observable, the observable E * (A) is also local, or at least can be estimated using local measurements.In such case, we can use the equality between the LHS and RHS to constrain, or even learn E, without having to calculate the expectation values in ρ ∞ -which might be hard classically.Moreover, as we are measuring the steady state of the system, our results are independent from the initial state of the system, and therefore insensitive to state preparation errors.
Below, we describe two methods to construct such E, which we call the stochastic method and the deterministic method.In Sec.III A and Sec.III B we describe the first method, and in Sec.III C describe the second.

A. A stochastic map: the strictly local case
Let {E k } be a set of local channels implementable by a QC.We can think of every E k as being the application of a quantum gate or a small circuit acting on a few contiguous qubits.We assume that the E k channels are strictly local in the sense that they act trivially outside the support of the small circuit that defines them.This assumption breaks down in realistic quantum hardware, where additional noise channels act simulatiniously on all qubits.In the next subsection we will show how to extend our method to many of these cases (where the noise is local and Markovian), but for the sake of keeping the presentation clear, we start by assuming that the E k channels are strictly local, acting on at most two neighboring qubits.In addition, we allow some of the E k to be non-unital by containing RESET gates.
Together with {E k }, we will use a probability distribution {p k } and define our quantum channel by Physically, E can be implemented on a QC in a stochastic way: we choose a k with probability p k and apply E k .For example, the three qubits non-unital quantum channel can be implemented on a QC by randomly applying one of the gates {X according to the probabilities {0.2, 0.2, 0.4, 0.2}.A simple numerical check shows that the steady state of the this channel is a non-product state which can be approximated to within a trace distance of 10 −5 using few tens of steps.
Plugging Eq. ( 2) in Eq. ( 1), we see that for any local observable A, Importantly, if E k is a 2-local channel, then E * k also acts non-trivially only on the two-qubits on which E k is defined.Consequently, if A is a t-local observable, then E * k (A) is at most a (t + 2)-local observable.In fact, it is at most (t + 1)-local for the following reason.The only case where E * k (A) might be (t + 2)-local is when the supports of E k and A are disjoint.But in such case, Using this observation, we may write Eq. ( 3) as where we use the notation k ∈ supp(A) to denote enumeration over all k indices for which E k acts non trivially on A. Writing the LHS as p k ⟨A⟩ ∞ and re-organizing the equation, we obtain the following equation, which holds for every local observable A: The above equation can be used to validate a channel E. Given a channel E, we shall now let S denote the set of local observables.Throughout this paper, we shall always take S to be the set of geometrically t-local Pauli operators for some fixed t -but at large, our method is independent of that choice.Using S, we define a cost function Φ as a normalized sum of differences between the LHS and RHS of Eq. ( 4) for all local observables A ∈ S: Assuming the observables in S are orthonormal with respect to the Hilbert-Schmidt norm, the expression Here C ij is a matrix that fully describes the channel.With this matrix, the cost function is given by When E is given by Eq. ( 2) with 2-local channels E k , and S is a set of t-local Pauli operators, then E * (P i ) is at most a (t + 1)-local operator, represented in the (t + 1)local Pauli basis.Therefore, by measuring the (t + 1)local Pauli expectation values in the steady state, we can calculate Φ and see how well the model fits the actual quantum hardware.
This procedure can also be made local by considering local cost functions Φ q for qubit q.Denoting by S q the set of observables A for which E * (A) acts non-trivially on q, the local cost function is given by where the Sq .The local validation scheme allows us to identify the regions in the system where the model performs well or badly in a setup in which possibly all the device qubits are being actively used.Because it only requires (t + 1)local expectation values, it is highly scalable and has a small computational cost.
The above idea can be pushed further; we can actually use Eq. ( 4) to learn the channel E. Suppose we have some parameterization of E, denoted by E θ , where θ is a vector of parameters.We may define the θ-dependent cost function and find the particular θ that best describes the system by minimizing Φ(θ).When S consists of local Pauli operators {P i }, we can repeat the derivation that led to Eq. ( 9) and write the cost function as Obtaining expectation values ⟨P j ⟩ ∞ in the steady state allows us to learn the best θ that describes the quantum hardware.Note that once expectation values are obtained, the minimization of Φ(θ) can be done without the 1/C S normalization, as it is θ-independent.In such case, the cost function is a simple quadratic expression in ⟨P j ⟩ ∞ with θ-dependent coefficients.
The θ parameterization can be full, in which case we would need about 16 × 16 parameters to describe all possible 2-qubit channels E k , or it could contain a much smaller number of parameters if we use prior assumptions on the structure of E k .For example, we may model the channel that corresponds to an R x (α) rotation using one parameter -α, in order to see if the hardware gate under-rotates or over-rotates with respect to the x axis.
For a system with n qubits, there are exactly N t = t k=1 3 k (n − k + 1) t-local Pauli operators with contiguous support, not counting the trivial identity operator.By selecting t-local Pauli operators as the observables A in Eq. ( 11), we effectively minimize over N t constraints, and therefore it is desirable that the number of free parameters will be smaller than N t [26].Although this requirement may seem restrictive, it still permits the consideration of highly detailed models.For instance, with a system size of n = 6 and observables' locality t = 2, the number of constraints amounts to N t = 63, enabling the incorporation of models with up to 10 parameters per qubit.If a greater number of parameters per qubit is required, one can simply increase the locality t to generate additional constraints for the same system size.

B. A stochastic map: the non strictly-local case
In the previous section we derived our main equation, Eq. ( 4), under the assumption that E k are strictly local channels, acting non trivially on at most O(1) qubits.This assumption is no longer true in realistic noise models, in which also idle qubits, far away from the support of E k , experience noise such as dephasing and amplitude damping.In this section we show how our method can be generalized to account also for (some of) these cases.Specifically, we shall assume a local Markovian noise model without cross talks, with the following properties: 1.Each qubit j has its own idle noise channel N j , which acts independently of what happens to the other qubits.Thus the noise on the set of idle qubits I idle factors into a tensor product of singlequbit noise channels j∈I idle N j .
2. When a unitary gate or a RESET gate acts on one or two qubits, the actual channel applied by the QC is a noisy version of these gates that acts only on the qubits in the support of the ideal gates (i.e., no spillover or cross-talks to other qubits), together with the idle noise channel on the rest of the qubits.
Together, these two assumptions imply that the noisy channel can be written by where Ẽk is the noisy version of E k and supp(k) denotes the qubits in the support of E k (which, under our assumptions, also define the support of Ẽk ).
To proceed, we define superoperators We will use F k as a mathematical tool to simplify the equations we derive below.Note that F k is not necessarily a channel, since N −1 j by itself is not necessarily a channel.Nevertheless, it is a local superoperator which satisfies F k (A) = A when A is outside its support.The point of using F k is that it allows us to write the global action of the kth local channel without the j / ∈ supp(k) condition: Therefore, By the same argument used in the previous section, we note that the local noise channels act on A non-trivially only if they are in its support, and therefore we can define a "noisy version" of A by Plugging this back into Eq.( 15), the formula for E * (A) becomes Following the treatment in Eq. ( 4), we can further simplify the formula, by noting that F * k ( Ã) = Ã whenever F k acts outside the support of A, and therefore Note that in the above equation only k for which F k intersects the support of A are summed over.Consequently, Eq. ( 1) becomes As in the previous section, we can use the above set of constraints as a validation tool for particular noise models by defining local cost functions, as done in Eq. ( 10).
We can also use it to learn the best noise model from a family of noise models parameterized by θ, by using the global cost function where Ãθ is defined in Eq. ( 16).As before, to evaluate Φ(θ) from local measurements, we expand the operators Ãθ , F * k,θ ( Ãθ ) in terms of (t + 1)-local Pauli strings with θ-dependent coefficients and write the cost function Φ(θ) as a quadratic expression of these expectations (see Eq. ( 12)).

C. Deterministic map
Realizing the method presented in Sec.III B on a QC requires the execution of a large number of different quantum circuits, or trajectories, in order to properly sample the steady state.This makes it challenging to implement on currently available devices, which are often limited by the number of distinct circuits that can be executed in a single experimental batch.
To overcome this limitation, we propose an alternative way to construct a non-unital channel, which relies on a deterministic map that can be implemented on a QC using a single circuit.For simplicity, we shall describe our construction in the 1D case with open boundary conditions, but generalization to other geometries and higher dimensions is straightforward.In such case, our map E is defined by a product of local channels {E k }, where E k acts non-trivially on the neighboring qubits k, k + 1.The E k channels are organized in a two layers brick-wall structure, as shown in Fig. 2.
The key motivations behind this construction arise from scenarios where numerous qubits are simultaneously active, providing a natural way to test the presence of cross-talks.Additionally, reaching an approximate steady state with minimal channel applications is crucial to ensure that this circuit can be executed on the quantum computer within a specified time window.
As explained in Sec.III, to learn the parameters of the quantum channel, we want its steady state to be nontrivial in the sense that it will not be the steady state Example of the deterministic channel implementation on a QC.Left: Odd channel EI applied on four qubits, the even layer Eeven is followed by the odd layer E odd .During the action of the E2 on qubits 2, 3 in the even layer, we assume that the idle noise channels N1, N4 are acting on the qubits 1, 4. Right: Inner structure of the 2-local channel E3.The 2-local non-unital channel E3 consists of applying a two-qubit unitary gate U3,1 on qubits 3, 4, followed by a RESET gate on qubit 3 along with idle noise channel N4 on qubit 4, and ends with applying a two-qubit unitary gate U3,2.The twoqubit unitary gate U3,1 has two known single-qubit rotation gates Rα, R β about axes α and β together with a CX gate.
The exact details of our realization of the noisy non-unital channels of other local channels.To that aim, we want the local E k channels to be non-unital and also entangling so that the steady state will be a non trivially-entangled state.
For example, such E k may be realized on a QC by some combination of CX, single-qubit rotations, and RESET gates, as shown in Fig. 2 and discussed below.
To derive our constraints, we view E as the product of two layers.We define the odd layer by E odd def

=
odd k E k and even layer by E even def

=
even k E k .There are two possible choices for the overall channel E, based on the order in which the layers are applied: and E II def = E even • E odd .An example of the E I channel is given in Fig. 2. The steady state of E I is defined by and the steady state of In general, they will differ from each other.For brevity, expectation values calculated under Let us now apply Eq. ( 1) to the case of E I and a local observable A: To simplify the equation, we restrict our attention to observables A j acting on qubits j, j + 1 for odd j.The support A j coincides with the support of E j from E odd .By the same argument that was used in Sec.III A, if the supports of E k and A j are disjoint, E * k (A j ) = A j , and so E * odd (A j ) = E * j (A j ).Another simplification comes from the fact that the light cone of A j has support only on E * j−1 and E * j+1 , as shown in Fig. 3. Therefore, All together, this allows us to rewrite Eq. ( 21) in terms of 4-local expectation values on the RHS: For observables A j with even j, a similar equation holds with the steady state of E II replacing the steady state of E I .Following our previous construction with the stochastic map, we can use the constraints in Eq. ( 22) to validate a given model for the local channels, or learn the model that best fits the data from a family of models characterized by a set of parameters θ.For the learning task, we start by defining the cost function Φ I (θ) using expectation values in the steady state ρ I ∞ : Above, the sum is over a set S I of 2-local observables A j defined on j, j + 1 for odd j, which we take to be Pauli operators.Similarly, we define Φ II (θ) from the expectation values at ρ II ∞ , and set For a system with n qubits, there are exactly 3n + 9(n − 1) = 12n − 9 such geometrically 2-local Pauli operators, excluding the trivial identity operator.This number provides a rough upper bound to the number of parameters that can be learned using this method, as discussed earlier at the end of Sec.III A.
To validate a given model, we define local cost functions Φ q , which for a given q contain all terms in Φ I , Φ II that overlap q.See Eq. ( 10) for the equivalent definition in the stochastic case.
As in the case of the stochastic map, we can expand in terms of 4-local Pauli operators, and write Φ(θ) as a quadratic expression of the expectation values of these operators, as done in Eq. ( 12).
There are many ways to realize the non-unital channels E k on a QC.In this work we use a composed gate, which we call the RESU gate (RESU = RESET + Unitary).It is defined by two 2-local unitaries U k,1 and U k,2 acting on qubits k, k +1 and a RESET gate on qubit k.Each of the 2-local unitaries is made of a CX gate and two general 1-qubit rotations, as shown in Fig. 2. Further details of how we model this gate in the presence of noise are given in Sec.V C 1.
In Sec.V C we show numerically that our protocol can be used to learn a given noise model, and in Sec.VI we demonstrate its performance on actual quantum hardware.

IV. NOISE MODELS AND CLASSICAL OPTIMIZATION
In this section we describe the noise models and the numerical procedures we used to find θ by optimizing over the cost functions in Eqs.(20,24),

A. Local Markovian noise models
We assume a local Markovian noise model, which can be generally treated using the Lindbladian master equation [30][31][32].To model the evolution of a noisy quantum gate, we write Above, T 0 is a time scale, to be determined later, and the dimensionless Hamiltonian H k (θ k ) defines the coherent evolution of the gate U k , i.e., U k = e −iH k (θ k )T /T0 , where T is the gate time and {θ k } are variational gate parameters.For example, the {θ k } parameters may describe a source of coherent error in the two-qubit CX gate.Under our local noise assumption, we take L m to be single-qubit superoperators that represent different single-qubit noise processes.They are given in terms of jump operators L m , L m (ρ) , where {•, •} is an anti-commutator.The variational parameters {θ m } model the strength of the different noise processes.For example, for a single-qubit dephasing noise, we use L z (ρ) = ZρZ − ρ, and the corresponding θ/T 0 is the decoherence rate.
Given a set of channels {E k } that define either the stochastic or the deterministic maps, we take T 0 to be the maximum over the running times of {E k }.Working with the IBMQ hardware, this time scale was primarily due to the RESET gate time, which is typically on the order of a microsecond [33,34].The T 0 and gate execution times we used in our numerical simulations are given in Appendix B and Appendix C.
We allowed the coupling strength (variational) parameters to be different for each qubit.For a qubit j, we use the notation θ (j) = {θ (j) m } to denote its relevant noise parameters.
Using the above assumptions, the noise channel N θ (j) on an idle qubit j for time T is represented by: Similarly, the noisy version of a unitary gate U k described by a Hamiltonian H k (θ) acting for time T is given by: If U k is a two-qubits gate, the above equation will contain the θ m L m operators of both qubits.Equation ( 27) in its entirety was in fact only used to model the noisy CX gate when analyzing the experimental data from the IBMQ hardware.For the single qubit gates, or for the CX gate in the numerical simulations, no coherent errors were assumed, and H k (θ k ) was taken to be the ideal Hamiltonian, which is independent of θ k .The full details of the CX modeling we used are given in Appendix A.
Another simplification was the use of a first-order Trotter-Suzuki approximation for the case of single-qubit gates.As the typical execution time of these gates is much shorter than that of the CX gate (in the IBMQ hardware their execution time is the range of tens of nanoseconds [33,34], which is about an order of magnitude shorter than the execution time of CX), we approximated Eq. ( 27) by Above, •] is the ideal single-qubit gate channel.Notice that as the resultant error of this approximation scales as and as ∥H k ∥ and ∥θ m L m ∥ are all O(1) (see Appendix B and Appendix C for more details), we deduce that our Trotter-Suzuki error for single-qubit gates is of the order of T /T 0 2 ∼ 10 −4 , which is much smaller than all other error sources (e.g., statistical error) described below.
We conclude this section by describing how we modeled the noisy RESET gate.Here we followed Ref. [35] and modeled the noisy gate phenomenologically using a Kraus map representation.In this approach, the noisy RESET gate on the qubit j is described by two variational parameters, θ (j) = (θ is the probability to measure 0 given that the qubit j was in the state 0, and the θ (j) 1 is the probability to measure 1 given that the qubit was in the state 1.All together, the noisy RESET gate is modeled by

B. Classical optimization
To define the cost function Φ, we used a set S of all 2-local Pauli operators with contiguous support.The minimization of the cost function Φ(θ) in Eqs.(20,24) was done using the stochastic optimization algorithm Adam [36], implemented by PyTorch with an automatic differentiation engine [37].We used the same numerical optimization procedure for both the numerical simulations and the experimental results on real quantum hardware.This is because the cost function depends only on the local expectation values, which may come either from numerical simulations or actual experiments.As the cost function normalization constant 1/C S does not depend on the variational parameters, it can be ignored during the optimization process, yielding a cost function that is quadratic in the Pauli expectation values.
The computational cost associated with optimization via a gradient-descent algorithm depends upon specific algorithmic decisions and various hyperparameters, including the learning rate, number of steps, and different regularization techniques.However, as a general rule, the computational resources needed for a single optimization step scale equivalently to those required for evaluating the loss function Φ(θ).In our scenario, where Φ(θ) comprises O(n) local constraints, we anticipate the overall runtime to follow a scaling pattern akin to O(T n), where T denotes the number of optimization steps [36].
Heuristically, we fixed the maximum number of optimization steps to be 15, 000.We added another heuristic termination criterion for the optimization process, wherein the optimization ceased if at step i, the difference in the cost function from step i − 500 was less than 0.25%.
In what follows, we denote by θ est the channel parameters which achieve the lowest cost function value of Φ(θ est ).

V. NUMERICAL SIMULATIONS
To test our method, we performed several numerical simulations of the stochastic and deterministic maps, and optimized over the cost function to learn the underlying noise model.Below, we briefly describe the technical details of simulations, followed by their results.
A. Simulation of the dynamics, calculation of the expectation values and validation of the numerical results For both maps, we simulated systems of n = 5 to n = 11 qubits arranged on a line, with the initial state ρ 0 = |0⟩⟨0| ⊗n .The simulations were done by evolving the full density-matrix on a computer.
An approximation for steady state ρ ∞ of a quantum channel E was obtained numerically by iteratively applying the quantum channel until the convergence criteria D ρ t , E(ρ t ) def = 1 2 ∥ρ t − E(ρ t )∥ 1 < 10 −6 was reached.The number of channel applications required to for the convergence may be influenced by various factors such as system size, strength of noise processes, choice of the noise model, and the number of RESET gates.For both maps considered, the convergence criteria was met within at most few tens of channel applications for all system sizes.The approximate steady-state was then used to calculate the expectation values of the local Pauli operators in the different cost functions Φ(θ), as given in Eq. (12).
We modeled the statistical noise in the expectation values of the local observables using a Gaussian random variable that approximates the binomial distribution of measurement results.For an observable A that is a product of Pauli matrices, it is easy to verify that the statistical noise of N shots is well approximated by a Gaussian random variable G ∼ N (0, σ) with σ = (1 − ⟨A⟩ 2 ∞ )/N .Our numerical findings were validated by calculating the diamond distance between the channels E and Ê, which represent the original noisy gate and the noisy gate derived from the optimization process, respectively.The diamond distance is defined as: where I is the identity channel with the same dimensions as E and Ê, and the maximization is taken over all density matrices of dimension dim {E ⊗ I}.Specifically, we focused on evaluating the accuracy of our approach in the presence of statistical noise and across various system sizes.
As detailed in Appendix E, the validation process outlined above, along with the statistical noise model employed for numerical simulations, cannot be directly extrapolated to scenarios where expectation values used in the cost functions Φ(θ), were acquired from real experiments conducted on quantum computers.Nevertheless, i j |0⟩ N i FIG. 4. Structure of the RESCX(i → j) gate acting two neighboring qubits j, i.We first break the entanglement between i, j by resetting qubit j, and then recreate some entanglement between them by applying CX(i → j).During the action of the RESET gate on qubit j, the idle noise channel Ni is acting on qubit i.
the statistical noise model employed in our numerical simulations provided valuable insights that we utilized in the experimental demonstration.

B. Stochastic map simulations
For the stochastic map simulations, our noise model consisted of dephasing (in the ẑ direction) and amplitude damping, as well as the noisy RESET gate from Eq. ( 29).This amounts to 4 noise parameters for each qubit: 2 for the dephasing and amplitude damping and 2 for the noisy RESET gate.In all unitary gates, we assumed no coherent errors.The exact form of the Lindbladian dissipators is given in Appendix B 2.
We simulated 3 different stochastic maps, which used the same local channels but with different probabilities {p k }.We refer to them as probability set 1, 2, and 3.
The set of gates that defined the stochastic map consisted of the single-qubits gates X 1/2 , H, R α (ψ), and a non-unitary 2-local gate between neighboring qubits i, j, which we call RESCX(i → j).The R α (ψ) gate is a single-qubit rotation gate by an angle ψ around the axis α, and RESCX(i → j) is discussed below.
Following the IBMQ hardware specifications, the R α (ψ) gate was implemented as a combination of two X 1/2 and three R z (ϕ i ) gates, which are native gates in IBMQ: In the formula above, the three rotation angles {ϕ i } 3 i=1 implicitly define the rotation angle ψ and the rotation axis α.
The RESCX(i → j) gate is a combination of the RESET gate on j, followed by a CX(i → j), as shown in Fig. 4. The motivation behind this construction is that the RESET gate makes the quantum channel nonunital, but also breaks entanglement.This might lead to a trivial steady-state, from which it is hard to learn the underlying noise model.Applying the CX(i → j) gate right after the RESET(j) was applied regenerates some entanglement.The full description of the 3 probability sets, together with the underlying noise model, the gates and their execution times, are given in Appendix B.
Our three probability sets were chosen such that the probability of picking a certain gate is independent of the qubit on which it acts.Taking the time scale T 0 (see also Sec.IV A) to be the maximum running time over the gate set {X 1/2 , H, R α (ψ), RESCX}, the implementation of the stochastic channel from Sec. III B on a quantum computer, is described by the following algorithm: 1. Pick a gate 1 ≤ k ≤ 4 from the set {X 1/2 , H, R α (ψ), RESCX} with probability p k .
2. If the chosen gate is a single-qubit unitary, uniformly choose a qubit and apply it.
3. If the chosen gate is the RESCX gate, uniformly choose a neighboring pair of control and target qubits and apply it.To evaluate the accuracy of the numerical optimization, we calculated the diamond distance defined in Eq. (30), between the channel representations of the original noisy CX and RESET gates and the ones obtained from the optimization routine.We focused on these two gates because they have the longest gate times, and therefore they are the most noisy gates in the simulation.
Fig. 5 and Fig. 6 present the average diamond distance between the original noisy CX and RESET gates and their respective estimations.For a system of size n, the average is taken over n possible RESET gates and 2(n − 1) possible CX gates.Fig. 5 shows the average diamond distance as a function of the number of shots per local Pauli observable, with the system size fixed at n = 6 qubits.The results are robust for the three different probability sets.
With N = 10 6 shots per Pauli observable, the average diamond distance for the CX gate is approximately 2×10 −5 , while the average diamond distance for the RE-SET gates is of the order 2×10 −2 .As our method aims to learn the dynamics of the underlying noise model rather than targeting a specific gate, the gap between these two distances likely arises because the noise parameters governing the noisy CX gate appear not only in the RESCX gate, but also in the idle noise channels and the noisy single qubit gates.In contrast, the noise parameters associated with the RESET gate appear only when the RESCX gate is applied.
The same behavior can be observed in Fig. 6, where we present the average diamond distance as a function of the number of qubits in the system using 10 6 measurements per local Pauli operator for the probability set 2.
It would be interesting to find the optimal choice of the gates and probability distribution {p k } for learning a given model of noise.We leave this for future works.
C. Deterministic maps simulations

The RESU gate
As mentioned in Sec.III C, we realized the 2-local nonunital channels {E k } of the deterministic maps using a composed gate called the RESU gate.As shown in Fig. 2, the RESU gate on qubits k, k + 1 consists of two 2-local unitaries U k,1 and U k,2 acting on qubits k, k + 1 along with a RESET gate on qubit k: Each of the unitaries U k,i=1,2 is made of a CX gate and two arbitrary single-qubit rotation gates.Therefore, different RESU gates are characterized by four single-qubit rotations R α (ψ), which we denote by {α k , ψ k } 4 k=1 .As in Sec.V B, each single-qubit rotation R α (ψ) is implemented as a product of two X 1/2 and three R z (ϕ i ) gates, as described in Eq. ( 31).
To model the noise in the deterministic map, we used the framework of Sec.IV A. For each unitary U k,i=1,2 we defined its noisy version Ũk,i=1,2 by replacing its noiseless composing gates by their noisy versions.The noisy CX gates in Ũk,i=1,2 were modeled by Eq. ( 27), and the noisy single-qubit rotations R α (ψ) were modeled by Eq. ( 28).Finally, the noisy RESET gate is modeled by Eq. ( 29).Together, the expression of the noisy RESU gate becomes: where N k+1 is an idle noise channel on qubit k + 1 for the duration of the noisy RESET gate.
For the time scale T 0 , as defined in Sec.IV A, we took the maximum running time over all {RESU k } gates.For any RESU k gate with the running time T k < T 0 , a twoqubit idle noise channel N k ⊗ N k+1 is applied for the duration ∆T k = T 0 − T k .On a real QC the idle noise channel is replaced by the delay instruction.

Numerical results
We simulated three deterministic maps, which we refer to as map-1, map-2, and map-3.Each map used two different RESU gates: one for the even layer, and one for the odd layer.In each layer, the same RESU gate was used on all qubits.We note that while the general RESU gate is defined by four single-qubit rotations R α (ψ), the actual gates used in the simulations and experiments only used two single qubits rotations in each gate by setting R α = R β and R γ = R δ (see Fig. 2).This was due to a technical reason: limiting the number of parameters helped us to find sets of angles for which the convergence to the steady state was rapid, at most within 10 steps (as required by the IBMQ hardware).All together, each deterministic map was defined by a set of 4 single-qubit rotations (2 for the even layer RESU and 2 for the odd layer RESU).The full details of the maps, including the rotation axes and angles, as well the different gate times, are described in Appendix C.
The noise model we used in the simulations of the deterministic maps consisted of generalized amplitude damping and non-uniform depolarization.For each qubit this amounted to 5 Lindblad operators: 3 depolarization operators (one for each axis) and two operators for the generalized amplitude damping [38][39][40].In addition, for each qubit we had two parameters modeling its RESET gate according to Eq. ( 29).As in the stochastic case, we assume no coherent errors in the unitary gates.The full description of the noise model is given in Appendix C 2.
After running the simulations, we learned the noise model using the optimization procedure of Sec.IV B. To evaluate the accuracy of the noise parameter estimation, we used the same procedure as in the stochastic maps simulations, and compared the noisy CX and RESET gates used in the simulations to the ones obtained from the optimization routine.
In Fig. 7 and Fig. 8, the average diamond distance between the original noisy CX and RESET gates and their respective estimations obtained through our method is presented.For a system of size n, the average is taken over the n−1 possible RESET gates and 2(n−1) possible CX gates.
In Fig. 7, we present the average diamond distance as a function of the number of shots per local Pauli observable for a fixed system size of n = 8 qubits, using three randomly generated sets of rotation axes and angles, {α k , ψ k } 4 k=1 .Each of the three sets resulted in a different steady state.The accuracy of the estimation is robust to the different sets of rotation angles and axes (which are given in Appendix C 1).For N = 10 6 shots per local Pauli observable, the average diamond distance for the CX gates is approximately 3 × 10 −4 , while for the RESET gates it is approximately 1.4 × 10 −2 .
In Fig. 8 we kept the number of shots per Pauli fixed at N = 10 6 and used the rotation angles and axes from map 2. Increasing the system size from n = 5 to n = 11, we observe that the accuracy of the parameter estimation improves slightly between n = 5 and n = 7, and remains robust up to n = 11.The initial increase in accuracy for smaller systems in Fig. 8 may be explained by examining the light cone shown in Fig. 3 in Sec.III C. For a smaller system size with open boundary conditions, the light cone is truncated at each end of the system, causing the expectation values at the edges of the system to be 3-local instead of 4-local.As a result, the optimization routine uses fewer statistics for the qubits located at the ends of the system.However, for larger system sizes, this boundary effect becomes negligible as the relative fraction of boundary qubits decreases.
As in the case of the stochastic map in Sec.V B, the average diamond distance for the CX gates is much smaller than the RESET gates.This similar behavior arises because our method learns the underlying noise parameters, which may appear in different gates within the circuit.Consequently, the noise channel parameters in the CX gates are present in multiple combinations within a single RESU gate, while the channel parameters of the RESET gate only appear once per RESU gate.
Comparing the numerical results depicted in Fig. 5 and Fig. 6 versus Fig. 7 and Fig. 8, respectively, shows that, for a given number of shots N , the stochastic map achieves greater accuracy compared to the deterministic map.This disparity can be attributed to the stochastic map's additional "prior knowledge" regarding the circuits, given that the gate probabilities p k are known, consequently leading to more accurate results.

VI. QUANTUM DEVICE RESULTS
In this section we describe the experiments we performed on actual quantum hardware to test our method.All experiments were done on an IBMQ machine via the IBMQ cloud.

A. Details of the experiment
For the quantum hardware demonstration we chose the ibm lagos v1.0.32, which is one of the IBM Quantum Falcon Processors [41] with 7 qubits in the shape of a rotated H. Out of the 7 qubits, we chose n = 5 qubits arranged in a line, which we labeled by 0, 1, 2, 3, 4. The IBMQ labels of these qubits were 6, 5, 3, 1, 0 in corresponding order.
We used the deterministic map-1 in Sec.III C to learn the noise of the quantum hardware.See Appendix C 1 for an exact description of the map.Our noise model was the same model used in the deterministic map simulations, which consisted of non-uniform depolarization errors, generalized amplitude damping, and RESET errors (total of 7 free parameters per qubit).In addition to these parameters we added a modeling of coherent CX errors as free parameters in its Hamiltonian -see Appendices A, C 2 for a full description of the model.
After learning the hardware noise model using map-1, we ran deterministic map-2 and measured 2-local Pauli expectation values on its steady state.We used these measurements to test the quality of the noise model we learned from map-1.
To reduce SPAM errors, we used a readout-error mitigation scheme, which was partially based on Refs.[42,43].The scheme relied on a preliminary step in which we estimated the conditional probability of measuring a 4local bit string in the computational basis, assuming the qubits were prepared in a different computational basis.See Appendix D 1 for an full description of our scheme.
Overall, our experiment consisted of three steps: (i) preliminary readout-error mitigation measurements, (ii) running deterministic map-1 and measuring the expectation values of 4-local Paulis, and (iii) running deterministic map-2 and measuring the expectation values of 2-local Paulis.
In all three parts of the experiment we needed to estimate the expectation values of local Pauli strings.
To that aim, we used the overlapping local tomography method, described by Zubida et al. in Ref. [8], which simultaneously estimates all geometrically k-local Pauli strings using a set of 3 k different measurement circuits.This means that using a total budget of M shots, each Pauli string was estimated using M/3 k shots.
We now describe the 3 parts of the experiment in more details.
(i) Readout error mitigation measurements: Here we performed preliminary readout-error mitigation measurements for the protocol that is described in Appendix D 1.Our measurements were used to mitigate the readout errors of steps (ii),(iii).We used the overlapping local tomography method (see Zubida et al. in Ref. [8]), to obtain 4-local readout statistics from a set of 2 4 different measurement circuits of unit depth.On each circuit we performed 2 × 10 5 measurements (about M = 3 × 10 6 shots in total).
(ii) 4-local channel measurements: In this step we ran the deterministic map-1 on the quantum computer and prepared many copies of the steady states of E I and E II channels.Following the numerical simulations, we assumed that the steady state of each channel was approximately reached after 10 steps.We then used the overlapping local tomography to estimate the expectation values of the 4-local Pauli operators using 1.25 × 10 5 measurements for every 4-local expectation value (and about M = 2 × 10 7 shots in total).Upon finishing the measurements in (ii), (iii), we used the results of readout measurements (i) to correct these measurement statistics (see Appendix D 1 for details).

B. Results I: Noise model validation
The main application of our method is to validate (benchmark) a noise model of a QC by calculating its local cost functions Φ q given a noise model that defines the noisy channels { Ẽk }, as explained in Sec.III C. The local cost function Φ q for a qubit q is the sum of all the A j entries in Φ I and Φ II (see Eq. ( 23)) that overlap qubit q, divided by the suitable normalization factor C Sq (see Eq. ( 10)).This happens when the light cone of A j includes qubit q, i.e., when q ∈ {j − 1, . . ., j + 2} (see Fig. 3 for an illustration).
We used our validation protocol to compare three noise models: (i) the ideal noiseless model, (ii) our estimate of the IBMQ Qiskit [44] noise model, imported from the device's backend properties at the time of our experiment, and (iii) a noise model that we learned from the IBMQ hardware using our optimization method (see Sec. IV).
To calculate model (ii), we needed to know E * odd , E * even of the IBMQ Qiskit noise model, from which the local cost function is calculated (see Eqs. (21,22,23)).However, as far as we could tell, IBMQ does not fully publish the exact formulas it uses to model the local noise.Instead, it publishes general noise parameters such as T 1 , T 2 , and provides code that simulates noisy cir-cuits.Therefore, in order to calculate E * odd , E * even , we first learned IBMQ Qiskit noise model using the deterministic map-3, and used the resultant model as an approximation to IBMQ Qiskit noise model.In more details, we used IBMQ Qiskit simulator to simulate the steady state of map-3, and calculated local Pauli expectations with vanishing statistical noise.We then used our optimization routine to learn Qiskit's noise model.We used the parameters described in Appendix C 2, but without any coherent errors, and with depolarization only along the ẑ axis.As a sanity check, we used the model we learned to simulate the 5 qubits steady states of maps-1,2,3 and compared them to Qiskit's steady states.We noticed that in all cases the trace distance between our steady state and Qiskit's steady state was less than 1%.
Fig. 9 shows the single-qubit cost function, as a heat map.The different rows represent log 10 [Φ q ± ∆Φ q ], for the 3 noise models, and ∆Φ q is the uncertainty of one standard deviation in the cost function (see Appendix E for detailed error analysis of the experiment).Unsurprisingly, it suggests that IBMQ Qiskit's noise model provides a better description of the hardware than the ideal, noiseless channel.It also suggests that our estimated noise model performs significantly better than the two other models.This might not be surprising, given the fact that we learned our noise model using the same steady state that we used to validate it.In other words, the learned noise model parameters are the ones that minimize the global cost function, which is roughly the sum of the local cost functions that appear in Fig. 9.In the machine learning jargon, this might imply that our noise model estimation procedure overfitted the particular steady state from which the 4-local measurements were sampled.To rule this out (at least to some extent), we show in the next subsection that our estimated noise model has a better predictive power than the other two: it is able to better approximate the 2-local expectation values of a steady state of map-2, which is different from the steady state of map-1 with which we learned the noise model.

C. Results II: Noise model characterization
In this section we present the results of applying our optimization algorithm to characterize the noise within a quantum device.The full learned noise model parameter details are presented in Appendix D 2, and the statistical error analysis is presented in Appendix E.
To evaluate the quality of our noise characterization, we took the noise model that we learned and used it to numerically simulate the steady states of map-2 (of E I and E II ).On these steady states we calculated the expectation values of 2-local Paulis in the bulk of the 5 qubits line (we used qubits 1, 2 for E I and 2, 3 for E II ).We then compared these simulated expectation values to the actual results we measured on the quantum hardware when running map-2 (part (iii) of our experiment described in Sec.VI A).We used the same procedure to test the two other noise models from the previous section, i.e., the ideal (noiseless) model, and IBMQ Qiskit's model.In contrast to the previous section, to sample the 2-local Paulis from the IBMQ Qiskit noise model we don't need to know E * odd , E * even .Hence, we calculated the 2-local Pauli expectation values directly from the IBMQ Qiskit simulator using the noise model imported from the device's backend properties at the time of our experiment.
Our results are presented as histograms in Fig. 10, where we show the expectation values of the 9 possible 2-local Pauli expectation values for the steady states of E I , E II of map-2.The expectation values from the quantum hardware are shown in red.As we used more than 10 5 measurements for each data point, the statistical error in the red bars in Fig. 10 is smaller than 0.003, which is negligible comparing to the differences with respect to other models.For the 3 analytical models, we present the exact expectation values, without any statistical error.Our model prediction is shown in blue, Qiskit's model in yellow and the noiseless model in green.As can be seen in the figure, while the predictions of all the models are in decent agreement with the empirical results, our model is in better agreement than the other two models.
To compare our estimate in a quantitative way, we used the 2-local Paulies expectation values of the IBMQ hardware to perform quantum state tomography and estimate the underlying 2-local reduced density matrices (RDMs) in the steady states of E I , E II of map-2.We compared it to the RDMs produced by simulations of our model, the Qiskit model and the ideal model.For each case, we calculated the trace distance D(ρ, σ) def = 1 2 ∥ρ − σ∥ 1 between the simulation and the estimated RDM of the IBMQ hardware.The results are given in Table I, showing that indeed the trace distance between IBMQ RDMs and our model RDMs are 50% − 70% of the distances from ideal model or the Qiskit model.
It is worth noting that the 2-local Pauli operators used for the reliability assessment were collected after the 4local Pauli expectation values used for noise model characterization were obtained.As a result, the noise model at the time of the 2-local Pauli measurements may have undergone some drifting.This may account for some of the variations seen in the 2-local expectation values depicted in Fig. 10 [45-47].

VII. SUMMARY AND OUTLOOK
We have presented and demonstrated numerically and experimentally a new framework for learning a non-unital quantum channel from its steady state, which can be used to learn the noise channel of a quantum computer.It is inspired by the recent methods of learning a local Hamiltonian or Lindbladian from their steady states [24][25][26][27].
We have shown that when the underlying noise channel in a quantum computer is Markovian and local, it can  TABLE I.The trace distance of the 2-local reduced density matrices obtained from the IBMQ hardware using quantum state tomography and the corresponding density matrices taken from simulations using the ideal model, Qiskit noise model, and our learned model.We calculated the distances on the steady states of EI and EII of map-2.
be learned using our method by driving the system into an engineered dissipative steady state.We have shown that there exist a set of linear constraints between the expectation values of local observables in the steady state, which can be used to learn or validate the underlying noise channel.This steady state can be a correlated, entangled state, and might even be classically inaccessible.
Nevertheless, as we showed, the constraints between its local expectation values are local and can be efficiently verified once we measure these expectation values.We can therefore apply our method to a system with a number of qubits and test how well a given noise model globally describes the system, which is in a state that might be classically inaccessible.
For generic dissipative maps, the steady state is unique, and consequently, our method is independent of the initial state and is insensitive to state preparation errors.In addition, the fact that we measure a steady state that was created using several iterations allows us to test possible deviations from Markovianity.Finally, by optimizing over a parametrized set of local Markovian noise models, our method allows learning of the optimal noise model.
We have suggested two generic ways to engineer a dissipative map on a quantum computer, both of which rely on non-unitary operations, such as the RESET gate.The first is a stochastic map that is implemented on a quantum computer by applying a set of gates according to a prescribed probability distribution.The second uses a brick-wall type circuit, whose building blocks are composed 2-local non-unitary gates and can therefore be applied using a single circuit.We have demonstrated numerically both methods using simulations of 5−11 qubits, and showed that it can successfully learn an underlying noise model.In our numerical tests we managed to learn the underlying CNOT and RESET gates up to diamond distances of 10 −3 − 10 −5 and 10 −1 − 10 −2 respectively, using 10 4 − 10 6 shots per observable.
We have also tested our method experimentally on the IBMQ ibm lagos machine using 5 qubits.We have shown that given a noise model, we can construct a 'heat map' view of the qubits showing how well the model describes each qubit.We have also showed how our method can learn an underlying noise model for that machine, and that the model it learned better predicted the outcomes of another experiment than the noiseless model or IBMQ Qiskit's model.
Our work leaves open several theoretical and practical questions.On the theoretical side, it would be interesting to characterize the complexity of the steady states of dissipative maps like the one we used.Can they be sampled classically, or can they encode some BQP-hard problems?More general engineered dissipative maps are known to be BQP-complete [48], but our maps are much simpler.They also differ from the much studied class of randomly generated circuits, for which there are several complexity bounds (see, for example, the recent result in Ref. [49] and references within), in several aspects: our maps are not completely random (we apply the same map over and over), they are strongly dissipative and non-unital, and can be applied for poly(n) time.
It would also be interesting to find a systematic way of engineering the dissipative maps (of both types) to have an optimal estimate of the error parameters.This might be possible to do by studying the quantum Fisher information [50] of the steady states with respect to the free parameters of the map.Related to that, it would be interesting to understand how the quantum Fisher information, or more general our ability to use the steady state for learning, depends on the deviation of the channel from unitality.Is there a phase transition behavior, as seen in related measurement induced phase transitions literature (see Ref. [51] and references within), or is it a smooth transition?
On the more practical direction, it would be interesting to see if there is a way to make our method completely insensitive to SPAM errors, by removing its dependence on measurement errors.The readout error mitigation procedure that we employed arguably removes a large part of these errors, but not in a controlled, systematic way, as done in methods like RB [52][53][54][55] and GST [56][57][58].
dephasing and amplitude damping, which are given by the Lindbladians where In addition, we used θ 0 , θ 1 to model the noisy RESET gate, as defined in Eq. ( 29).
We note that in all the simulations of the stochastic maps, we assumed that the CX gate had no coherent errors; all its errors came from the single-qubit errors of its two-qubits.
For every qubit, the dimensionless noise parameters θ z , θ amp that we used in our simulations were randomly picked between the values of 0.01 and 0.07.In terms of the coherence times (given by T 0 /{θ m }), our choices correspond approximately to a range of 20 [µsec] to 150 [µsec].
The parameters for the noisy RESET were randomly generated between the values 0.87 and 0.98.Following the discussion in Ref. [35], we required θ 0 ≥ θ 1 to match the typical values on the IBMQ hardware.However, we note that the experimentally determined noise parameters of the RESET gates did not follow this assumption.

Gate durations
The noisy versions of all single-qubit unitary gates were simulated using Suzuki-Trotter expansion, as outlined in Eq. (28).For the noisy version of the CX we used Eq. ( 27) with the Hamiltonian defined in Appendix A with vanishing coherent errors.
To define the gate channels in the stochastic map, we used the execution times that resemble the execution times of actual quantum gates in superconducting qubits [33,34].For the RESET gates, duration times were generated between 0.8 [µsec] and 1.1 [µsec].For the CX gates, durations were generated between 0.3 [µsec] and 0.45 [µsec].The duration time of the X 1/2 gate was set to 15 [nsec], and for the H gate the duration was set to 20 [nsec].As stated in Sec.V B, the R α (ψ) gate is a product of two X 1/2 and three R z (ϕ i ) gates.The R z (ϕ i ) gates are implemented on the superconducting platforms, such as IBMQ, with zero error and duration [33].Therefore, we took the duration of the general single-qubit rotation gate R α (ψ), defined in Eq. (31), to be twice the duration of the X 1/2 gate.
In our simulations, the time scale T 0 was dictated by duration of the RESCX gate, and was roughly 1.5 [µsec].We used the deterministic map both in numerical simulations and real quantum hardware.Consequently, it was important that the maps we defined will converge to the fixed point in at most 10 steps, which was the maximal number of steps that we could run of these types of circuits on the IBMQ hardware.To that aim, we first ran few numerical simulations to find a combination of rotation angles that will guarantee fast convergence.
Recall from Fig. 2 that the RESU gate is defined by 4 single-qubit rotations, which amounts to 12 rotation parameters per RESU gate (3 parameters for every singlequbit rotation).To facilitate the search for optimal rotation angles (which will give rapid convergence), we reduced this number by half by using the same parameters in the two single-qubit rotations in U 1 (the 2-local unitary before the RESET gate), and similarly in U 2 (the 2-local unitary after the RESET gate).In other words, we set R α = R β and R δ = R γ .It's worth mentioning that even when each RESU gate utilizes 12 rotation parameters, convergence is still achieved within a few tens of steps.However, due to classical hardware constraints on IBMQ machines, we were compelled to approximate the steady state within a maximum of 10 steps.
The parameters we used for map-1,2,3 are presented in Table III

Noise model in the deterministic case
In the deterministic case, we used a more expressive noise model than was used in the stochastic case, since we wanted the same model to describe also the real quantum hardware experiments of Sec.VI.For each qubit, we used 7 noise parameters: θ x , θ y , θ z , to model depolarization in the x, ŷ, ẑ directions, θ amp , θ ex to model general amplitude damping noise, and θ 0 , θ 1 to model the noisy RESET gate (as was done in the stochastic case).
The depolarization noise channels in the different directions were modeled by the Lindbladians For the generalized amplitude damping channel [38,39], we used the Lindbladian Physically, the θ amp models the strength of the general amplitude damping process, and θ ex models the occupation of the qubit's state |1⟩ at thermal equilibrium.The simple amplitude damping model is described by L amp,+ , which corresponds θ ex = 0, where the equilibrium state of the qubit is assumed to be |0⟩⟨0| (which correspond to a qubit at zero temperature).
In the numerical simulations we chose the dimensionless noise parameters {θ x , θ y , θ z , θ amp , θ ex } randomly between the values of 0.01 and 0.1.The θ 0 , θ 1 parameters for the noisy RESET were randomly generated between the values 0.87 and 0.98, as in the stochastic case.
In our numerical simulations we have used several randomly generated gate durations using the same bounds as for the stochastic map, given in Appendix B 3. The resulting time scale T 0 was approximately 2 [µsec].For quantum readout error mitigation (QREM), we followed the approach of Ref. [42], but adapted it to k-local readout statistics obtained using the overlapping local tomography from Ref. [8].The method empirically evaluates the conditional probability distribution P (x|y) of measuring x = (x 1 , . . ., x k ) classical bit string in standard basis after the system is prepared the quantum state |y⟩⟨y| for y = (y 1 , . . ., y k ).In our case, we used k = 4 for learning the noise channel (part (ii) in Sec.VI A) and k = 2 for testing our characterization (part (iii) in Sec.VI A).In the ideal case, P (•|•) is a Kronecker delta-function.Upon obtaining the P (•|•), we used it to 'fix' an empirical probability P noisy (x) of measuring x in the computational basis, as follows.Assuming that P noisy (x) = y P (x|y) • P ideal (y), we extracted P ideal (x) by P ideal (x) = The readout error mitigation process precedes the numerical optimization, as outlined in Sec.IV B, where a matrix P (x|y) with O(n 2 ) values is inverted with preprocessing computational cost of O(n 3 ).In Table V, we give the estimated values of the dimensionless noise (coupling strength) parameters {θ x , θ y , θ z , θ amp , θ ex }, which are described in Appendix C 2. The values of 10 −5 and 0.1 are the lower and the upper bounds on the parameters in the numerical optimization, at which the gradient with respect to the specific parameter is forced to zero as an overall stability measure.
The estimated values of the θ 0 , θ 1 parameters of the noisy RESET gates are given in Table VI (see Eq. ( 29) in Sec.IV A).In the optimization, we used 0.99 as an upper bound to θ 0 , θ 1 .We note that for the RESET gate on the qubit 4 the optimization was not performed.This is because in the definition the RESU gate (see Fig. 2) the RESET gate acts on the first qubit in the pair, and so we did not use RESET on qubit 4.
In Table VII, we give the estimated values of CX gate noise parameters, representing the coherent errors in the two-qubit gate (see Eq. (A1) in Appendix A).We note that the θ 3 parameter in the gates CX(0 → 1), CX(1 → 2), CX(2 → 3), CX(3 → 4) was not affected by the optimization (i.e., the gradient with respect to θ 3 vanished for these gates), and remained in its ideal inital value of 1.0.In a hindsight, this can be explained by the fact that θ 3 is equivalent to adding a ẑ rotation after CX.In the gates CX(0 → 1), CX(1 → 2), CX(2 → 3), CX(3 → 4), this rotation is followed by a RESET gate, which is insensitive to phases in the standard basis, and as a result the θ 3 parameter does not affect RESU gate (and therefore it does not affect the steady state).
The notable fluctuations in learning precision across different families of noise parameters, such as between single qubit depolarization parameters and two-qubit CX   gate parameters, can be attributed to the differences in the gradients of the cost function Φ with respect to the parameters of the CX and RESET gates compared to single qubit parameters.This difference arises from the fact that the quantum circuit implementing the deterministic map primarily comprises CX gates along with RESET gates.The single qubit parameters come into play mostly during idle states of the qubit or during the action of single qubit rotation gates.As a result, the influence of single qubit parameters on the cost function is smaller, leading to lower learning accuracy for these parameters in presence of the statistical noise.
Appendix E: Bootstrapping method and statistical error analysis via error propagation

Bootstrapping method assuming independent sampling
To estimate the statistical errors in the cost function Φ and the fitted parameters θ we applied a simple bootstrapping method [64].Given a set of Pauli expectation values obtained from the experiment using N statistical shots, we augmented each expectation value by adding a realization of a Gaussian random variable with zero mean and variance of 1/N .Subsequently, our optimization routine was executed using these "noisy" Pauli expectation values as input.All other hyperparameters, such as the optimization step size, number of iterations, and stopping conditions, remained consistent.
By employing this approach, the error bars on the fitted parameters, as illustrated in Table V, Table VI, and Table VII, were determined as one standard deviation of the fitted parameters acquired from 150 separate realizations of Gaussian noise applied to the expectation values obtained from the experiment.Consequently, the error in the cost function Φ, presented in Fig. 9, for a fixed set of parameters θ, was likewise calculated as one standard deviation of the cost function values derived from 150 distinct realizations of Gaussian noise affecting the expectation values obtained from the experiment.
The primary assumption underlying this method hinges on the notion that each Pauli expectation value was independently sampled.In practice, however, measurements were conducted utilizing the overlapping local tomography method (as elucidated in Section VI A), wherein all geometrically k-local Pauli strings are simultaneously estimated using a set of 3 k distinct measurement circuits.Consequently, while considering Pauli expectation values as random variables, it's important to acknowledge that these variables may indeed exhibit correlations.Nonetheless, conducting a comprehensive analysis of such correlations would necessitate measuring k + 1-local Pauli strings, a task that would substantially increase the execution time on the IBMQ machine.
Below, we provide an analytical description of the er-rors in both the cost function and the parameters, assuming independent sampling.Additionally, we explore a scenario where independent sampling is not assumed, leading to an upper bound on the error bars associated with the estimated parameters.

Statistical error analysis
The errors in cost function Φ can be estimated analytically using simple error propagation analysis.From the functional form of the cost function in Eq. ( 12), we get that for a fixed set of parameters θ, the error in Φ due to statistical errors in the estimation of ⟨P i ⟩ is given by where we assumed that the Pauli operators were sampled independently.
To derive the analytical expression for the errors in the fitted parameters, recall from Eq. ( 12) that the cost function is given by In order not to overload the notation, we shall drop the ∞ subscript from ⟨P j ⟩ ∞ , and in addition we will use ⟨P ⟩ to denote the vector of all ⟨P j ⟩ expectation values.
Let then ⟨P j ⟩ denote the exact expectation value of the Pauli operator, and by Pj the empirical expectation value we obtain using N independent shots.Then we may write Pj def = ⟨P j ⟩ + δP j , where δP j can be modeled by a Gaussian variable with zero mean and standard deviation ∆P j = ⟨(δP j ) 2 ⟩ = (1 − ⟨P j ⟩ 2 )/N ≤ 1 √ N .During the optimization step, we find θ that minimizes Φ at the set of empirical expectation values {⟨ Pj ⟩}.This is of course a random variable by itself, which fluctuates around the exact minima that we would have obtained had we minimized Φ at {⟨P j ⟩}.Let θ 0 denote the exact minima, then we define the random variable δθ by θ def = θ 0 + δθ.Finally, define J α to be the gradient of the cost function: J α = ∂Φ ∂θα .Then 0 = J α (θ; P ) = J α (θ 0 + δθ; ⟨P ⟩ + δP ) (E2) By definition, also J α (θ 0 ; ⟨P ⟩) = 0, since it is the minimum of the cost function at the exact expectation values, and therefore we get a linear equation for δθ, whose solution is where the matrix M is given by and H is the Hessian of the cost function, The matrix M is evaluated at (θ 0 , ⟨P ⟩), and is therefore not a random variable.Taking the δP j to be independent Gaussian variables with variance (∆P j ) 2 , we may estimate In practice, we do not know M exactly at (θ 0 , ⟨P ⟩), but as a zero order approximation, we can use its value at the empirical values (θ, P ).
We note that the bound in Eq. (E4) was derived by assuming that {δP j } are independenet Gaussian variables, which is justified when using independent samples to estimate different expectation values.When this is not the case, one can use the weaker bound which does not assume statistical independence, and in fact provides the worst-case bound on the error which can be attributed to the case where all Pauli expectation values are correlated.
During the analysis of the error, we computed the error bars for the fitted parameters using two approaches: the bootstrapping method assuming independent sampling (refer to Appendix E 1) and the upper bound derived from Eq. (E5).However, we find that the calculated upper bound offers limited insight into the actual errors in the estimation due to its substantial magnitude.To illustrate this point, we compare the errors obtained from the bootstrapping method described in Appendix E 1 with those derived from Equation Eq. (E5) for the CX(i → j) gates positioned in the middle of the five-qubit system utilized in our experiment.
This comparison of error estimates is presented in Table VIII.Notably, the bound provided by Equation Eq. (E5) greatly exceeds the estimate generated by our bootstrapping method.However, if the actual errors in the fitted parameters were indeed close to the worst-case bound, our method would likely fail to accurately reproduce the dynamics of the learned quantum channel.In such a scenario, we would not observe a satisfactory agreement between the estimated and experimentally obtained 2-local Pauli expectation values, as demonstrated in section VI C, particularly in Figure 10.Therefore, our findings suggest that the independence assumption is more likely to approximate the actual error propagation scenario than the upper bound.Comparison between the error estimates ∆θi in the fitted parameters θi for the CX(i → j) gates located in the middle of the five-qubit system used in the experiment.The bootstrapping column is taken from Table VII, and corresponds to the error analysis given in Appendix E 1.The worst-case column is the error bound obtained from Eq. (E5).

FIG. 1 .
FIG.1.Block diagram summarizing our method.Given a description of the non-unital channel E(•) (e.g. its quantum gates, activation probabilities, etc), we iteratively apply it on a QC to reach an approximate steady state.The local Pauli expectation values are sampled from the steady state, and used as input to the classical post-processing routine.They satisfy a set of local constraints with coefficients that depend on the noise parameters of the underlying quantum device.The local constraints can be verified efficiently and their number grows linearly with the system size.The classical post-processing routine is readily tailored to learn specific channel properties such as rotation angles, coherence times, gate activation probabilities and so on.
FIG. 3. A light cone originating from a 2-local observable Aj (red shaded area) has at most 4-local support.For the case of finite system with open boundary conditions the light cone is truncated from 4-local to 3-local at each end of the system.

FIG. 5 .
FIG. 5.The average diamond distance between the original noisy CX and RESET gates and their respective estimations, for a fixed system size n = 6 qubits.Each data point being the average of 10 statistical noise realizations in the expectation values.The error bars indicate the uncertainty of two standard deviations.(Top): The average diamond distance for the CX gates.(Bottom): The average diamond distance for the RESET gates.Each color represents a different probability set {p k }.The constant horizontal line (red color) represents the diamond distance between the ideal case of the noiseless channel and the true noise channel parameters.

FIG. 6 .
FIG. 6.The average diamond distance between the original noisy CX and RESET gates and their respective estimations, for a fixed number of shots N = 10 6 , using the probabilities {p k } from the set 2. Each data point being the average of 10 statistical noise realizations in the expectation values.The error bars indicate the uncertainty of two standard deviations.

FIG. 7 .
FIG. 7. The average diamond distance between the original noisy CX and RESET gates and their respective estimations, for a fixed system size n = 8 qubits.Each data point being the average of 10 statistical noise realizations in the expectation values.The error bars indicate the uncertainty of two standard deviations.(Top): The average diamond distance for the CX gates.(Bottom): The average diamond distance for the RESET gates.Each color represents a different map defined by the rotation axes and angles, {α k , ψ k } 4k=1 .For a specific map, the pairs {α k , ψ k } were randomly generated, and kept constant.The constant horizontal line (red color) represents the diamond distance between the ideal case of the noiseless channel and the true noise channel parameters.

FIG. 8 .
FIG. 8.The average diamond distance between the original noisy CX and RESET gates and their respective estimations, for a fixed number of shots N = 10 6 , using the rotations {α k , ψ k } from the map 2. Each data point being the average of 10 statistical noise realizations in the expectation values.The error bars indicate the uncertainty of two standard deviations.

(iii) 2 -
FIG. 9.The cost map for the ibm lagos device for three noise models: the ideal noiseless case (top row), the learned noise model from IBMQ Qiskit backend (middle row), and the learned noise model using our method (bottom row).The top number in each square corresponds to log 10 [Φq + ∆Φq], and the bottom number to log 10 [Φq − ∆Φq].Where Φq is cost function for the qubit q, and ∆Φq is the uncertainty of one standard deviation in the cost function value due to the statistical noise present in the measurements of the local Paulis.The full details of the derivation for the ∆Φq are given in the Appendix E. Each square is colored according to the value of log 10 [Φq].

FIG. 10 .
FIG.10.Expectation values of the 2-local Pauli operators obtained from: the ibm lagos device (red), the learned noise model using our method (blue), the IBMQ Qiskit simulator (yellow) and the ideal noiseless case (green).We focus on qubits 1, 2 for EI (top) and 2, 3 for EII (bottom) to avoid including the boundary qubits 0 and 4, for which the measurement statistics become 3-local instead of 4-local (see Sec. V C).See TableIfor the corresponding trace distances between the measured RDM and the RDMs of the three models.

Appendix C: Details of the deterministic map 1 .
Rotation angles in the RESU gates

Appendix D: Details of the quantum device noise model characterization results 1 .
Details of the readout error mitigation scheme for k-local Paulis

2 .
Estimated noise parameters of the ibm lagos v1.0.32 device The experimental results presented in Sec.VI B and Sec.VI C were obtained from the ibm lagos v1.0.32 quantum device on the IBMQ platform on September 11, 2022.The time scale based on the data from the device backend was T 0 = 1.792 [µsec].

TABLE III .
and TableIVfor the even and odd layers, respectively.Rotation angles used to define the RESU gates in the three simulated deterministic maps, for the even layer.The angles are given in radians.

TABLE IV .
Rotation angles used to define the RESU gates in the three simulated deterministic maps, for the odd layer.The angles are given in radians.

TABLE V .
Estimated values of the dimensionless noise parameters.The error values represent one standard deviation and are determined using the bootstrapping method outlined in Appendix E 1. Entries without an error estimate correspond to "run away" optimization parameters, which consistently saturated the allowed parameter range [10 −5 , 10 −1 ] in the optimization for all instances in the bootstrapping ensemble.

TABLE VI .
Estimated values of the RESET gate parameters.The error values represent one standard deviation and are determined using the bootstrapping method outlined in Appendix E 1. Entries without an error estimate correspond to "run away" optimization parameters, which consistently saturated the allowed upper parameter bound of 0.99 in the optimization for all instances in the bootstrapping ensemble.Note that for the qubit 4, the RESET gate was never applied, and therefore, the relevant parameters were not estimated.

TABLE VII .
Estimated values of the CX gate parameters.The notation CX(i → j) corresponds to the gate with control qubit i and target qubit j.The error values, representing one standard deviation, are determined using the bootstrapping method outlined in Appendix E 1. Error values of precisely 0 for θ3 parameter in the gates CX(0 → 1), CX(1 → 2), CX(2 → 3), CX(3 → 4) are a result of the vanishing gradient, as discussed in Appendix D 2.