Engineered dissipation to mitigate barren plateaus

Variational quantum algorithms represent a powerful approach for solving optimization problems on noisy quantum computers, with a broad spectrum of potential applications ranging from chemistry to machine learning. However, their performances in practical implementations crucially depend on the effectiveness of quantum circuit training, which can be severely limited by phenomena such as barren plateaus. While, in general, dissipation is detrimental for quantum algorithms, and noise itself can actually induce barren plateaus, here we describe how the inclusion of properly engineered Markovian losses after each unitary quantum circuit layer can restore the trainability of quantum models. We identify the required form of the dissipation processes and establish that their optimization is efficient. We benchmark our proposal in both a synthetic and a practical quantum chemistry example, demonstrating its effectiveness and potential impact across different domains.


I. INTRODUCTION
While dissipation is generally detrimental to quantum technologies and is, in fact, a limiting factor for currently available noisy quantum devices [1], there are important exceptions.On the one hand, going beyond unitary operations is not only needed to account for quantum measurements [2] but also enables several applications ranging from quantum state preparation and control to error correction [3].On the other hand, noisy quantum platforms are well suited for computing [4] in a hybrid quantum-classical setting, for instance using the popular class of variational quantum algorithms (VQAs) [5].In this framework, a parametric quantum circuit is used to prepare complex quantum states and to evaluate, through sampling or with suitable quantum measurements, a cost function that would be otherwise expensive to compute classically.The optimization of the circuit parameters, aimed at minimizing such cost, is instead assigned to a classical optimizer.As an example, variational quantum eigensolvers (VQEs) [6,7] can be used to approximate the ground state of a given Hamiltonian H through a trial quantum circuit, also called an ansatz, by minimizing the cost function given by the expectation value of H.More in general, VQAs can be applied to classical optimization problems [8], linear systems of equations [9][10][11], quantum simulations [12][13][14][15], quantum data compression [16], quantum machine learning [17][18][19], generative models [20][21][22], quantum foundations [23], quantum compiling [24][25][26][27], and quantum error correction [28].In many of these cases, the VQA can also be formulated as a ground-state problem with a problemspecific (rather than a physically motivated) Hamiltonian, the choice of which is generally non-trivial [5].
Despite their potential advantages, VQAs are known to suffer from a number of bottlenecks that still prevent their implementation at scale.One particularly serious drawback, specific to quantum operations, is the phenomenon known as barren plateaus [29].These manifest as a vanishing gradient of the loss function with re-spect to the model parameters and can lead to severe limitations in the training efficiency.Barren plateaus build up as the system size increases, hence directly hindering the scalability of VQAs.Specifically, it was found that when a quantum circuit ansatz reaches the 2-design limit the probability of randomly inducing a non-negligible gradient decreases exponentially with the number of qubits [29].This condition is relatively easy to satisfy [30][31][32] implying a generalized loss of all possible quantum speedups, even if the optimization strategy does not include gradient computation [33].
Recent findings indicate that barren plateaus can originate from several factors, including high circuit expressibility [34,35], concentration of the cost function [36], noise [37], entanglement excess [38], and globality of the cost function [39,40].All these barren-plateau sources can be unified by means of a Lie algebra theory that applies to a general (yet unitary) setting [41,42].Several techniques to mitigate barren plateaus have been reported, such as initialization strategies [43,44], transferability of smooth solutions [45], entanglement limitations [46], correlations and restrictions of circuit parameters [34,47], classical shadows [48], pre-trainings [49,50], and layerwise learning [51,52].These mitigation strategies generally consist in appropriately constraining the unitary ansatz.In this work, we change the focus acting on the problem Hamiltonian without adapting the quantum circuit.Remarkably, the occurrence of barren plateaus is closely related to the unitarity of the ansatz [41,42], while assessing these phenomena beyond a fully unitary framework presents a promising research avenue.Our goal is therefore to demonstrate the potential of nonunitary, open-system dynamics as a powerful strategy to overcome the trainability barrier and to ensure efficient convergence.To achieve it, it is reasonable to expect that a non-trivial engineering of the dissipation processes will be required.Indeed, while it has been already suggested that non-unitary operations can increase the accuracy of ground-state VQE calculations in quantum chemistry [53,54], generic noise is known to actually induce barren plateaus [37,55].In parallel, dissipation implemented simply by discarding qubit registers, as in the class of socalled dissipative quantum neural network models [56][57][58], is also not sufficient to ensure trainability, in general [56].The proof that a suitable set of operations can in fact be constructed constitutes the main technical contribution of our work.
Building on previous results obtained in the field of engineered quantum dissipation [59,60], we consider a Markov dissipation modeled by a Gorini-Kossakowski-Sudarshan-Lindblad (GKLS) Master Equation [61][62][63].Our strategy to design the proper dissipation is motivated by the recent observation that training a local cost function (a cost function made of local observables) is much more efficient than training a global one [64][65][66][67].Rigorous analyses have proven that local cost functions, unlike global ones, are immune to barren plateaus in the case of shallow circuits [39,40].While formulating variational algorithms locally is preferable, mapping a global problem to a local one is generally a non-trivial task.Here, we propose variational algorithms based on nonunitary ansatzes and we show that this represents an effective strategy to tackle barren plateaus (Sec.III), setting the theory framework (Sec.IV) and discussing both a synthetic and a chemical example (Sec.V and Sec.VI).

II. GENERAL FRAMEWORK
In the most general case, VQAs consist of minimizing a cost function whose minimum faithfully corresponds to the solution of a considered problem [5].In the following, we consider the case where the cost function corresponds to the expectation value of an n-qubit Hermitian operator H and, consequently, the solution is its ground-state energy.Given an initial condition ρ in and a quantum circuit ansatz U (θ), where θ is a free parameter vector, the cost function has the form The minimization strategy consists of evaluating C(θ) by a quantum device.The free parameters are optimized by a classical procedure.
A necessary condition for the circuit trainability is that, for a random initialization of θ, the probability of finding a non-negligible value of the cost function derivative with respect to a generic parameter θ k , denoted as ∂ k C, is itself not negligible.An upper bound on this probability is known to be proportional to the variance of the derivative, which we write as Var[∂ k C], setting the limits to training efficiency [29,[34][35][36][37][38][39][40][41][42].By definition, we say that the cost function landscape presents a barren plateau if Var[∂ k C] exponentially decreases with the number of qubits n, i.e., Var[∂ k C] ∈ O(e −pn ) where p is a positive integer.
This phenomenon strongly depends on the locality of H where locality, in this context, refers to the number of qubits on which the Hamiltonian acts non-trivially.We expand the Hamiltonian such that where which means that the probability of finding a nonnegligible gradient does not decrease faster than a polynomial implying the absence of barren plateaus.
It has also been empirically found that when a problem can be encoded with both a global and a local cost function, the training is more effective in the local case and the mitigation techniques work much better [34,[64][65][66].Moreover, it has been also analytically shown that Var[∂ k C], in general, increases with the Hamiltonian locality [41,42].In the following, we will focus on the case where the Hamiltonian of the problem is originally global and an equivalent local one is unknown.We will show how the addition of a proper non-unitary layer in the variational scheme allows the problem to be approximated with a local one where barren plateaus are absent or easier to face.

III. NON-UNITARY ANSATZ
We propose to generalize the unitary framework for VQA to Markovian maps, designing the proper dissipation [59,60] amenable to experimental implementation, also in near-term quantum devices.Previous non-unitary ansatz in VQA's have been considered for in silico implementations (classical post-processing) to boost precision in chemical problems [53,54].The non-unitary ansatz acting on the quantum state is where U (θ) is, as before, a parametric quantum circuit, and E(σ) is a non-unitary superoperator, with their respective tunable parameters θ and σ.Under Markovian assumption, the non-unitary part of the ansatz is defined by a parametric Liouvillian L = L(σ) such that where ∆t is the interaction time with the environment.The Liouvillian L is the superoperator that generates the dynamics of the GKLS Master Equation [61][62][63]: where H is the Hamiltonian responsible of the unitary part of the evolution, {γ i } are the damping rates and the operators {L i }, called jump operators, identify the environment action on the qubits.For our specific proposal, we set the following conditions on L: 1. L can be expressed as the sum of Q superoperators L q (σ q ), each of which acts non-trivially on at most K qubits.This expansion takes the form: 2. All the generators L q commute with each other.
3. All the generators L q have exactly one stationary state, ρ ss,q .
4. The generators L q converge to their respective stationary states at the same rate, which we refer to as the mixing time.
According to the previous definitions and under the assumptions 1. and 2., the cost function reads:

A. Illustrative example
Before delving into the formal characterization, we present a simple example to illustrate the strategy of tackling the barren plateaus with non-unitary VQAs.Following [39], we consider a state preparation task formulated through the optimization of a global Hamiltonian.In particular, we are interested in minimizing H = I − |0⟩ ⟨0| supposing that all the qubits are initialized in state |0⟩.By applying the unitary ansatz U (θ) = n j=1 e −iθj σx/2 , we can easily detect the occurrence of a barren plateau (see Fig. 1 a).In fact, the cost function takes the form In addition, the derivative values are unbiased, i.e. ∂Cu ∂θj = 0, implying an exponential suppression of the probability to sample a non-negligible gradient as a consequence of Chebyshev's inequality.
We now emphasize that, as shown in [37], general noisy non-unitary interactions are not able to mitigate barren plateaus and can actually induce them [37].For example, if we model the noise effect with a depolarizing channel of probability p, the resulting cost function will be This nonunitary model, in addition to clearly having the same trainability problems of the cost function C u , also precludes the possibility of finding the correct ground state energy.
We now introduce an engineered dissipation through a non-unitary layer, ensuring that each L q operator acts dissipatively on a single qubit as per Eq. ( 2).We assume that, in general, the L q structure lacks a Hamiltonian part and instead is determined by a single jump operator L q = |0⟩ q ⟨1| with a corresponding damping rate normalized to 1.This situation is advantageous because the stationary state of the whole Liouvillian is the ground state of the sought-after problem.The cost function in the presence of such an engineered dissipation is now for which the unbiased condition on the gradient still holds.However, in this case, it is possible to efficiently avoid the barren plateau (see broader and still deep minimum in Fig. 1  c).In fact, Var[ ∂C ed ∂θj ] = 1 8 e −∆t (1 + 3 8 e −2∆t − e −∆t ) n−1 and if ∆t ∼ O(log(n)) the desired polynomial scaling is achieved.
We observe that the C ed landscape takes a similar shape as the one in [39], where the unitary ansatz was the same as we have considered here, but the global Hamiltonian was replaced by an equivalent local one.In the following, we will show how this landscape similarity is not a coincidence because the considered non-unitary ansatz allows, in general, localizing a problem that has been originally formulated as a global one.Moreover, we will also prove that the logarithmic ∆t scaling is a general feature of the procedure.

IV. THEORETICAL RESULTS
Now we will show how a non-unitary layer that respects the constraints introduced in Sec.III is able to make the Hamiltonian of the cost function local.First, we note that the diagonalizability of the L q superoperators is a mild condition that can be assumed to hold.In fact, if L q has a holomorphic dependence on the parameters σ q and if it is diagonalizable for a subspace of them, then it will be diagonalizable in general, apart from the possible appearance of countable exceptional points, which can be ignored in the discussion [68].Anyway, in all the cases discussed in the following, diagonalizability will always be ensured.Consequently, it is useful to expand the L q superoperators through their dual basis.To keep the notation simple, we will assume that each L q acts non-trivially on exactly K qubits [69].Introducing the Liouville notation for a generic matrix, ρ → |ρ⟩⟩, and considering the dot product ⟨⟨τ |ρ⟩⟩ = Tr τ † ρ , then e Lq(σq)∆t = 4 K −1 i=0 e λq,i∆t |r q,i ⟩⟩⟨⟨l q,i | ⟨⟨l q,i |r q,i ⟩⟩ , where {λ q,i } i is the set of the eigenvalues of L q and {|r q,i ⟩⟩} i and {|l q,i ⟩⟩} i are the corresponding set of orthogonal and normalized right and left eigenvectors [70].
Ordering the indexes as a function of the real parts of the eigenvalues such that Re{λ q,0 } > Re{λ q,1 } ≥ • • • ≥ Re λ q,4 K −1 , and assuming the uniqueness of the steady state |ρ ss,q ⟩⟩ (this is also a mild condition to satisfy), we identify λ 0 = 0, |r q,0 ⟩⟩ = |ρ ss,q ⟩⟩ and |l q,0 ⟩⟩ = |I⟩⟩.Equation 5 can be used to calculate the order of magnitude of the mixing times ∆t mix,q .Indeed, excluding the occurrence of singular phenomena such as the skin effect [71], we have ∆t mix,q ∼ O(1/| Re{λ q,1 }|), where the quantity | Re{λ q,1 }| represents the spectral gap.Our constraint of a common mixing time for all L q is, then, satisfied if the spectral gaps are equal and do not vary with σ q .Then, we deal with a unique mixing time, which we will refer to as ∆t mix .
For a value of the time such that ∆t log(Q) • ∆t mix an approximation of the non-unitary superoperator, in which only the slowest decay terms are taken, is justified (Theorem 6 in [72]): |ρ ss,j ⟩⟩⟨⟨I| ⟨⟨I|ρ ss,j ⟩⟩ Moving to the Heisenberg picture, the variational problem, defined by the cost function of Eq. ( 4), can be formulated as a unitary optimization on an effective Hamiltonian H ′ such that: Writing H as a linear combination of the L left eigenvectors (which form a complete set) |l q,iq ⟩⟩ and assuming that higher order terms in Eq. ( 6) can be disregarded, we arrive at the following approximate expression for H ′ : where I q refers to the identity operator in all qubit Hilbert spaces except the q-th subspace.At this point, we observe that H ′ takes the local form of Eq. ( 1) and is Hermitian, because the time evolution of H, according to (7), can always be written as a sum of time-decaying Hermitian matrices [73].Therefore, neglecting terms in the expansion does not preclude the hermiticity of the operator.Consequently, all the results concerning the training of the hyperparameter vector θ for local Hamiltonians can be applied.We emphasize that the condition imposed on the mixing times guarantees that each L q operator contributes equally to the expansion of the slowest terms of Eq. 6, resulting in the definition of H ′ .It is also easy to verify that the emergence of an effective Hamiltonian with local character holds even when faster modes are included.
Our strategy is effective only if the c ′ coefficients are not negligible.This property is achieved when the slowest decay right eigenvectors of L have a significant overlap with H and, consequently, the choice of the Liouvillan has to be designed according to the known Hamiltonian properties.
In addition to avoiding the problem of the barren plateau, we have to make sure that the minimum energy of H ′ and the minimum energy of H are close enough.Such a closeness will depend, in general, on the value of the parameters σ, and, consequently, the trainability of the non-unitary layer is a property to take into account.
To this end, we study how the derivative of the cost function varies as a function of a generic component of σ, which we call σ k .As in the previous case of the unitary parameters, a key quantity to calculate is the variance of σ k , which is related to the probability of sampling a non-negligible value of such parameter.If we call dµ(θ) and dµ(σ) the distribution volume elements of the free parameters, we obtain where H is a local Hamiltonian that can be written in the form of Eq. ( 8) and C is its relative cost function computed with respect to the unitary ansatz considered.
From Eq. ( 9) we learn that an exponential suppression of Var[∂C/∂σ k ]), according to [36], can occur if and only if the unitary ansatz presents a barren plateau in the optimization of H. Since H and H ′ have the same local character, considering a quantum circuit that is not affected by barren plateaus in the case of local Hamiltonians (see Sec. II) directly excludes the presence of barren plateaus in the non-unitary layer.
Moreover, we want to remark that the value of ∆t that ensures the validity of the approximation in Eq. ( 6) scales efficiently with the number of generators that compose the model, that is, O(log(Q)), independent on the specific Hamiltonian H of the problem.Furthermore, the condition Q = O(poly(n)) ensures an efficient logarithmic scaling even in the number of qubits, as required.
Let us also emphasize that the condition of Eq. ( 6) can be alleviated by including faster decay terms while still preserving a local structure for H ′ .This means that for practical purposes we can choose a value of ∆t significantly smaller than the theoretical threshold discussed above.
Finally, our analysis extends seamlessly to considering a convex combination of non-unitary maps generated by a Liouvillian that satisfies the constraints introduced in Sec.III.Indeed, a non-unitary layer given by the convex combination E = j β j e Lj ∆t , in addition to being a suitable quantum channel, will still transform a global Hamiltonian to a local one.Importantly, if we compute the variance of the derivative of the cost function with respect to either the β j coefficients or the L j free parameters, we find that the result reduces to the form shown in Eq. 9.This observation generalizes the conclusions drawn from our previous analysis.

V. RANDOM HAMILTONIAN EXAMPLE
We will now provide a numerical demonstration of the scaling calculated analytically in Sec.IV in a synthetic example.We consider a random Hamiltonian whose ground state in each realization can be either in the neighborhood of the state |0⟩ or of |1⟩.Assuming lack of knowledge of the ground state, the initial guess for the ansatz is random.For the sake of definiteness, we have fixed the minimum energy to the value of -1.1 (see Appendix A for more details).Moreover, as in Refs.[29,34], we will employ a hardware-efficient, layered quantum circuit as the unitary component of our ansatz (its form is shown in Appendix B).Considering the known properties of the Hamiltonian, we define the non-unitary layer as follows: where σ is a real free parameter, s(σ) is the sigmoid function that ensures the sum is a convex combination, and L (1) and L (2) are Liouvillians made up of singlequbit dissipators such that their corresponding stationary states are |0⟩ ⟨0| and |1⟩ ⟨1|, respectively.These two Liouvillians can be easily derived considering the superoperators family identified in Appendix C in such a way that they will respect the needed constraints presented in Sec.III.This particular choice ensures a non-negligible overlap with the ground state of the target Hamiltonian.Moreover, by training the parameter σ, we can transform the outputs of the quantum circuit into states close to the desired ground state.This convex combination can be implemented by considering a stochastic Liouvillian, sampling L (1) with probability s(σ) and L (2) with complementary probability.
In our numerical experiment, we first consider computing the derivative variance scaling with respect to the unitary ansatz parameters, with and without the addition of the non-unitary layer.In the case of the nonunitary ansatz, we have spanned ∆t from 0.1 to 3 with a resolution of 0.1 to find its optimal values.In Fig. 2(a) we can clearly observe that, for a fixed number of layers of the quantum circuit, dissipation enables the prevention of an exponential scaling of the gradient variance, which indicates the absence of the barren plateau, as we predicted.We also remark that the optimal values of ∆t found are always below the characteristic time threshold indicated in Sec.IV.To give an idea of the role of the dissipation strength, we plot the trend of the variance as a function of ∆t in Fig. 2(b).We find a non-monotonic trend that is the result of the competition between the constructive role of dissipation, leading to an increase of locality, but also the drawbacks of a contractive dynamics.The latter at long time would lead to a full Hamiltonian erasure (large ∆t in Fig. 2 2 (b)), while in the initial transient can rescale the Hamiltonian with a consequent reduction of the variance (more visible for 5-qubits).For the non-unitary layer to provide an advantage, the number of qubits has to be large enough to obtain a maximum that overcomes the initial value, which corresponds to the fully unitary case.In Fig. 2(a), we also show the behavior of the derivative of the variance with respect to the free parameter of the unitary layer as the number of qubits is changed.We can clearly exclude the presence of an exponential scaling, in agreement with the theoretical results of Sec.IV.
Finally, it is also important to test the accuracy of the ground-state energy estimation for both the unitary and the non-unitary ansatz.To this end, we applied 1000 iterations of the gradient descent algorithm, with a learning rate of 0.1, to 10 random initial conditions in the ideal case where the gradient can be exactly estimated.The results, presented in Fig. 2(c), show that the non-unitary circuit allows a faster convergence, with a relative average error of 2.2%, while the fully unitary circuit has a slower convergence but a higher accuracy of 1.5%.This is expected since the fully unitary circuit preserves the original Hamiltonian and hence its ground state.
To take advantage of both properties, we propose a hy-brid approach where the non-unitary ansatz is applied to the initial state for the first 500 iterations, and then only the fully unitary circuit is used for the last 500 iterations by setting ∆t to zero.This method serves as a novel initialization strategy, and from Fig. 2 (d), we observe a significant improvement in convergence time compared to the fully unitary case.Additionally, the average accuracy is the best found with a value of 1.2%.

VI. QUANTUM CHEMISTRY EXAMPLE
Moving to a more practical application, we show the results of the beneficial effects of our non-unitary model in the context of quantum chemistry.In particular, we have studied the problem of finding the electronic ground state of the H 2 molecule.In our numerical experiments, we studied the molecule in its equilibrium configuration where the two hydrogen atoms are separated by 0.74 Å.The qubit Hamiltonian H was obtained from a Jordan-Wigner transformation [74], while the electronic orbitals were generated from the STO-3G basis set [75].Under these conditions, H spans a four-qubit Hilbert space.Following the previous example, the quantum circuit that defines the unitary part of the ansatz is the one presented in Appendix B with a number of layers fixed to 20.As for the non-unitary layer, the nature of the problem suggests choosing a Liouvillian whose corresponding stationary state is the Hartree-Fock state [76].Interestingly, the class of Liouvillians presented in Appendix C can always accomplish this task by simply dissipating along the z direction, because the Hartree-Fock state is generally a computational basis state.As in the previous example of Sec.V, we have applied the gradient descent algorithm to ten random initial conditions both in the case of a fully unitary ansatz, which uniquely consists of a quantum circuit, and in the case of a non-unitary one.We found that a value of ∆t = 0.5 is sufficient to significantly improve the convergence time and the ground state quality as shown in Fig. 3.In particular, we have found that in the non-unitary case, the algorithm is able to converge by fixing a learning rate of 1, while in the fully unitary case, we need to set the learning rate to 0.1 to achieve faster convergence time and a lower value of the final ground state.In all the considered examples, we have fixed the number of iterations to 300.As in the random Hamiltonian example, we observe that the introduction of the non-unitary layer makes it possible to significantly reduce the algorithm time of almost two orders of magnitude, at the cost of convergence to a less accurate ground state.
As done in the previous section, we propose a hybrid approach in which, we first apply the non-unitary ansatz, and after half of the total iteration steps, we remove the dissipation and apply only the unitary ansatz to complete the experiment.As shown in Fig. 3, this strategy is successful both in reducing the convergence time and in improving the quality of the final ground state.We have set an error threshold with respect to the ground state energy calculated using a numerical diagonalization equal to 0.00159 Hartree, in line with the standard threshold for defining chemical accuracy.Notably, only the hybrid approach can estimate the ground state energy within this energy interval.

VII. DISCUSSION
Extensive research in the field of variational quantum algorithms is devoted to exploring strategies and techniques aimed at circumventing or minimizing the occurrence of barren plateaus.This is essential for improving scalability, algorithmic efficiency, and applicability in a wide range of contexts.Here we have proposed and demonstrated the effectiveness of a strategy based on engineered dissipation, that can be understood as a localization of a global cost function argument.This dissipation-based analysis goes beyond the general scaling laws recently presented in Refs.[41,42], which are limited to the use of a unitary ansatz.Indeed, the presented mitigation strategy has a time complexity that scales logarithmically as the number of qubits is increased, which makes it possible to estimate the ground state energy of systems that would be intractable with previous methods.
To assess the effectiveness of our approach, we have first examined the disappearance of the barren plateau resulting from the presence of engineered dissipation in a simple illustrative model that admits an analytical solution.This toy model also serves as a useful point of comparison for the scenario with general noise, which worsens the problem of attenuation of the barren plateau.Interestingly, handling Hamiltonians such as H = I − |ψ⟩ ⟨ψ|, where |ψ⟩ represents a generic pure state, can be challenging with alternative proposals, such as perturbative gadgets [77].While both these approaches map a global Hamiltonian into a local one, our engineered dissipation strategy does not require to increase the Hilbert space dimension and the time complexity is entirely unrelated to the decomposition of the Hamiltonian using Pauli matrices.
The formal theoretical framework of this proposal stands on the analytical proofs in Sec.IV, where we first demonstrate that the presence of a tailored non-unitary layer makes it possible to map the cost function into an effective one that corresponds to a local Hamiltonian.Moreover, we also prove that this type of non-unitary layer can be efficiently trained.This theoretical proposal opens a new avenue for implementations in noisy quantum processors.Actually, the form of engineered dissipation here presented can be efficiently implemented on experimental platforms, for instance through collision models [78], as discussed in Appendix D.
We then tested the use of nonunitary ansatzes in two relevant numerical examples.The first one addressed a random synthetic Hamiltonian, while the second tackled the realistic task of determining the lower energy of a Hydrogen molecule.Both problems display the effectiveness of our method in speeding up the convergence and also the precision reached in the estimation of the ground state energy.Indeed, the non-unitary approach can also be seen as an efficient initialization protocol for a unitary ansatz suggesting a powerful hybrid strategy where a unitary ansatz follows a non-unitary one.
Our results are consistent with other quantum machine learning protocols, where the presence of losses can enhance performance.In a recent work, a dissipative optimization algorithm, able to efficiently find Hamiltonian local minima, has been reported [79].Another relevant example can be found in the realm of quantum reservoir computing [80], where dissipation can be transformed into a constructive resource [81][82][83].To stay within the context of VQAs, it was demonstrated that incorporating stochastic noise can prevent the occurrence of saddle points, which are detrimental to efficient optimization [84].Finally, we proposed a simple dissipation model that proved very effective in mitigating barren plateaus.This indicates the potential for devising more comprehensive strategies that fully explore the potential of non-unitary architectures in variational quantum circuits but also in the broader arena of quantum neural networks.value which is fixed at 1.1, and the minimum eigenvalue which is set at -1.1.Additionally, we have constrained the eigenvectors associated with these last two values to be one of two states (up to a normalization factor): |ψ 1 ⟩ = |0⟩ + 0.1 |ϕ 1 ⟩ and |ψ 2 ⟩ = |1⟩ + 0.1 |ϕ 2 ⟩, selected randomly and exclusively, where |ϕ 1 ⟩ and |ϕ 2 ⟩ are Haardistributed states.We stress that for |ψ 1 ⟩ and |ψ 2 ⟩ to be, in general, viable eigenvectors, they must undergo a Gram-Schmidt orthonormalization.
Finally, we underline that the algorithm of Sec.V is built from two possible ground-state neighborhood guessing which are referred to two opposite cases: the maximum and the minimum eigenvalue.Consequently, the numerical results obtained indicate that the algorithm was able to successfully discriminate them.

Appendix B: Quantum circuit
We now show in detail the form of the quantum circuit considered in Sections V and VI.Following the refs.[29,34], it has the structure of a hardware-efficient, layered quantum circuit structured as follows: This comprises D layers of rotations and entangling gates, with the entangler W consisting of a series of controlled-Z gates that correlate adjacent qubits: A U l gate, on the other hand, is formed by single-qubit rotations in random directions: where R d i l (θ i l ) is a rotation gate that applies an angle of θ i l around the direction d i l to the i-th qubit, and d i l is randomly chosen from the x, y, or z axis, while θ i l takes values in the set [0, 2π].To avoid any preferential direction, the qubits are initially prepared in the state The all circuit structure is summarized in fig. 4.

Appendix C: Liouvillians based on single-qubit dissipators
We now present a simple class of Liouvillians that satisfy the constraints required for our purposes (see Sec. III) from which we have built the non-unitary layer considered in sections V and VI.We consider a generator L that takes the form of Eq. ( 3), where each operator L q implements single-qubit dissipation along a desired direction.More precisely, their action is described by the following relation: where d q is a jump operator acting onto the q-th qubit Hilbert space: Here the parameters α q and ϕ q identify the direction of the dissipation.This choice satisfies the required in Sec.III because the spectral gap of each L q is identically equal to 1/2.Moreover, each L q has as a unique stationary state |ψ − (α q , ϕ q )⟩ q ⟨ψ − (α q , ϕ q )|, and this information can be used to study the overlap with the target Hamiltonian to ensure the effectiveness of the approach.
Appendix D: Non-unitary layer implementation Nonunitary operations can be effectively realised on digital quantum computing architectures in several ways.This task is in fact closely related to the general problem of implementing quantum simulation algorithms for open systems [85][86][87][88][89].The most direct approaches employ, for instance, dilation theorems allowing one to recast dissipative dynamics into a unitary evolution on an extended system [90][91][92].Alternatively, or in combination with the latter, one could in principle make use of engineered dissipations leveraging intrinsic qubit noise [59,[93][94][95], randomized schemes [96,97] or controlled classical environments [98].
To give a concrete example, we now present an efficient implementation of the nonunitary layer by means of the so-called collision models (CMs) [99].The general idea of CMs is to approximate the Markovian dynamics given by Eq. ( 2) by letting the system qubits sequentially interact in a unitary way with a set of ancillary qubits.In particular, the following approximation is considered:  where Tr a indicates the partial trace over the ancillary qubits space, ρ a is their state which has to be properly prepared at each step and V (∆t/M ) is a unitary operator which, depending on the particular dynamics of interest, non-trivially acts on the system and ancillary qubits space.
Importantly, for local Liouvillians as required by the criteria of Sec.III, the number of resources needed by the collision-model strategy always scales efficiently [92].More quantitatively, let us call ϵ the difference between the exact Liouvillan dynamics and the M -step collision model: where ∥•∥ 1→1 indicates the trace norm.Then, it can be shown that the number of gates necessary to have an error ϵ is always a polynomial function of the number of qubits n.
According to Ref. [78], the class of models defined in Appendix Sec.C can be efficiently implemented by coupling each qubit with an ancillary one initialized in the state |0⟩.Going more into details, in the definition of ϕ ∆t/M , according to Eq. D1, we will have ρ a = |0⟩ ⟨0| ⊗n and V (∆t/M ) = n q=1 V q (∆t/M ) with V q (∆t/M ) = exp −i ∆t/M (d q σ + q (a) + h.c.) where the index q (a) refers to the q-th ancillary qubit.In view of a possible experimental implementation, we emphasize as a positive note that each unitary V q (∆t/M ) can be easily realized through a single-qubit gate and a CNOT gate and that the qubit reset operation, currently available on IBM Quantum devices [100], allows the number of ancillary qubits to be fixed to n for the entire execution of the collision-model algorithm.

FIG. 1 .
FIG. 1. Cross sections of the warm-up example cost function landscapes of Sec.III A for a 20-qubit system.(a) Fully unitary ansatz Cu, (b) noisy landscape Cn with depolarizing probability p = 0.5, and (c) engineered-dissipation ansatz C ed in the case of ∆t ≃ 2, 33, which corresponds to the maximum of Var[∂C ed /∂θj] (see main text).

FIG. 2 .
FIG. 2. Results of the random Hamiltonian example.(a) Scaling of the partial derivative variances with respect to θ 1 1 and σ.Each point was determined from a statistics of 1000 different random samples.It includes different rotations directions, Hamiltonian and free parameter realizations.Moreover, the x parameter was selected at random in the range [−5, 5] while all the angles were also selected at random in the compact set [0, 2π].For the non-unitary ansatz, we varied ∆t from 0.1 to 3 in steps of 0.1, selecting the biggest variance for each value of n.The variances with respect to θ 1 1 are depicted by dashed lines for the fully unitary ansatz whereas, for the non-unitary ansatz case, they are represented by continuous lines.The partial derivative variance with respect to x is depicted using the black dotted lines in the upper part of the plot.We only plot the cases where the non-unitary ansatz is beneficial for training (for very short chains no improvement is observed compared to the unitary ansatz ).(b) Variance averages with respect to θ 1 1 as a function of ∆t in the case of a 5-layer quantum circuit, for 5 and 8 qubits.(c)-(d) Evolution of the cost function evolution under the gradient descent algorithm with a learning rate of 0.1 and for 10 random initial conditions.The light lines represent single realizations, while the relative averages are reported in a darker color.(c) The performances of the two different cases (unitary vs non-unitary ansatz) are compared.(d) Cost function calculated through a hybrid approach: the first 500 iterations involve non-unitary ansatzes, while in the remaining 500 we impose the condition ∆t = 0.

FIG. 3 .
FIG. 3. Results of the gradient descent algorithm for the example of H2.(a) First 150 iterations of the algorithm.The blue line refers to the non-unitary ansatz optimization with ∆t = 0.5 and a learning rate equal to 1, the red dotted one to the unitary-ansatz optimized with a learning rate equal to 1, while, for the red continuous one, the unitary ansatz takes a learning rate equal to 0.1.The vertical black line indicates the ground-state energy computed by a numerical diagonalization of the Hamiltonian.(b) Last 150 iterations.The green line shows the hybrid ansatz optimization and the grey region refers to energy values falling within the considered error threshold.The hybrid ansatz learning rate is set to 0.1.

e
L∆t ≈ (ϕ ∆t/M ) M which consists of the alternating M steps where a quantum channel ϕ ∆t/M approximates the evolution for a finite time equal to ∆t/M .By definition,ϕ ∆t/M [ρ] = Tr a {V (∆t/M )ρ ⊗ ρ a V † (∆t/M )} (D1) is the n-qubit identity operator, H i is a generic Hermitian operator and c i is a real coefficient.
44.Quantum circuit employed in the numerical simulations of Sections V and VI. FIG