Distributionally Robust Variational Quantum Algorithms with Shifted Noise

Given their potential to demonstrate near-term quantum advantage, variational quantum algorithms (VQAs) have been extensively studied. Although numerous techniques have been developed for VQA parameter optimization, it remains a significant challenge. A practical issue is that quantum noise is highly unstable and thus likely to shift in real time. This presents a critical problem, as an optimized VQA ansatz may not perform effectively under a different noise environment. For the first time, we explore how to optimize VQA parameters to be robust against unknown shifted noise. We model the noise level as a random variable with an unknown probability density function (PDF), and we assume that the PDF may shift within an uncertainty set. This assumption guides us to formulate a distributionally robust optimization problem, with the goal of finding parameters that maintain effectiveness under shifted noise. We utilize a distributionally robust Bayesian optimization solver for our proposed formulation. We provide numerical evidence for both the Quantum Approximate Optimization Algorithm (QAOA) and the Variational Quantum Eigensolver (VQE) with hardware-efficient ansatz, indicating that we can identify parameters that perform more robustly under shifted noise. We regard this work as a first step towards improving the reliability of VQAs influenced by shifted noise from the parameter optimization perspective.

Numerous efforts have been made to optimize VQA parameters [17][18][19][20]. One critical challenge for VQA parameter optimization is quantum noise [21][22][23], which limits their capabilities and introduces additional complexities to parameter optimization. Modeling and mitigating hardware noise is a core part of Noisy Intermediate-Scale Quantum (NISQ) algorithms [24][25][26]. Quantifying and improving the reliability and robustness of a VQA has been an important task and has gained increasing attention recently. To name a few examples, machine learning methods have been used to estimate the reliability of a quantum circuit [27]; noise-aware ansatz design methodologies [28] and robust circuit realization from a lower-level abstraction [29,30] have also been investigated.
A more challenging yet practical problem is the sensitivity of quantum noise to the real-time environment. Suppose we have an accurate model of the quantum noise as a reference. However, the quantum noise can change significantly under different environmental conditions in real time, making the reference noise model inaccurate. Some studies [31,32] have considered reproducibility and stability under different noise models. We refer to this phenomenon of noise change as "noise shift".
In this paper, we ask a fundamental question: can we optimize the VQA parameters such that they are robust to potentially shifted (unknown) noise? We assume that we have access to a fixed noise model, but the actual noise level is an unknown random variable with an unknown PDF. A fixed reference PDF of the noise level represents our limited knowledge about the potential noise shift.
To optimize VQA parameters under such unknown noise, for the first time, we propose a new min-max optimization formulation. Such an optimization formulation is called distributionally robust optimization (DRO) in the classical operations research community [33][34][35][36]. DRO is an advanced optimization framework that aims to find solutions resilient against a range of possible probability distributions rather than a single expected distribution. In our context, we aim to optimize parameters based on the worst-case distribution of noise levels. This task, while distinct, complements error mitigation efforts. Rather than attempting to reduce quantum noise, our focus is on optimizing parameters in the presence of potentially shifting noise. Various error mitigation techniques can be seamlessly integrated with our method.
Paper contributions. In this work, we investigate the problem formulation, numerical solver, and validation of variational quantum algorithm training under unknown shifted noise. The overview is illustrated in Fig. 1.

Our specific contributions include:
• To be robust against real-time shifted noise, we formulate the problem of optimizing VQA parameters as a distributionally robust optimization that aims to optimize the targeted performance under the worst-case noise distribution.
• To solve the distributionally robust optimization, we model the unknown PDFs as a distributional uncertainty set defined by Maximum Mean Discrepancy (MMD). We then solve the resulting min-max problem with a distributionally robust Bayesian optimization solver.

FIG. 1. Overview of the distributionally robust variational quantum algorithms. Given an ideal ansatz and noise model, we assume the noise level is a random variable that can change in real time. We have samples of the noise level variable ξ from a reference distribution, ansatz parameters θ, and the corresponding VQA performance f(θ, ξ). With the real-time noise, the VQA landscape and its optimum θ can potentially change. Specifically, the optimal θ under a certain noise level may not perform well under another noise level. Likewise, an optimal θ under a reference noise level PDF may not perform well under another noise level PDF. To address the landscape shift, we reformulate the parameter optimization problem as a min-max formulation to find a robust parameter θ. In other words, we aim to optimize the performance under the worst-case noise level PDF. We use a distributionally robust Bayesian optimization solver to solve the new parameter optimization formulation, which is still handled by classical computers.
• We validate the proposed min-max formulation on two popular VQAs: QAOA for the MaxCut problem and VQE with hardware-efficient ansatz for the one-dimensional Heisenberg model. The numerical results show that the proposed parameter optimization algorithm performs better than conventional ones under shifted noise.

II. METHOD
Variational Quantum Algorithms (VQAs) are a class of algorithms in quantum computing that utilize a hybrid approach, combining classical and quantum computing resources to solve computational problems. They are especially pertinent for use with Noisy Intermediate-Scale Quantum (NISQ) devices, which are the currently available quantum hardware.
The core idea of VQAs is to define a parameterized quantum circuit (ansatz) that manipulates the state of a quantum system in a way that depends on a set of parameters θ. Given an observable of interest O, these parameters are then optimized classically to minimize an objective function ⟨ψ(θ)|O|ψ(θ)⟩, which is evaluated on the quantum system. However, due to hardware noise, the actual ansatz and the resulting quantum state |ψ(θ, ξ)⟩ differ from the ideal ones.

A. Distributionally robust optimization formulation of VQAs
We assume we have access to a fixed noise model and have estimations of its noise level, which follows a certain PDF ξ ∼ ρ(ξ). Let f(θ, ξ) = ⟨ψ(θ, ξ)|O|ψ(θ, ξ)⟩ be the quantity of interest evaluated from an ansatz parametrized by θ under a noise level ξ; then a standard variational quantum optimization algorithm becomes stochastic programming:

min_θ E_{ξ∼ρ(ξ)}[f(θ, ξ)].   (1)

For simplicity, we consider the noise level for a single noise model as a scalar, but it can be seamlessly extended to the high-dimensional case. However, due to the real-time noise, the PDF of the noise level ξ can shift and become unknown. As a result, we assume that ρ(ξ) is not exactly known and can be any PDF inside a set P, which makes it impossible to obtain a deterministic value of E_{ρ(ξ)}[f(θ, ξ)]. Instead, we optimize the worst-case value:

min_θ max_{ρ∈P} E_{ρ(ξ)}[f(θ, ξ)].   (2)

When the uncertainty set degenerates to P = {ρ(ξ)}, problem (2) degenerates to the standard stochastic optimization problem in (1). On the other hand, the problem degenerates to a robust optimization under the worst noise level when the PDF of the noise level degenerates to a Dirac function.
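To make the distinction between the two objectives concrete, the following minimal numerical sketch compares the stochastic objective in (1) with the inner maximization of (2) over a small, discrete set of candidate PDFs. The toy cost and the candidate PDFs here are illustrative assumptions, not taken from our experiments:

```python
import numpy as np

# Noise-level grid: 20 bins in [0, 0.08], matching the experimental setup.
xi = np.linspace(0.0, 0.08, 20)
f = 1.0 + 10.0 * xi  # toy cost that rises with the noise level (an assumption)

def discrete_gaussian(mu, s=0.01):
    """Discrete PDF on the grid, proportional to a Gaussian (toy reference)."""
    w = np.exp(-(xi - mu) ** 2 / (2 * s ** 2))
    return w / w.sum()

# A small, discrete stand-in for the uncertainty set P of shifted PDFs.
P = [discrete_gaussian(mu) for mu in (0.02, 0.03, 0.04)]

expected = P[0] @ f            # objective of Eq. (1) under the reference PDF
worst = max(w @ f for w in P)  # inner maximization of Eq. (2)
print(expected, worst)
```

Because the reference PDF belongs to the uncertainty set, the worst-case value is always at least the reference expectation; a parameter chosen by (2) hedges against that gap.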
The distributionally robust circuit optimization (2) may be intractable in practice because: (a) P may contain an infinite number of PDFs describing the noise-level variations; (b) the min-max problem is hard to solve by nature; (c) we do not have an analytical form of f(θ, ξ) in the presence of noise. In this section, we properly define the PDF uncertainty set P and solve problem (2) by leveraging distributionally robust Bayesian optimization (DRBO) [37,39,46], developed recently in the machine learning community.

B. Distribution Uncertainty Set
We model the PDF uncertainty set P as a ball whose center is the nominal distribution ρ0(ξ) and whose radius ε is measured by a distribution divergence D:

P = {ρ(ξ) : D(ρ, ρ0) ≤ ε}.   (3)

There are many options for the divergence D, including Maximum Mean Discrepancy, Wasserstein distance, φ-divergence, etc. [33]. Here, we choose the Maximum Mean Discrepancy (MMD). MMD compares the means of samples drawn from two distributions in a high-dimensional reproducing kernel Hilbert space (RKHS) induced by a positive definite kernel function [47]. For the tractability of the problem, we discretize the noise level into a finite space Ξ with n parts. Then, let H_M be an RKHS with corresponding kernel k_M : Ξ × Ξ → R; we can embed the distribution ρ0 (similarly for ρ) into H_M via the mean embedding:

µ_{ρ0} = E_{ξ∼ρ0}[k_M(ξ, ·)].

Then the MMD between two distributions ρ0 and ρ over Ξ is defined as

MMD(ρ0, ρ) = ∥µ_{ρ0} − µ_ρ∥_{H_M}.   (4)

Let w0, w ∈ R^n be the probability vectors of the two discrete distributions. If we replace the expectation with the empirical expectation, Eq. (4) can be written as

MMD²(ρ0, ρ) = (w0 − w)^T K_M (w0 − w),   (5)

where (K_M)_{ij} = k_M(ξ_i, ξ_j) is the kernel matrix.
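The empirical MMD in Eq. (5) is a small quadratic form and can be sketched in a few lines. The RBF kernel and its bandwidth below are illustrative assumptions; the paper does not specify k_M:

```python
import numpy as np

def mmd(w0, w, xi, sigma=0.02):
    """Empirical MMD between two discrete distributions w0 and w supported
    on the noise-level grid xi, per Eq. (5). The RBF kernel with bandwidth
    sigma is an assumed choice for illustration."""
    w0, w, xi = map(np.asarray, (w0, w, xi))
    # Kernel matrix (K_M)_{ij} = k_M(xi_i, xi_j)
    K = np.exp(-(xi[:, None] - xi[None, :]) ** 2 / (2 * sigma ** 2))
    d = w0 - w
    return np.sqrt(max(d @ K @ d, 0.0))

# 20 noise-level bins in [0, 0.08], as in the experiments
xi = np.linspace(0.0, 0.08, 20)
w0 = np.exp(-(xi - 0.02) ** 2 / (2 * 0.01 ** 2)); w0 /= w0.sum()  # reference
w1 = np.exp(-(xi - 0.04) ** 2 / (2 * 0.01 ** 2)); w1 /= w1.sum()  # shifted
print(mmd(w0, w0, xi), mmd(w0, w1, xi))
```

As expected, the MMD between a distribution and itself is zero, and it grows as the shifted mean moves away from the reference.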

C. DRO Main Workflow
By modeling the distribution uncertainty set via MMD, the DRO problem (2) becomes tractable. The main steps are summarized below.
• Step 2. Given a current θ, solve the inner problem to determine the worst-case PDF of ξ:

max_{w∈W} ⟨w, f_θ⟩,  W = {w : w ≥ 0, Σ_i w_i = 1, (w − w0)^T K_M (w − w0) ≤ ε²},   (6)

where f_θ := f(θ, ·) ∈ R^n is the output with a given parameter θ.
• Step 3. Solve the outer problem to update θ:

min_θ ⟨w⋆, f_θ⟩,   (7)

where w⋆ is the worst-case PDF obtained in Step 2.
Specifically, Step 2 can be solved exactly via convex programming.
Step 3 can be solved via a numerical optimizer. However, one of the challenges in Steps 2 and 3 is that we need to simulate multiple f(θ, ξ), which can be expensive in practice. To address the computational issue, we apply a Bayesian optimization solver to the workflow. The key idea is to sequentially learn a surrogate model of f(θ, ξ) and optimize it by iteratively adding informative samples.
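The inner problem of Step 2, maximizing ⟨w, f_θ⟩ over the MMD ball intersected with the probability simplex, is a convex program. A sketch using a general-purpose solver is shown below; the toy cost vector, kernel bandwidth, and radius are assumptions for illustration, and a dedicated convex solver would be used in practice:

```python
import numpy as np
from scipy.optimize import minimize

def worst_case_pdf(f_theta, w0, K, eps):
    """Worst-case discrete PDF w maximizing <w, f_theta> subject to
    (w - w0)^T K (w - w0) <= eps^2, w >= 0, sum(w) = 1 (Eq. (6))."""
    n = len(w0)
    cons = [
        {"type": "eq",   "fun": lambda w: w.sum() - 1.0},
        {"type": "ineq", "fun": lambda w: eps ** 2 - (w - w0) @ K @ (w - w0)},
    ]
    res = minimize(lambda w: -(f_theta @ w), w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * n, constraints=cons)
    return res.x

xi = np.linspace(0.0, 0.08, 20)
K = np.exp(-(xi[:, None] - xi[None, :]) ** 2 / (2 * 0.02 ** 2))
w0 = np.ones(20) / 20      # uniform reference weights (toy choice)
f_theta = xi               # toy cost: higher noise level, higher cost
w_star = worst_case_pdf(f_theta, w0, K, eps=0.1)
print(w_star @ f_theta, w0 @ f_theta)
```

Since the reference w0 is feasible, the solved worst-case expectation can never fall below the reference expectation; the returned w⋆ shifts probability mass toward high-cost noise levels.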

Algorithm 1 (DRBO, outline): at each iteration t, construct a GP model as the probabilistic surrogate f(θ, ξ) = GP(θ, ξ) based on the sample set S_{t−1}; determine the PDF of the worst-case distribution; update θ_t; sample K noise levels ξ_k ∼ ρ0 from the reference PDF and simulate f(θ_t, ξ_k) for k = 1, 2, ..., K; add the new samples to form S_t. After T iterations, return the optimal θ⋆.

We first construct a probabilistic surrogate model f(θ, ξ), which can estimate both the output and its uncertainty given an input (θ, ξ). Here, we use the Gaussian process regression model GP(θ, ξ) as the surrogate of f(θ, ξ). Then we use its lower confidence bound (LCB) to replace the original objective function f(θ, ξ) in Eqs. (6) and (7):

LCB(θ, ξ) = µ(θ, ξ) − β σ(θ, ξ),   (8)

where µ(·) and σ(·) denote the estimated mean and standard deviation, and β is a parameter to balance model exploitation and exploration.
a. Gaussian process surrogate To build the Gaussian process regression model, we need to predefine a mean function m(·) and a kernel function k_GP(·, ·). Given M sampled inputs x = {x_i}_{i=1}^M and their simulation outputs y = {f(x_i) + ϵ}_{i=1}^M, the GP model assumes that the simulation outputs follow a joint Gaussian distribution [48]:

y ∼ N(µ, K_GP + σ_ϵ² I),

where µ ∈ R^M is the mean vector with µ_i = m(x_i), and (K_GP)_{ij} = k_GP(x_i, x_j). The measurement noise is characterized as white noise ϵ in the simulation output.
Then, the GP model can offer a probabilistic prediction at a new data point x′, GP(x′) ∼ N(µ(x′), σ²(x′)), as follows:

µ(x′) = m(x′) + k′^T (K_GP + σ_ϵ² I)^{−1} (y − µ),
σ²(x′) = k_GP(x′, x′) − k′^T (K_GP + σ_ϵ² I)^{−1} k′,

where k′ ∈ R^M with k′_i = k_GP(x′, x_i). In our case, we choose the prior mean as m(x) = 0. Note that the kernel k_GP in GPR is different from the kernel function k_M embedded in MMD. Beyond the Gaussian process, other surrogate models are also possible.
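A GP surrogate over (θ, ξ) together with the LCB acquisition of Eq. (8) can be sketched with off-the-shelf tools. The toy stand-in cost `f`, the kernel length scales, and β = 2 below are assumptions for illustration, not the paper's settings:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy stand-in for the noisy VQA cost f(theta, xi) (illustration only).
def f(theta, xi):
    return np.cos(theta) + 5.0 * xi

rng = np.random.default_rng(0)
X = rng.uniform([0.0, 0.0], [np.pi, 0.08], size=(30, 2))  # samples of (theta, xi)
y = np.array([f(t, x) for t, x in X]) + 0.01 * rng.standard_normal(30)

# GP surrogate GP(theta, xi); anisotropic RBF since theta and xi have
# very different scales (an assumed kernel choice).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=[1.0, 0.05]),
                              alpha=1e-4, normalize_y=True).fit(X, y)

def lcb(theta, xi, beta=2.0):
    """Lower confidence bound LCB(theta, xi) = mu - beta * sigma, Eq. (8)."""
    mu, sigma = gp.predict(np.array([[theta, xi]]), return_std=True)
    return mu[0] - beta * sigma[0]

print(lcb(1.0, 0.02))
```

Replacing f(θ, ξ) with this LCB in Steps 2 and 3 lets the solver favor regions that are either predicted to be good or still uncertain, which is the exploitation/exploration trade-off controlled by β.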
b. Optimal selection Regarding the selection of an optimal solution, it turns out to be non-trivial. Since we want to estimate an expectation over the noise distribution, it would be too expensive to estimate with real quantum devices. Instead, we choose the solution with the maximal model posterior, i.e., we choose θ = argmax_θ E_{ρ(ξ)}[µ(f(θ, ξ))] = argmax_θ ⟨w, µ(f_θ)⟩, where µ(f_θ) is the mean prediction from the surrogate f with a given parameter θ. This is a common strategy in similar settings [49,50].
In addition, an accurate estimation of f(θ, ·) will benefit our model output. Motivated by this, at the end of each iteration in Alg. 1, we can add a batch of samples of ξ from Ξ in Step 3. This helps build a better probabilistic model and speeds up the convergence of the BO solver.
c. Remarks One possible further improvement is to treat the BO solver as a warm-start procedure. After the BO returns a few high-quality solutions, we can conduct local numerical optimization taking them as initial points. The local search step may introduce additional computational cost and require more evaluations of f(θ, ξ) instead of the surrogate model, but it can lead to a potentially better solution. Hybridizing different solvers is also a common strategy in VQA parameter optimization [43,51].
The proposed distributionally robust optimization can easily degenerate into stochastic optimization or robust optimization. Stochastic optimization, namely Eq. (1), does not consider the real-time change of the noise. Robust optimization degenerates the PDF of the noise level ρ(ξ) to a single scalar; this case can easily lead to over-conservative parameter optimization.
The current distribution uncertainty set modeling via Maximum Mean Discrepancy is effective at capturing the worst-case distribution. However, we need to discretize the noise level PDF in order to estimate the MMD efficiently. We are aware of the f-divergence modeling [39] of the distributional shift, which does not discretize the noise level, but it performs poorly in our experiments due to its poor fit to the shifted noise level distribution.

III. RESULTS
We validate the distributionally robust formulation (2) for optimizing VQA parameters in two widely used VQA applications: one uses QAOA for MaxCut, and the other uses VQE for the one-dimensional Heisenberg model. Here we conduct the numerical experiments on a simulator in order to easily adjust the noise level and validate the method accordingly. To apply the distributional robustness formulation in hardware experiments, the estimation of the noise model and noise level is another challenge, which is out of the scope of this work.
a. Baseline We compare the proposed DRBO solver to a standard Bayesian optimization for solving the stochastic optimization, BO-LCB [52], and a robust Bayesian optimization, BO-Stable [53]. In BO-LCB, we target problem (1) with a fixed reference distribution of the noise level ρ0(ξ) using a Bayesian optimization approach. We use the same GP surrogate model and its LCB as in Eq. (8), but without solving the outer problem (7). Specifically, lines 4 and 5 of Alg. 1 are combined as solving min_θ ⟨w, LCB(θ, ξ)⟩.
In BO-Stable, we target a shift-aware problem but only focus on the worst noise level instead of the worst noise level distribution. We use the same GP surrogate model and its LCB as in Eq. (8). Differing from using DRBO for problem (2), lines 4 and 5 of Alg. 1 are replaced with solving min_θ LCB(θ, ξ⋆), where, given a θ, the worst ξ⋆ is defined as ξ⋆ := argmax_ξ LCB(θ, ξ).
To assess the different VQA parameter optimization methods, we obtain their parameter solutions and evaluate them under different noise level distributions ρ(ξ).

A. Experiments on QAOA
QAOA is a leading variational quantum algorithm for combinatorial optimization problems. It alternately applies two operators, a phase-separation operator and a mixer operator, to drive a quantum system toward the target solution state.
Without hardware noise, a depth-p QAOA solution is denoted as

|ψ(θ)⟩ = ∏_{l=1}^{p} e^{−iβ_l H_M} e^{−iγ_l H_C} |+⟩^{⊗N},

where H_M = Σ_i X_i is the mixer Hamiltonian and θ = (γ, β). We will take the MaxCut problem as a case study of QAOA. Given a graph G = (V, E) with vertices V and edges E, the MaxCut problem aims to find a cut that partitions the graph vertices into two sets such that the number of edges crossing the cut is maximized. Its cost function is written as

C(s) = Σ_{(i,j)∈E} (1 − s_i s_j)/2,

where s_i and s_j are binary variables associated with the vertices in V, which take the value 1 or −1 depending on which of the two partitions defined by the cut they are assigned to. Its cost Hamiltonian is defined as H_C = Σ_{(i,j)∈E} (Z_i Z_j − 1)/2, where Z_i denotes a Pauli-Z operator acting on qubit i.
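The MaxCut cost function C(s) is easy to evaluate classically, which is useful for checking QAOA outputs on small instances. A minimal sketch with a brute-force reference (the 4-cycle example graph is an illustration, not from the paper):

```python
from itertools import product

def cut_value(edges, s):
    """Number of cut edges for the spin assignment s (values +1/-1),
    i.e., C(s) = sum_{(i,j) in E} (1 - s_i * s_j) / 2."""
    return sum((1 - s[i] * s[j]) // 2 for i, j in edges)

# Toy instance: a 4-cycle, whose maximum cut uses all 4 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
best = max(cut_value(edges, s) for s in product([1, -1], repeat=4))
print(best)  # 4
```

The approximate ratio reported in our experiments is the expected cut value of a solver's output divided by this brute-force optimum (brute force is only feasible for small N).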
In applying QAOA to solve the MaxCut problem, given a noise model with noise level ξ, we aim to optimize the QAOA parameters θ = (γ, β) such that the resulting state |ψ(θ, ξ)⟩ has minimal energy f(θ, ξ) = ⟨ψ(θ, ξ)|H_C|ψ(θ, ξ)⟩. Considering the uncertainty and real-time shift of the noise level, we aim to find parameters θ that make the QAOA performance more robust to real-time noise by solving the DRO problem (2).
Here, we discretize the noise level evenly into 20 bins in [0, 0.08]. We assume the reference noise follows a truncated Gaussian distribution, and the real-time shift may move its mean to a larger value. To begin with, for a depth-p QAOA ansatz, we initialize the sampling set S0 = {(θ_i, ξ_i)}_{i=1}^{M} with M = 20p, where θ_i is drawn from the design space based on a Latin hypercube approach [54], and the noise level samples ξ_i are drawn from the reference distribution ρ0(ξ). We set the maximum number of BO iterations to T = 20p.

FIG. 4. The results for solving the ground energy of a 6-spin, J = 1, B = 0.2 one-dimensional Heisenberg model via VQE with a two-layer hardware-efficient ansatz. The x axis denotes the significance of the noise shift, where the noise level PDF is the reference one at x = 0. We first obtained the optimal parameter θ0 in a noiseless simulation, which solves the problem perfectly with f(θ0) = −4.8. Then we report the relative improvement of the energy. Under all the shifted distributions, BO-LCB performs close to θ0. DRBO sacrifices limited performance under mild noise and performs much better than BO-LCB and the noiseless optimum θ0 under significantly shifted noise. While BO-Stable can also find robust parameters under the shifted noise, it does not perform as well as DRBO, especially when the noise shift is mild.
As shown in Fig. 2, we evaluate the different BO-based parameter optimization results on 10 degree-3 graphs of size N = 14. We report the average approximate ratio results under different shifted noise levels. The x-axis denotes the index of the level of noise shift, with a higher index denoting a more significant shift; index 0 denotes the reference noise. Since we solve for the optimal θ under shifted noise, the DRBO-solved QAOA is expected to perform worse than the one solved by a standard BO solver under the reference noise. However, as the noise shift becomes increasingly significant, the DRBO solution begins to show its advantages. Notably, BO-Stable also performs better than BO-LCB under significantly shifted noise. However, it is over-conservative under the reference noise since it only considers a single worst noise level. The results and observations are consistent over different QAOA depths.
We plot the solutions during the BO iterations in Fig. 3. During the iterations, the performance is evaluated at the optimal solution selected from the maximal posterior rather than from the solution of the acquisition function. We show the performance evaluated under both the reference noise and the shifted noise. During the iterations, DRBO consistently converges to a solution preferred under shifted noise, while BO-LCB converges to one preferred under the reference noise. We also show the PDFs of the reference noise, the shifted noise, and the worst case estimated by the DRBO algorithm. We can see that the MMD approach successfully captures the shifted noise in the worst-case distribution, enabling DRBO to explore the parameter space that performs better under shifted noise.

B. Experiments on VQE
The variational quantum eigensolver (VQE) is another popular VQA, specifically designed to simulate quantum systems and find their ground state energy. We use the VQE algorithm with a hardware-efficient ansatz [55] to simulate the ground energy of a one-dimensional Heisenberg model defined as

H = J Σ_{i=1}^{N−1} (X_i X_{i+1} + Y_i Y_{i+1} + Z_i Z_{i+1}) + B Σ_{i=1}^{N} Z_i,

where J is the strength of the spin-spin interaction and B is the magnetic field along the Z direction. Here, we use a hardware-efficient ansatz to implement the VQE algorithm (see more details in the supplementary material). Given an ansatz parameterized by θ under a noise model with noise level ξ, we denote the state as |ψ(θ, ξ)⟩. The VQE cost function is defined as f(θ, ξ) = ⟨ψ(θ, ξ)|H|ψ(θ, ξ)⟩. Here, we aim to find ansatz parameters that lead to robust performance under shifted noise by solving the DRO problem (2).
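For small chains, the Heisenberg Hamiltonian above can be built explicitly and diagonalized exactly, which gives the reference ground energy against which VQE is compared. The sketch below assumes an open-chain convention with Pauli operators as written; boundary conditions and operator normalization vary across the literature, so the paper's reported f(θ0) = −4.8 may correspond to a different convention:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def op(P, i, n):
    """Single-site operator P acting on qubit i of an n-qubit chain."""
    mats = [I2] * n
    mats[i] = P
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def heisenberg(n, J, B):
    """H = J * sum_i (X_i X_{i+1} + Y_i Y_{i+1} + Z_i Z_{i+1}) + B * sum_i Z_i
    (open boundary conditions, an assumed convention)."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(n - 1):
        for P in (X, Y, Z):
            H += J * op(P, i, n) @ op(P, i + 1, n)
    for i in range(n):
        H += B * op(Z, i, n)
    return H

# Exact ground energy for the 6-spin, J = 1, B = 0.2 instance (64 x 64 matrix)
E0 = np.linalg.eigvalsh(heisenberg(6, J=1.0, B=0.2)).min()
print(E0)
```

Exact diagonalization scales as 2^N and is only a validation tool here; the VQE itself estimates ⟨ψ(θ, ξ)|H|ψ(θ, ξ)⟩ from circuit measurements.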
Here, for a two-layer hardware-efficient ansatz, we discretize the noise level evenly into 20 bins in [0, 0.08]. We assume the reference noise follows a truncated Gaussian distribution, and the real-time PDF may shift its mean to a larger value.
In Fig. 4, we show the ground energy solved for a 6-spin system with J = 1, B = 0.2, whose ground state is highly entangled. Aiming to optimize the last-layer parameters, we initialize the sampling set with M = 40 and set the maximum number of BO iterations T = 40 as well. We report the results from a two-layer hardware-efficient ansatz, with a more detailed setup and results in the supplementary material. Similar to the QAOA results, DRBO performs better than BO-LCB under significantly shifted noise. Additionally, the DRBO solution sacrifices almost no performance under the reference noise. Meanwhile, BO-Stable yields an over-conservative result.

A. Landscape shift
Ref. [56] has shown that the optimal variational parameters are unaffected by a broad class of noise models, such as measurement noise, gate noise, and Pauli channel noise. This phenomenon is called optimal parameter resilience. Meanwhile, some noise can shift the location of the minima. A rich body of work has studied how quantum noise can influence the VQA landscape [57][58][59]. We highlight that the shift of the optimal parameters' location motivates our work: given that the optimal parameters change under different (shifted) noise, we aim to find parameters with robust performance under a shifted noise environment.
In our simulations, we use the phase and amplitude damping noise model, which has been shown to change the values of the optimal parameters. Landscapes with a changed or unchanged optimum are illustrated in Fig. 5.

B. Variant problem formulation
a. Radius-varying formulation Beyond optimization under real-time noise, the distributionally robust optimization can also be applied when we do not have a precise enough estimation of the real noise level distribution to use as the reference distribution. As we collect more data on the noise level, we can obtain a more accurate estimation of its PDF. Therefore, as the iterations continue, we can gradually refine the center ρ0(ξ) of the uncertainty ball and reduce its radius ε.
b. Gate error modeling Beyond modeling hardware noise, another possibility is to model the gate error f(θ + ξ) in the parameterized quantum circuit realization, which assumes that the gate parameters are not exactly implemented but suffer from some errors. In such a formulation, the optimal θ takes different values under different error levels ξ. The DRO formulation can optimize the VQA to find parameters that are robust to shifted gate errors. A robustness analysis of such a formulation is discussed in [60].

C. Conclusion and opportunities
Noise has been a major obstacle to the application of near-term quantum computers, especially for variational quantum algorithms (VQAs). Various error mitigation techniques have been intensively studied. However, one of the challenges of noise modeling and error mitigation is that quantum noise usually changes as time evolves. Consequently, a VQA with optimized parameters may perform poorly in another noise environment.
In this paper, we have presented a distributionally robust optimization formulation to optimize VQA parameters so that they are more robust under shifted quantum noise. The proposed formulation is efficiently handled by a distributionally robust Bayesian optimization (DRBO) solver. We validate the proposed method on two popular VQA benchmarks, QAOA for MaxCut and VQE with a hardware-efficient ansatz for the one-dimensional Heisenberg model. The proposed distributionally robust optimization formulation does not mitigate the inherent quantum noise; instead, it fights the noise at the algorithm level. It can potentially be integrated with various error mitigation techniques to further improve VQA robustness.
Our formulation can be more impactful for large-size problems. When scaling to larger problems or a deeper VQA ansatz, even a smaller noise level may influence performance. For example, a larger problem needs a deeper circuit to implement QAOA. In this case, a small shift in the noise level can propagate and accumulate, potentially impacting the VQA performance. Therefore, VQA parameter optimization under a shift becomes even more important.
To integrate the proposed distributionally robust formulation into more practical use cases, better knowledge of the noise models is highly desired, since we currently only model variations of the noise level. To improve the distributionally robust optimization solver, better techniques that do not need to discretize the noise level or that efficiently handle high-dimensional parameter optimization can be developed.
We have also applied a similar distributionally robust optimization formulation to classical circuit optimization [61], where we identified the shifts of process variations. In this paper, we focus on handling the parameter optimization of noisy variational quantum algorithms and highlight the challenge of real-time noise.

QAOA

The PDF of the shifted noise is generated by shifting the mean of the initial truncated Gaussian distribution. One example of the newly sampled θ with p = 1 is plotted in Fig. 6. The DRBO algorithm explores the parameter space toward the optimum under shifted noise, while the other algorithms exploit the space surrounding the optimal parameter under the reference noise. Therefore, DRBO can find a parameter that performs better under shifted noise.
More results on MaxCut with graph sizes N = 8, 10, 12 and QAOA depths p = 1, 2, 3 are shown in Fig. 7. The results are consistent with those in Fig. 2: the DRBO solution performs better than the baselines under significantly shifted noise, which demonstrates that our method can optimize VQA parameters that are more robust to real-time shifting noise.

VQE
The hardware-efficient ansatz is set up as shown in Fig. 8. The number of parameters grows quickly and becomes challenging for a BO solver. For simplicity, we only optimize the last N parameters and fix the others, similar to the idea of layer-wise optimization in Ref. [62]. For demonstration purposes, the fixed parameters are obtained through a multi-start classical optimization routine. We follow the same procedure as in QAOA to set up the noise level distributions of both the reference and the shifted ones.

D. BO solver for DRO problem

Next, we explain how to solve DRO via Bayesian optimization with a few quantum circuit simulations. Bayesian optimization sequentially builds a probabilistic surrogate model of f(θ, ξ) and explores the design space by minimizing an acquisition function. The overall DRBO algorithm is summarized in Algorithm 1.

FIG. 2. The results for solving N = 14 3-regular graph MaxCut problems via QAOA. The x axis denotes the significance of the noise shift, where the noise level PDF is the reference one at x = 0. The y axis is the expectation of the approximate ratio of the QAOA solution evaluated at different noise PDFs. We report the average result over 10 non-isomorphic graphs. As we can see, the standard BO-LCB solution has the best performance under the reference noise. However, under increasingly shifted noise, the DRBO solution begins to outperform BO-LCB. Meanwhile, the BO-Stable solution is over-conservative with respect to the noise: it significantly sacrifices performance under the reference PDF and the slightly shifted PDFs to gain an improvement under significant shifts. These observations are consistent across experiments with different QAOA depths.

FIG. 3. One example of the evolution of the solution in different BO algorithms. The x axis is the iteration of the BO algorithm, and the y axis is the expectation of the cost function evaluated over the noise level at a θ. The evaluated θ at each iteration is obtained by maximizing the model posterior, which is not necessarily the explored θ at that iteration. Under the reference noise PDF, the BO-LCB algorithm converges to a better solution, while DRBO converges to a better solution under the shifted noise. The rightmost panel shows example PDFs of the reference noise level, the shifted noise level, and the estimated worst-case noise level from the DRBO algorithm.

FIG. 5. Examples of energy landscapes under different noise models. The left heatmap is the depth-1 QAOA landscape for the MaxCut problem; the color denotes the solved energy, and the optimal point is highlighted as a triangle. The middle heatmap is the landscape under simple Pauli errors, which have been shown not to change the VQA optimum while uniformly flattening the landscape. The right heatmap is the landscape under phase and amplitude damping noise, where the optimum is shifted and the energy landscape has a different shape. Under noise, both the middle and right landscapes have worse energies than the left noise-free one.

FIG. 6. An example of the explored θ in the p = 1 noisy QAOA cost landscape of an N = 8 MaxCut problem. The optimum θ differs from the noiseless optimum. Compared to BO-LCB and BO-Stable, DRBO explores the parameter space that performs well under the shifted noise.

FIG. 7. More results on the MaxCut experiments with graph sizes N = 8, 10, 12 and QAOA depths p = 1, 2, 3. While potentially sacrificing a little performance under the reference noise, the DRBO solution performs better under significantly shifted noise. Meanwhile, the BO-Stable solutions are over-conservative.

FIG. 8. The setup of the hardware-efficient ansatz.