Inhomogeneous driving in quantum annealers can result in orders-of-magnitude improvements in performance

Quantum annealers are special-purpose quantum computers that primarily target solving Ising optimization problems. Theoretical work has predicted that the probability of a quantum annealer ending in a ground state can be dramatically improved if the spin driving terms, which play a crucial role in the functioning of a quantum annealer, have different strengths for different spins; that is, they are inhomogeneous. In this paper we describe a time-shift-based protocol for inhomogeneous driving and demonstrate, using an experimental quantum annealer, the performance of our protocol on a range of hard Ising problems that have been well-studied in the literature. Compared to the homogeneous-driving case, we find that we are able to increase the probability of finding a ground state by up to $10^8 \times$ for some Weak-Strong-Cluster problem instances, and by up to $10^3 \times$ for more general spin-glass problem instances. In addition to being of practical interest as a heuristic speedup method, inhomogeneous driving may also serve as a useful tool for investigations into the physics of experimental quantum annealers.


I. INTRODUCTION
Quantum annealing is a form of quantum computation that is primarily targeted at solving Ising combinatorial optimization problems [1][2][3][4]. In recent years, there has been great interest in finding whether or not an experimental quantum annealer (QA) can deliver a speedup over the best classical heuristic optimization methods [5][6][7][8][9][10][11]. Considerable effort has been put into understanding both what classes of problems might be most amenable to speedup on current experimental systems, as well as the design of modifications to current quantum annealing systems and protocols that may result in improved performance [12][13][14][15][16][17]. The prospect of experimental quantum annealers delivering a speedup has resulted in a large volume of work exploring potential applications for future quantum annealers, ranging from particle physics [18], to statistics [19], to bioinformatics [20].
A quantum annealer operates [1] by starting with strong quantum-fluctuation terms, called driving terms, that are slowly brought to zero by the end of the computation. Simultaneously, spin-spin couplings and external-field terms, which encode the problem to be solved, are increased from zero between the beginning and the end of the computation. Ideally the QA will end in a ground state of the encoded optimization problem. In practice, the probability of a QA finding a ground state at the end of any particular annealing run is far less than 100% [5]; probabilities of $10^{-7}$, or even smaller, are routinely observed for many problems on current experimental systems. A longstanding goal of the quantum-annealing community is to discover principles and methods that maximize the probability of finding a ground state. Theoretical work has predicted that dramatic improvements in the success probability can be achieved if the driving terms are applied with different strengths to different spins; that is, they are inhomogeneous [13,17,21]. Recently it has become possible to experimentally test a restricted form of inhomogeneous driving, in which one does not have arbitrary control over the driving terms, but one can delay or advance the driving schedule on a qubit-by-qubit basis [22,23]. The number of possible choices for advancing or delaying the qubits' driving schedules is exponential in the number of qubits, $N$, which provides scope for the investigation of a wide variety of strategies for using so-called anneal offsets (AO) to improve the performance of quantum annealers. In this paper we describe one particular strategy, which is distinct from the strategies reported in Refs. [22,23], and we present results from experiments performed on the D-Wave 2000Q (DW2000Q) QA [24] hosted at NASA Ames in which we show how this AO strategy improves the performance of the annealer across a range of benchmark problems that have previously been studied in the quantum-annealing literature [5,7,9,25,26].
We now introduce more formally the concept of anneal offsets, and the strategy that we investigate in this paper. The canonical form of quantum annealing, which involves homogeneous driving terms, is described by the following time-dependent Hamiltonian for N qubits:

$$H(s) = A(s)\,H_D + B(s)\,H_P, \qquad H_D = -\sum_{i} \sigma_x^{(i)}, \qquad H_P = \sum_{i} h_i\,\sigma_z^{(i)} + \sum_{i<j} J_{ij}\,\sigma_z^{(i)}\sigma_z^{(j)}, \tag{1}$$
where $s \in [0, 1]$ is the normalized time parameter ($s = 0$ is the start of the computation, and $s = 1$ is the end), and superscripts $(i)$ and $(j)$ are qubit indices. The Hamiltonian $H_P$ is the so-called problem Hamiltonian, and its ground states encode the solutions to the classical energy-minimization problem of the Ising model, $E(x_1, \ldots, x_N) = \sum_i h_i x_i + \sum_{i<j} J_{ij} x_i x_j$ with $x_i \in \{+1, -1\}$. The time-dependent terms in Eq. (1) are the schedules: $A(s)$ controls how the driving terms, given in $H_D$, are turned off between time $s = 0$ and $s = 1$, and $B(s)$ controls how the problem-Hamiltonian terms are turned on between time $s = 0$ and time $s = 1$. For this form of quantum annealing, the driving is homogeneous: the driving term for every qubit is turned off using the same schedule, $A(s)$.
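As a concrete illustration of the classical problem that the problem Hamiltonian encodes, the following minimal sketch evaluates an Ising energy and finds its ground states by exhaustive enumeration. It assumes the standard form $E(x) = \sum_i h_i x_i + \sum_{i<j} J_{ij} x_i x_j$ with $x_i \in \{+1, -1\}$; the function names and the three-spin toy instance are hypothetical and are not taken from the experiments.

```python
from itertools import product


def ising_energy(x, h, J):
    """Return E(x) = sum_i h[i]*x[i] + sum_{(i,j)} J[(i,j)]*x[i]*x[j]."""
    field_term = sum(h[i] * x[i] for i in range(len(x)))
    coupling_term = sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
    return field_term + coupling_term


def brute_force_ground_states(h, J):
    """Enumerate all 2^N spin configurations (only feasible for small N)."""
    best_energy, best_states = None, []
    for x in product((-1, 1), repeat=len(h)):
        energy = ising_energy(x, h, J)
        if best_energy is None or energy < best_energy:
            best_energy, best_states = energy, [x]
        elif energy == best_energy:  # exact comparison; fine for integer h, J
            best_states.append(x)
    return best_energy, best_states


# Hypothetical toy instance: a ferromagnetic three-spin chain with no fields.
h_toy = [0, 0, 0]
J_toy = {(0, 1): -1, (1, 2): -1}
toy_energy, toy_states = brute_force_ground_states(h_toy, J_toy)
```

Exhaustive enumeration scales as $2^N$ and serves only to make the definition concrete; it is not a practical solver at the problem sizes considered in this paper.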
The D-Wave 2000Q quantum annealer, when operating with homogeneous driving, has both schedules $A(s)$ and $B(s)$ controlled by a single time-dependent signal $c(s)$. Inhomogeneous driving is made possible by allowing this signal to be qubit-dependent in a specific way [22,23]: for each qubit $i$, the signal defining its annealing schedules can be perturbed by an offset $\delta_i$, which modifies both its driving-term schedule and its problem-Hamiltonian-term schedule. Formally there are now $N$ signals $c_i(s) := c(s) + \delta_i$, and these result in qubit-dependent schedules $A_i(s) := A(s, \delta_i)$ and $B_i(s) := B(s, \delta_i)$. Illustrations showing the relationships between the normalized annealing time $s$, the signals $c_i(s)$, and the schedules $A(s, \delta_i)$ and $B(s, \delta_i)$ are given in Figure S6. We can define a vector specifying the anneal offsets for each qubit, $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_N)$, and the homogeneous-driving quantum annealing Hamiltonian in Eq. (1) is modified to become the inhomogeneous-driving Hamiltonian of Eq. (2), in which the driving terms $\sigma_x^{(i)}$ now have per-qubit schedules $A_i(s)$. Figure 1 shows the annealing schedules for a problem with two clusters of qubits, where in one case homogeneous driving is used (Fig. 1A), and in the other, inhomogeneous driving is used, with the schedules for the qubits in the second cluster delayed (Fig. 1B).
How should the anneal offsets $\boldsymbol{\delta}$ be chosen to increase (ideally maximally) the probability of successfully finding a ground state? In the Supplementary Materials we show that the problem of finding optimal offsets can be viewed as the problem of optimizing a non-linear, high-dimensional functional that itself depends on the solution of the encoded minimization problem. Thus a formal approach seems impractical. Instead we may resort to heuristic approaches. Two prior experimental studies have outlined two different approaches for different classes of problem instances. Andriyash et al. [22] report a strategy for problems that are embedded in a physical QA graph such that each logical variable in the original problem is represented by a chain of physical qubits, which occurs on the D-Wave QA when the problem to be solved is not a subgraph of the QA's Chimera physical hardware graph. Their strategy is to apply to each qubit a delay that increases monotonically with the length of the chain that the qubit forms part of. They tested their strategy on integer-factoring problems embedded in the Chimera graph, and reported improvements of up to $\sim 10^3 \times$. Lanting et al. [23] present an iterative approach to choosing the anneal offsets based on discovering the floppiness of each qubit (which they relate to the classical notion of a floppy spin: a spin that does not change the system energy if it is flipped), and setting the offsets based on floppiness. They tested their strategy on a class of crafted instances of size up to $N = 24$ qubits; larger problem instances were not explored.
One prominent hypothesis for why quantum annealers may fail to find a ground state is the phenomenon of freeze-out, in which certain qubits are thought to become frozen long before the end of the computation [25,27,28]. A hypothesized intuition for how anneal offsets can improve success probabilities is that they delay the reduction of the driving terms for qubits that are prone to early freeze-out. The aforementioned Ref. [22] applied this intuition in the context of problems with qubit chains, where qubits in long chains are claimed to freeze earlier than those in short chains. In this paper we experimentally explore an AO strategy that is based primarily on two hypotheses: early freeze-out of a qubit can be mitigated by delaying annealing schedules for that qubit, and qubits that are more strongly coupled to the rest of the system freeze out earlier than those that are only weakly coupled [29].
We now give a precise definition of the strategy that we experimentally tested. One key component of the strategy is that we quantify how strongly coupled a spin is to the rest of the system via an effective field. Let $j_1, \ldots, j_{N_i}$ denote an enumeration of the spins that are coupled to spin $i$, and let $s_{j_1}, \ldots, s_{j_{N_i}} \in \{+1, -1\}$ be some configuration of these spins. We denote the effective field on spin $i$, as a function of the values of the spins neighboring spin $i$, by $F_i(s_{j_1}, \ldots, s_{j_{N_i}})$, where this is defined as
$$F_i(s_{j_1}, \ldots, s_{j_{N_i}}) := h_i + \sum_{k=1}^{N_i} J_{i j_k}\, s_{j_k}. \tag{3}$$
We will primarily be interested in the absolute value of this quantity, $|F_i(s_{j_1}, \ldots, s_{j_{N_i}})|$, and in particular in its average over all possible neighboring spin values:
$$\langle |F_i| \rangle := \frac{1}{2^{N_i}} \sum_{s_{j_1}, \ldots, s_{j_{N_i}} \in \{+1,-1\}} \left| F_i(s_{j_1}, \ldots, s_{j_{N_i}}) \right|. \tag{4}$$
Next, we normalize these averages to obtain values $r_i$ in the interval $[0, 1]$ (Eq. (5)). A function $\delta_i(r_i)$ then delays qubits based on the magnitude of their average effective fields, where $|\delta|_{\max} > 0$ is the maximum magnitude of offset that is applied on any qubit according to this method, and is a parameter that we are free to choose. One detail we need to consider is that different offset ranges are available for different qubits, and it can be the case that this function assigns an offset to a qubit that the hardware cannot physically realize. If $\delta_i^{\max}$ is the maximum offset value that the hardware allows on qubit $i$, and $\delta_i^{\min}$ is the minimum such offset value, we can ensure that $\delta_i(r_i) \in [\delta_i^{\min}, \delta_i^{\max}]$ by using the prescription in Eq. (6), which is the formal definition of the strategy that we explore in this paper.
We note that the averages in Eq. (4) can be computed without difficulty for any graph that has low maximum degree, such as that of the DW2000Q (which has a maximum degree of 6, so at most $2^6$ spin configurations have to be enumerated per qubit to calculate $\langle |F_i| \rangle$). The computation defined in Eq. (4) is intractable when the underlying QA hardware graph has vertices with high degree. However, in the Supplementary Materials, we present a modified version of the heuristic that is tractable even for fully connected graphs. Since in this paper we only work with the DW2000Q, we can directly perform the computations specified by Eq. (4).
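The effective-field averages of Eq. (4) and the offsets derived from them can be sketched as follows. The enumeration over the $2^{N_i}$ neighbor configurations follows the text. The normalization (division by the largest average), the linear delay map $\delta_i = -|\delta|_{\max}\, r_i$, the sign convention that a negative offset delays a qubit, and the clamping to the hardware range are assumptions made for illustration; they are not necessarily the exact forms of Eqs. (5) and (6).

```python
from itertools import product


def mean_abs_effective_field(i, h, J):
    """Average of |h_i + sum_k J_{i,j_k} s_{j_k}| over all neighbor configurations."""
    # Couplings of every edge that touches spin i.
    neighbor_couplings = [Jab for (a, b), Jab in J.items() if i in (a, b)]
    total = 0.0
    for spins in product((-1, 1), repeat=len(neighbor_couplings)):
        total += abs(h[i] + sum(Jij * s for Jij, s in zip(neighbor_couplings, spins)))
    return total / 2 ** len(neighbor_couplings)


def anneal_offsets(h, J, delta_max, delta_lo, delta_hi):
    """Hypothetical offset rule: stronger average field -> larger delay."""
    f = [mean_abs_effective_field(i, h, J) for i in range(len(h))]
    f_max = max(f)
    # ASSUMPTION: normalize to [0, 1] by dividing by the largest average.
    r = [fi / f_max if f_max > 0 else 0.0 for fi in f]
    # ASSUMPTION: negative offsets delay a qubit; the map is linear in r_i.
    raw = [-delta_max * ri for ri in r]
    # ASSUMPTION: clamp each offset to the per-qubit hardware range.
    return [min(max(d, lo), hi) for d, lo, hi in zip(raw, delta_lo, delta_hi)]


# Hypothetical three-spin chain with unequal couplings and no local fields.
h_chain = [0, 0, 0]
J_chain = {(0, 1): -1, (1, 2): -2}
offsets = anneal_offsets(h_chain, J_chain, 0.05, [-0.04] * 3, [0.04] * 3)
```

For a graph of maximum degree 6 the inner loop visits at most 64 configurations per qubit, which is why the direct enumeration is cheap on the DW2000Q.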
We experimentally tested the strategy in Eq. (6) on four different classes of problems by comparing the performance of the DW2000Q using baseline settings (no anneal offsets) against the performance of the DW2000Q using anneal offsets with various choices of $|\delta|_{\max}$. Success probabilities were estimated by dividing the number of observed successes by the number of annealing runs performed to observe them. From these observed success probabilities, $p$, the corresponding time-to-solution (TTS) with desired probability $p_d$ (which we chose to be 99%) was computed using the formula $\mathrm{TTS} := t_{\mathrm{ann}} \log(1 - p_d)/\log(1 - p)$, where $t_{\mathrm{ann}}$ is the annealing time used in each run. We used the default value $t_{\mathrm{ann}} = 20\ \mu\mathrm{s}$ for all problem classes except for the Alternating-Sectors-Chain problems, for which we used $t_{\mathrm{ann}} = 5\ \mu\mathrm{s}$ to allow for direct comparison with results from a recent baseline experimental study [26]. Furthermore, all problem classes were defined on the Chimera graph (native to the DW2000Q), except for the Alternating Sectors Chain, which is a 1D chain that embeds directly into the native Chimera graph.
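The time-to-solution formula above can be evaluated as in the following sketch. The formula is the one stated in the text; the function names, the handling of the edge cases $p = 0$ and $p = 1$, and the example counts are illustrative assumptions rather than measured data.

```python
import math


def success_probability(successes, runs):
    """Estimate p as the fraction of annealing runs that found a ground state."""
    return successes / runs


def time_to_solution(p, t_ann, p_desired=0.99):
    """TTS = t_ann * log(1 - p_desired) / log(1 - p), in the units of t_ann."""
    if p <= 0.0:
        return math.inf  # no success observed: the estimate is unbounded
    if p >= 1.0:
        return 0.0  # limit of the formula as p -> 1
    # log1p(-p) equals log(1 - p) but stays accurate for very small p.
    return t_ann * math.log1p(-p_desired) / math.log1p(-p)


# Hypothetical counts, chosen only to show how a speed-up ratio is formed.
p_bl = success_probability(12, 1_000_000)
p_ao = success_probability(9_500, 1_000_000)
speedup = time_to_solution(p_bl, 20.0) / time_to_solution(p_ao, 20.0)
```

Because $t_{\mathrm{ann}}$ is the same for both schedules, it cancels in the ratio $\mathrm{TTS}_{\mathrm{BL}}/\mathrm{TTS}_{\mathrm{AO}}$, so the speed-up depends only on the two success probabilities.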

A. Uniform-Range-k-Disorder (URkD) problems
The Uniform-Range-k-Disorder (URkD) class of problems is defined (Fig. 2A) on the Chimera graph as those for which $h_i = 0$ for all $i \in \{1, \ldots, N\}$, and each $J_{ij}$ is chosen at random, with uniform probability, from the $2k$ discrete values in the set $U_k := \{-k, -k+1, \ldots, -1, 1, \ldots, k-1, k\}$. These problems have previously been studied in the context of quantum annealing in Refs. [5,6,30]. The generic nature of this problem class makes it a good candidate for getting a sense of how useful the heuristic strategy in Eq. (6) might be on optimization problems that arise in a variety of application areas. We generated random instances of problems in the URkD class, for various graph sizes $N$ and coupling ranges $k$, and measured the success probability of finding a ground state for each instance, both with baseline DW2000Q operation, and when using anneal offsets as prescribed by the heuristic. Our experimental results, summarized in Fig. 2, show that anneal offsets chosen using the heuristic typically improve performance as measured by several different metrics. Figure 2B shows a scatter plot of the observed success probability when using anneal offsets, $p_{\mathrm{AO}}$, versus the observed success probability with the baseline schedule, $p_{\mathrm{BL}}$, for 200 randomly generated instances of problem size $N = 400$ and maximum magnitude of spin-spin coupling $k = 8$, using a maximum magnitude of applied offset $|\delta|_{\max} = 0.03$. Figure 2C shows the corresponding times-to-solution, calculated directly from the success probabilities in Fig. 2B; $\mathrm{TTS}_{\mathrm{AO}}$ denotes the time-to-solution when using anneal offsets, and $\mathrm{TTS}_{\mathrm{BL}}$ denotes the time-to-solution with the baseline schedule. In Fig. 2B,C, we have colored the instances by the difficulty with which the DW2000Q solves them using baseline settings. In particular, we have divided the instances into four difficulty groups, which we formally defined using ranges of the percentile rank calculated for each instance based on its baseline success probability $p_{\mathrm{BL}}$. The grouping of instances by difficulty occurs in subsequent panels too, and in subsequent figures, and has been performed the same way throughout. We note that the more difficult instances are both more likely to benefit from the use of anneal offsets and more likely to benefit to a larger degree, relative to the easier instances in this problem class. One can see this more clearly in Fig. 2D,E, which show the percentage of instances for which using anneal offsets resulted in improved performance compared to baseline, and the median time-to-solution ratio (i.e., the median of $\mathrm{TTS}_{\mathrm{BL}}/\mathrm{TTS}_{\mathrm{AO}}$), respectively, both as functions of $|\delta|_{\max}$.
Figure 2. (A) … $\in U_k$. The local fields $h_v$ are all set to 0 (i.e., $h_v = 0$ for all $v \in V$). (B) Instance-by-instance comparison of the observed success probability when using anneal offsets, $p_{\mathrm{AO}}$, versus the observed success probability with the baseline schedule, $p_{\mathrm{BL}}$. (C) The corresponding times-to-solution, calculated directly from the success probabilities in (B); $\mathrm{TTS}_{\mathrm{AO}}$ denotes the time-to-solution when using anneal offsets, and $\mathrm{TTS}_{\mathrm{BL}}$ denotes the time-to-solution with the baseline schedule. The instance for which the maximum improvement was observed is emphasized by a grey, dashed circle. Instances in the white zone correspond to those for which the schedule with anneal offsets resulted in better performance, whereas those in the grey zone correspond to those for which the baseline schedule resulted in better performance. The color indicates the relative difficulty of the instance as measured by the performance with the baseline schedule. (D) Percentage of instances for which an improvement in the success probability when using anneal offsets was observed, versus the maximum magnitude of the applied offsets, $|\delta|_{\max}$.
It is natural to ask what value of $|\delta|_{\max}$ results in the "best" performance. An answer to this question is not straightforward, and Fig. 2D,E give some insight into the trade-offs encountered through various choices of $|\delta|_{\max}$. To start, the best choice of $|\delta|_{\max}$ depends on the performance metric used. A further complication is that different instances will be affected differently for the same value of $|\delta|_{\max}$. For this problem class, the primary trade-off to be balanced is that smaller values of $|\delta|_{\max}$ will generally result in increased performance over a larger percentage of the instances, but in smaller median time-to-solution speed-ups for the more difficult instances, relative to larger values of $|\delta|_{\max}$. For a clearer picture of these trade-offs, see the scatter plots in Fig. S1.
Another question that arises naturally is how the performance of the heuristic depends on problem size. To that end, Fig. 2F shows the percentage of instances for which the use of anneal offsets resulted in improved performance for different problem sizes, $N$, with $|\delta|_{\max}$ fixed at a value chosen based on Fig. 2D such that performance averaged over all instances is improved ($|\delta|_{\max} = 0.03$). While the change in this metric as a function of $N$ is different for each difficulty group, the overall impression is that as $N$ increases, the benefit obtained from using anneal offsets over baseline is either constant or increasing, depending on the difficulty group. Furthermore, across all problem sizes there is a tendency for performance to be improved on a larger percentage of the more difficult instances, relative to the easier instances. Fig. 2G shows the median time-to-solution ratio for various problem sizes, with $|\delta|_{\max}$ fixed at a value chosen based on Fig. 2E such that performance averaged over all instances is improved ($|\delta|_{\max} = 0.05$).
While the behavior is again dependent on which difficulty group is being considered, in general there appears to be a tendency for the median time-to-solution ratio to increase slightly with problem size for the hardest 50% of instances, whereas for the easiest 50% it appears to be constant. Figure 2H shows the maximum observed speed-up (i.e., the maximum of TTS BL /TTS AO ) for various problem sizes, for all values of |δ| max tested. In general, there appears to be a tendency for the maximum speed-up to increase with increasing N , up to N = 400, at which point the maximum speed-up plateaus. The largest speed-up observed across all instances was just over 2000× (N = 300, |δ| max = 0.1). We note that higher values of |δ| max tend to be more likely to result in the maximum speed-up, compared to the smaller values of |δ| max .
The results discussed thus far have been for instances with $k = 8$, where $k$ is the maximum magnitude of the spin-spin coupling values. We also performed experiments for a range of different $k$ values, and found the results to be broadly similar (Fig. 2I), suggesting that the use of anneal offsets provides benefit generally for URkD instances, at least when the $U_k$ values are well within the precision limits of the QA ($2 \le k \le 14$ tested here).
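The instance definition given at the start of this subsection can be made concrete with a short sampling sketch. The coupling distribution follows the definition of $U_k$; the function name, the use of a seeded generator, and the single eight-qubit $K_{4,4}$ cell used in place of a full Chimera graph are assumptions for illustration.

```python
import random


def sample_urkd_couplings(edges, k, rng):
    """Draw each J_ij uniformly from U_k = {-k, ..., -1, 1, ..., k}."""
    values = [v for v in range(-k, k + 1) if v != 0]  # 2k nonzero integers
    return {edge: rng.choice(values) for edge in edges}


# ASSUMPTION: one Chimera unit cell, modeled as the bipartite graph K_{4,4}.
cell_edges = [(a, b) for a in range(4) for b in range(4, 8)]
urkd_J = sample_urkd_couplings(cell_edges, k=8, rng=random.Random(0))
urkd_h = [0] * 8  # all local fields are zero in this problem class
```

Passing an explicit `random.Random` instance keeps generated instances reproducible, which matters when the same instance must be run with and without anneal offsets.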

B. 4-Modal-Range-k-Disorder (4MRkD) problems
The 4-Modal-Range-k-Disorder (4MRkD) class of problems is defined (Fig. 3A) on the Chimera graph as those for which $h_i = 0$ for all $i \in \{1, \ldots, N\}$, and $J_{ij}$ is chosen at random, with uniform probability, from the four discrete values in the set $M_k := \{-k, -1, 1, k\}$. As with the URkD class, we generated random instances of problems in the 4MRkD class for various problem sizes $N$ and coupling ranges $k$, and measured the success probability of finding a ground state for each instance, with and without the use of anneal offsets. Our experimental results, summarized in Fig. 3, show that, again, anneal offsets chosen using the heuristic in Eq. (6) typically result in improved performance. Figure 3B shows a scatter plot of the observed success probability when using anneal offsets, $p_{\mathrm{AO}}$, versus the observed success probability with the baseline schedule, $p_{\mathrm{BL}}$, for 200 randomly generated instances of problem size $N = 392$ and maximum magnitude of spin-spin coupling $k = 8$, using a maximum magnitude of applied offset $|\delta|_{\max} = 0.05$. Figure 3C shows the corresponding times-to-solution, calculated directly from the success probabilities in Fig. 3B; $\mathrm{TTS}_{\mathrm{AO}}$ denotes the time-to-solution when using anneal offsets, and $\mathrm{TTS}_{\mathrm{BL}}$ denotes the time-to-solution with the baseline schedule. As with the URkD problem class, the more difficult instances (for the baseline solver, as defined by $p_{\mathrm{BL}}$) are both more likely to benefit from the use of anneal offsets and more likely to benefit to a larger degree (relative to the easier instances in this problem class). One can see this more clearly in Fig. 3D,E, which show the percentage of instances for which using anneal offsets resulted in improved performance compared to baseline and the median time-to-solution ratio (i.e., the median of $\mathrm{TTS}_{\mathrm{BL}}/\mathrm{TTS}_{\mathrm{AO}}$), respectively, both as a function of $|\delta|_{\max}$.
Figure 3. (A) … The fields $h_v$ are all set to 0 (i.e., $h_v = 0$ for all $v \in V$). (B) Instance-by-instance comparison of the observed success probability when using anneal offsets, $p_{\mathrm{AO}}$, versus the observed success probability with the baseline schedule, $p_{\mathrm{BL}}$. (C) The corresponding times-to-solution, calculated directly from the success probabilities in (B); $\mathrm{TTS}_{\mathrm{AO}}$ denotes the time-to-solution when using anneal offsets, and $\mathrm{TTS}_{\mathrm{BL}}$ denotes the time-to-solution with the baseline schedule. The instance for which the maximum improvement was observed is emphasized by a grey, dashed circle. Instances in the white zone correspond to those for which the schedule with anneal offsets resulted in better performance, whereas those in the grey zone correspond to those for which the baseline schedule resulted in better performance. The color indicates the relative difficulty of the instance as measured by the performance with the baseline schedule. (D) Percentage of instances for which an improvement in the success probability when using anneal offsets was observed, versus the maximum magnitude of the applied offsets, $|\delta|_{\max}$.
It is again difficult to pick a single value of $|\delta|_{\max}$ that results in the "best" performance for the 4MRkD problem class, since the best choice of $|\delta|_{\max}$ depends on the performance metric used to measure its optimality, and because different instances will be affected differently for the same value of $|\delta|_{\max}$ (Fig. 3D,E). The primary trade-off to be balanced is again that smaller values of $|\delta|_{\max}$ will generally result in increased performance over a larger percentage of the instances compared to larger values of $|\delta|_{\max}$; however, smaller values of $|\delta|_{\max}$ will generally result in smaller median time-to-solution speed-ups compared to larger values of $|\delta|_{\max}$ (Fig. S2).
We now discuss how the performance of the heuristic depends on problem size. Figure 3F shows the percentage of instances for which the heuristic improved performance for various problem sizes, $N$, with $|\delta|_{\max}$ fixed at a value chosen based on Fig. 3D such that performance averaged over all instances is improved ($|\delta|_{\max} = 0.03$). While the metric as a function of $N$ is different for each difficulty group, the overall impression is that as $N$ increases, the benefit obtained from using anneal offsets over baseline is either constant or increasing. Furthermore, across all problem sizes there is a tendency for performance to be improved on a larger percentage of the more difficult instances, relative to the easier instances. Fig. 3G shows the median time-to-solution ratio for various problem sizes, with $|\delta|_{\max}$ fixed at a value chosen based on Fig. 3E such that performance averaged over all instances is improved ($|\delta|_{\max} = 0.05$). While the behavior is again dependent on the difficulty group, in general there appears to be a tendency for the median time-to-solution ratio to increase slightly with problem size, with the trend slightly more pronounced than for the URkD problem class. Figure 3H shows the maximum observed speed-up (i.e., the maximum of $\mathrm{TTS}_{\mathrm{BL}}/\mathrm{TTS}_{\mathrm{AO}}$) for various problem sizes, for all values of $|\delta|_{\max}$ tested. In general, the maximum speed-up appears to be roughly constant with problem size. The largest speed-up observed across all instances was just over $2000\times$ ($N = 200$, $|\delta|_{\max} = 0.1$). Furthermore, we note that there is a tendency for the higher values of $|\delta|_{\max}$ to be more likely to result in the maximum speed-up, compared to the smaller values of $|\delta|_{\max}$.
The results discussed thus far have concerned the case $k = 8$, where $k$ is the maximum magnitude of the spin-spin coupling values; additionally, we have focused on the case where the edge weights $J_{ij}$ are drawn with uniform probability from the set $M_k$. We now discuss the performance of the heuristic when these features of the problem class are modified. Still drawing the $J_{ij}$ uniformly from $M_k$, the results appear to be generally consistent across a broad range of different $k$ values (Fig. 3I). Figure 3J shows how the median time-to-solution ratio changes when we change the probability distribution with which the $J_{ij}$ are drawn (keeping $k = 8$ fixed). For the easier 90% of instances, there appears to be a tendency for the median time-to-solution ratio to increase slightly with $\Pr(|J_{ij}| = 1)/\Pr(|J_{ij}| = 8)$, whereas for the hardest 10% of instances the ratio is roughly constant.
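The 4MRkD coupling distribution, including the biased variant in which the ratio $\Pr(|J_{ij}| = 1)/\Pr(|J_{ij}| = k)$ is varied, can be sketched as below. The set $M_k$ and the ratio parameter come from the text; the function name, the choice to keep the two signs equally likely in the biased case, and the toy edge list are assumptions for illustration.

```python
import random


def sample_4mrkd_couplings(edges, k, rng, ratio=1.0):
    """Draw J_ij from M_k = {-k, -1, 1, k}.

    ratio is Pr(|J| = 1) / Pr(|J| = k); ratio = 1 gives the uniform case.
    """
    p_one = ratio / (1.0 + ratio)  # probability that the magnitude is 1
    couplings = {}
    for edge in edges:
        magnitude = 1 if rng.random() < p_one else k
        sign = rng.choice((-1, 1))  # ASSUMPTION: signs stay equiprobable
        couplings[edge] = sign * magnitude
    return couplings


# ASSUMPTION: one Chimera unit cell (K_{4,4}) stands in for the full graph.
cell_edges_4m = [(a, b) for a in range(4) for b in range(4, 8)]
uniform_J = sample_4mrkd_couplings(cell_edges_4m, k=8, rng=random.Random(1))
heavy_only_J = sample_4mrkd_couplings(cell_edges_4m, k=8, rng=random.Random(1), ratio=0.0)
```

With `ratio=1.0` each of the four values in $M_k$ has probability 1/4, which recovers the uniform definition of the class.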

C. Alternating-Sectors-Chain (ASC) problems
The Alternating Sectors Chain (ASC) class of problems has been studied in the context of quantum annealing and adiabatic quantum computation in Refs. [26,31]. An Alternating Sectors Chain is a 1D chain of $N$ spins divided into equally sized sectors of length $n$; sectors alternate between having so-called 'heavy' ferromagnetic spin-spin couplings, $W_1$, and 'light' ferromagnetic spin-spin couplings, $W_2$, where $|W_1| > |W_2| > 0$. Formally, if the spins are indexed by $\{1, \ldots, N\}$, then ASC problems are given by $h_i = 0$ for all $i \in \{1, \ldots, N\}$, and, for all $i \in \{1, \ldots, N-1\}$, $J_{i,i+1} = W_1$ if $\lceil i/n \rceil$ is odd, and $J_{i,i+1} = W_2$ otherwise. Furthermore, there is a technical restriction that there be $b + 1$ heavy sectors and $b$ light sectors, which results in a limitation on the possible combinations of $N$ and $n$. Figure 4A shows an example of an ASC instance with $N = 10$, $n = 3$. Because all the couplings are ferromagnetic, the problem is trivial to solve: the two degenerate ground states are the fully aligned states, with all spins pointing either up or down. Nevertheless, it is known that the minimum gap of this problem is exponentially small in the sector size $n$ [31], which implies an exponential computation time in the AQC framework. Recently, however, Ref. [26] showed experimentally that the performance of a quantum annealer solving ASC problems differed substantially from what one would expect based purely on the scaling of the minimum gap.
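The coupling rule above can be written out directly. The sketch below assumes 1-based spin indices and the reading that the sector index of coupling $(i, i+1)$ is $\lceil i/n \rceil$; the function name is hypothetical.

```python
def asc_couplings(N, n, W1, W2):
    """Return {(i, i+1): J} for a chain of N spins with sectors of length n.

    Spins are indexed 1..N. Coupling (i, i+1) is heavy (W1) when its sector
    index ceil(i / n) is odd, and light (W2) otherwise.
    """
    couplings = {}
    for i in range(1, N):
        sector = (i - 1) // n + 1  # integer form of ceil(i / n) for i >= 1
        couplings[(i, i + 1)] = W1 if sector % 2 == 1 else W2
    return couplings


# The example of Fig. 4A: N = 10 spins, sectors of length n = 3.
example_J = asc_couplings(10, 3, W1=-1.0, W2=-0.5)
```

For $N = 10$ and $n = 3$ this produces three heavy couplings, three light couplings, and three heavy couplings, that is, two heavy sectors and one light sector.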
We briefly paraphrase the intuitive argument of Ref. [26] for why quantum annealing fails to efficiently solve this problem. For $N \gg 1$ and $n \gg 1$, any given sector approximates a 1D ferromagnetic chain with all-equal couplings. Such a chain encounters a quantum phase transition separating the ordered phase from the disordered phase when $A(s) = B(s) J_{i,i+1}$ [32]. Therefore, the heavy sectors order independently before the light sectors during the anneal. Since the transverse field generates only local spin flips, quantum annealing is likely to get stuck in a local minimum (with domain walls at the boundaries between heavy and light sectors) unless the annealing time is scaled exponentially with $n$.
The intuitive argument above suggests the following remedy via the use of anneal offsets. Let $s_1$ be defined as the value such that $A(s_1) = B(s_1) W_1$, and let $s_2$ be the value such that $A(s_2) = B(s_2) W_2$. In other words, $s_1$ is the normalized time when the heavy sectors order, and $s_2$ is the normalized time when the light sectors order, under homogeneous driving. Let $\delta_1$ be an offset such that $A(s_2, \delta_1) = B(s_2, \delta_1) W_1$. By applying this offset to the heavy sectors, one can make both the light and heavy sectors order at the same time, $s_2$. If it is the case that the sectors independently ordering at different times makes the problem more difficult to solve via quantum annealing, then we should see an increase in the success probability by applying this offset. It turns out that anneal offsets calculated according to the prescription above are very similar to the anneal offsets that the heuristic in Eq. (6) produces. To further demonstrate the generality of the heuristic in Eq. (6), in this section we present results from applying it to ASC problems, instead of the above problem-specific prescription. Our experimental results, summarized in Fig. 4B to H, show that the application of anneal offsets chosen using the heuristic in Eq. (6) improves performance on ASC instances for nearly all of the problem-class parameter space that was tested. Figure 4B shows the success probability, $p_{\mathrm{success}}$, as a function of $|\delta|_{\max}$, for various problem sizes $N$ ($W_1 = -1.0$, $W_2 = -0.5$, $n = 4$). A performance improvement is observed for a broad range of $|\delta|_{\max}$ values, with a peak at $|\delta|_{\max} \approx 0.56$, and the success probabilities dropping roughly symmetrically for smaller and greater values of $|\delta|_{\max}$. Note that the value of $|\delta|_{\max}$ that maximizes the success probability does not change appreciably with $N$.
This is because the times at which the light and heavy sectors order depend primarily on the parameters $W_1$, $W_2$, and $n$, and approximately the same value of $|\delta|_{\max}$ should synchronize the dynamics of the two kinds of sectors if these three parameters are kept fixed. In Fig. 4C, we can see how the time-to-solution speed-up, $\mathrm{TTS}_{\mathrm{BL}}/\mathrm{TTS}_{\mathrm{AO}}$, scales with problem size both for a fixed value of $|\delta|_{\max}$ chosen based on Fig. 4B such that performance is improved for every $N$ ($|\delta|_{\max} = 0.56$), and when at each $N$ we use the value of $|\delta|_{\max}$ that maximizes the success probability for that $N$. In both cases, the speed-up appears to scale exponentially with $N$. We note that $|\delta|_{\max} = 0.56$ either maximizes or very nearly maximizes the success probability for every $N$ tested. Figure 4D shows $p_{\mathrm{success}}$ as a function of $|\delta|_{\max}$ for various values of the light coupling, $W_2$ ($W_1 = -1.0$, $N = 150$, $n = 4$). A performance improvement is observed for every value of $W_2$ for some $|\delta|_{\max}$. While there is no clear relationship between the optimal choice of $|\delta|_{\max}$ and $W_2$, in Fig. 4E we can see that one can find a fixed value of $|\delta|_{\max}$ such that performance is either maximized or nearly maximized ($|\delta|_{\max} = 0.056$) for all values of $W_2$. Naively, one might expect $p_{\mathrm{BL}}$ to increase monotonically with $W_2/W_1$, for $0 < W_2/W_1 \le 1$. Intuitively, the smaller $W_2/W_1$, the more inhomogeneous the dynamics of the light and heavy sectors, suggesting the problem might be more difficult to solve under homogeneous driving. Indeed, this is what we see in Fig. 4D. Similarly, one might intuitively expect the maximum speed-up obtained with the anneal-offsets heuristic to decrease monotonically with $W_2/W_1$, for $0 < W_2/W_1 \le 1$: the smaller $W_2/W_1$, the more inhomogeneous the dynamics of the light and heavy sectors, and therefore, potentially, the more room there is for the use of anneal offsets to provide an improvement.
While such monotonic behavior is indeed observed for $2/8 \le W_2/W_1 \le 7/8$, there is a single, stark exception to this intuition when $W_2/W_1 = 1/8$. It is unclear what accounts for this exception. Figure 4F shows the success probability as a function of $|\delta|_{\max}$ for various values of the sector size, $n$ ($W_1 = -1.0$, $W_2 = -0.5$, $N = 200$); for clarity of presentation, Fig. 4F shows the data for only 4 of the 18 different sector sizes tested. A performance improvement is seen for some values of $n$, with the degree of improvement being very strongly correlated with the baseline success probability for the instance (Fig. 4G,H). As was previously reported in Ref. [26], instead of $p_{\mathrm{BL}}$ dropping monotonically with $n$, we see in Fig. 4G that $p_{\mathrm{BL}}$ achieves a minimum for some intermediate $n^*$ ($n^* = 5$), and then rises again for sector sizes larger than $n^*$. In Fig. 4H we can see that there is a trend for the more difficult instances to benefit to a larger degree from the use of anneal offsets.
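The synchronization argument given earlier in this subsection can be illustrated numerically with a toy model. Everything about the schedules here is an assumption: the linear forms $A(c) = 1 - c$ and $B(c) = c$, the identification $c(s) = s$, the offset acting as $c_i(s) = s + \delta_i$ clipped to $[0, 1]$, and the use of $|W|$ in the ordering condition are hypothetical stand-ins for the hardware schedules, chosen only so that the calculation is self-contained.

```python
def clip(c):
    """Keep the control signal inside [0, 1]."""
    return min(1.0, max(0.0, c))


def A(c):
    return 1.0 - c  # HYPOTHETICAL driving schedule


def B(c):
    return c  # HYPOTHETICAL problem schedule


def bisect_root(f, lo, hi, iterations=100):
    """Root of f on [lo, hi] by bisection, assuming f(lo) > 0 > f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ordering_time(W, delta=0.0):
    """Normalized time s at which A(s, delta) = B(s, delta) * |W|."""
    def gap(s):
        c = clip(s + delta)
        return A(c) - B(c) * abs(W)
    return bisect_root(gap, 0.0, 1.0)


W1, W2 = -1.0, -0.5
s1 = ordering_time(W1)  # heavy sectors order first under homogeneous driving
s2 = ordering_time(W2)  # light sectors order later
delta1 = s1 - s2        # negative: delays the heavy sectors in this toy model
s1_shifted = ordering_time(W1, delta1)
```

In this toy model the required offset is simply $\delta_1 = s_1 - s_2$, because a constant shift of the signal translates the ordering time by the same amount; with the real, nonlinear hardware schedules the offset would have to be found from the measured $A$ and $B$ curves.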

Weak-Strong-Cluster (WSC) problems
The weak-strong cluster (WSC) class of problems was studied in the context of quantum annealing in Refs. [7,9,25]. This problem class was designed so that multi-qubit tunneling strongly impacts the success probability. The building block of this problem is a pair of strongly connected spins, also referred to as a pair of clusters. One cluster is referred to as a strong cluster, and the other as a weak cluster; each cluster corresponds to a cell in the Chimera graph. Within a weak-strong cluster pair, all the couplings are set ferromagnetically (J ij = −1); all the local fields h i in the strong cluster are set to h strong = 1, and all the local fields in the weak cluster are set to h weak = −λh strong , for some 0 < λ < 1 (depicted graphically in Fig. 5A).
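As a minimal illustration of this construction, the Python sketch below builds one weak-strong cluster pair and finds its lowest-energy configuration by brute force. Several details are assumptions not fixed by the text: the energy convention E(s) = Σ h i s i + Σ J ij s i s j , the K 4,4 layout of each 8-spin Chimera cell, the four couplers joining the two cells, and the helper names wsc_pair, energy, and ground_state. The sign of h strong is left as a parameter.

```python
from itertools import product


def wsc_pair(lam, h_strong=1.0):
    """Fields and couplers for one weak-strong cluster pair (16 spins).

    Spins 0-7 form the strong cell and spins 8-15 the weak cell. The
    K_{4,4} cell layout and the four inter-cell couplers are assumptions
    about the Chimera embedding, not details taken from the text.
    """
    h = [h_strong] * 8 + [-lam * h_strong] * 8
    edges = []
    for offset in (0, 8):
        left = range(offset, offset + 4)
        right = range(offset + 4, offset + 8)
        edges += [(i, j) for i in left for j in right]  # intra-cell K_{4,4}
    edges += [(4 + k, 12 + k) for k in range(4)]  # inter-cell couplers
    return h, edges


def energy(s, h, edges, coupling=-1.0):
    """Ising energy, assuming E = sum_i h_i s_i + sum_<ij> J_ij s_i s_j."""
    field_term = sum(hi * si for hi, si in zip(h, s))
    coupling_term = coupling * sum(s[i] * s[j] for i, j in edges)
    return field_term + coupling_term


def ground_state(h, edges):
    """Exhaustive search over all 2^16 spin configurations."""
    return min(product((-1, 1), repeat=len(h)),
               key=lambda s: energy(s, h, edges))


if __name__ == "__main__":
    h, edges = wsc_pair(lam=0.44)
    best = ground_state(h, edges)
    split = (-1,) * 8 + (1,) * 8  # each cluster follows its own local field
    print("ground state:", best, "energy:", energy(best, h, edges))
    print("field-following state energy:", energy(split, h, edges))
```

Under these assumptions and with λ = 0.44, the search returns the configuration in which all 16 spins share the orientation favored by the strong local field, while the configuration in which each cluster follows its own field lies higher in energy.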
For λ < 0.5, the global minimum of a weak-strong cluster pair corresponds to the configuration in which all spins point in the direction of the strong local field. As explained in Ref. [7], early in the anneal, however, the local-field terms dominate, so each spin orients itself along the direction of its own local field. Later in the anneal, the coupling terms dominate, and the spins in the weak cluster must tunnel through an energy barrier to escape the local minimum into which they are led during this initial phase of the anneal. This problem class is interesting because it was designed to benefit from a computational strength of quantum annealing (multi-spin cotunneling [7,25]), while simultaneously being difficult for classical simulated annealing to solve. This has made it a well-studied problem class for which a quantum speed-up might be obtained, although a speed-up against the best classical methods has not yet been achieved [9]. Our experimental results, summarized in Fig. 5, show that anneal offsets chosen using the heuristic typically improve performance of the DW2000Q on the WSC class as measured by several different metrics. Figure 5B shows a scatter plot of the observed success probability when using anneal offsets, p AO , versus the observed success probability with the baseline schedule, p BL , for 80 randomly generated instances with problem size N = 966 spins and weak local field h weak = −0.44 (i.e., λ = 0.44; this value of λ is chosen because Refs. [7,9,25] focused on instances with this choice of λ), using |δ| max = 0.08. Figure 5C shows the corresponding times-to-solution, calculated directly from the success probabilities in Fig. 5B. Before we continue to discuss the results in more detail, we note that 10/80 instances were not solved with the baseline schedule after $10^7$ runs.
Furthermore, not all of these instances were solved with the anneal offsets heuristic; depending on the value of |δ| max used, different fractions of them were solved. For all values of |δ| max , the instances not solved using the heuristic were a strict subset of the instances not solved with the baseline schedule. Because we were unable to obtain concrete estimates of some of the success probabilities (both p AO and p BL ), we are forced to present a somewhat tailored interpretation of the results for the corresponding instances.
Fig. 5. [...] (C) The corresponding times-to-solution, calculated directly from the success probabilities in (B). The instance for which the maximum improvement was observed is emphasized by a grey, dashed circle. Instances in the white zone correspond to those for which the schedule with anneal offsets resulted in better performance, whereas those in the light grey zone correspond to those for which the baseline schedule resulted in better performance. The color indicates the relative difficulty of the instance as measured by the performance with the baseline schedule. Note that the instances in the darker shade of red were not solved after $10^7$ runs using either anneal offsets or the baseline schedule. Instances in the lighter shade of red were solved using the anneal offsets heuristic, but not with the baseline schedule. (D) Percentage of instances for which an improvement in the success probability when using anneal offsets was observed, versus the maximum magnitude of the applied offsets, |δ| max .
With this in mind, we now continue on to discuss the results. A performance improvement is observed for nearly all instances in Fig. 5 (B and C) (with performance worsening on only 5/80 instances, all of which lie in the easiest 50% of instances). Figure 5D shows the percentage of instances for which using anneal offsets resulted in improved performance compared to baseline, versus |δ| max . We can see that, when |δ| max is chosen appropriately (e.g., |δ| max = 0.08), one can improve performance on a larger percentage of the more difficult instances, relative to the easier instances. Similarly, we can see in Fig. 5E, which shows the median times-to-solution speed-ups (i.e., the median of TTS BL /TTS AO ) versus |δ| max , that one can improve performance on the more difficult instances to a greater degree, relative to the easier instances, with appropriate choice of |δ| max (e.g., |δ| max = 0.08). Note that the values reported for the hardest 10/80 instances represent lower bounds for the median time-to-solution speed-up for the subset of the instances solved with the heuristic; the subset is in general different for different values of |δ| max .
We now discuss how the use of the anneal offsets heuristic affects performance as a function of problem size, N . Figure 5F shows the percentage of instances for which the use of the heuristic improved performance for various problem sizes, with |δ| max fixed at a value chosen based on Fig. 5D such that performance averaged over all instances is improved (|δ| max = 0.08). While the behavior is different for each difficulty group, the overall impression across all instances is that the benefit obtained from using anneal offsets increases up to some intermediate size (N = 507), beyond which the results remain roughly constant (with performance increasing on ≈ 90% of instances). Figure 5G shows the median time-to-solution ratio for various problem sizes, with |δ| max fixed at a value chosen based on Fig. 5E such that performance averaged over all instances is improved (|δ| max = 0.08). While the behavior again depends on the difficulty group being considered, over all instances there is a trend for the benefit obtained from using anneal offsets to increase with problem size. Note that for the largest problem size tested (N = 966), the median speed-up reported for the hardest 12.5% of instances (≈ 160×) is in fact a lower bound for the actual median time-to-solution speed-up for a subset of the instances, namely, the subset of instances solved with the heuristic. Figure 5H shows the maximum observed speed-up (i.e., the maximum of TTS BL /TTS AO ) for various problem sizes. In general, there appears to be a tendency for the maximum observed speed-up to increase with problem size; while it remains nearly constant for intermediate problem sizes (between N = 507 and N = 699), a positive trend resumes when including data for the largest problem size tested (N = 966, for which a ≈ 20000× speed-up is observed).
We also study how the use of anneal offsets affects the QA performance on instances constructed with different choices of the parameter λ ∈ [0, 1], or, equivalently, the value of the weak local fields, h weak . Figure 5I shows the median of the observed success probabilities for 10 randomly generated instances for each of several values of h weak , and Fig. 5J shows the median and maximum time-to-solution speed-ups observed for these instances. There appears to be a range of h weak values for which there is a noticeable spike in the difficulty of the problem instances (approximately 8/15 < |h weak | < 9/15 for N = 72, and 8/15 < |h weak | < 10/15 for N = 507). Outside of this range, the instances in this problem class can be solved with relatively high success probability when using the baseline schedule. Inside this range, instances with a corresponding p BL < $10^{-7}$ were very common. In fact, for N = 507, none of the instances in this regime that we generated were solved after $10^7$ anneal runs. In order to collect data for this parameter regime, we had to run experiments at the significantly reduced problem size of N = 72. Notably, on one of the instances the use of anneal offsets improved the success probability by ≈ $10^7 \times$. Consequently, the time-to-solution was reduced from just over $9 \times 10^8$ µs (≈ 15 minutes) to just under 40 µs. In general, we found that the use of anneal offsets improved the performance on more difficult instances to a larger degree than on easier instances.
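The time-to-solution figures quoted here can be reproduced approximately with the commonly used 99%-confidence definition, TTS = t a ln(1 − 0.99)/ln(1 − p). Both this definition and the annealing time t a = 20 µs per run are assumptions of the sketch below, not values stated in this section; the function name time_to_solution is likewise hypothetical. The inputs are a baseline success probability of about one success in $10^7$ runs and the anneal-offsets success probability of 0.902 reported in the discussion of Fig. S5.

```python
import math


def time_to_solution(p_success, anneal_time_us=20.0, target=0.99):
    """Time (in microseconds) needed to see a ground state at least once
    with probability `target`, assuming independent runs.

    The 99% target and the 20 us anneal time are assumptions of this sketch.
    """
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must lie strictly between 0 and 1")
    # log1p keeps precision when p_success is tiny (e.g. 1e-7).
    repetitions = math.log(1.0 - target) / math.log1p(-p_success)
    return anneal_time_us * repetitions


if __name__ == "__main__":
    tts_bl = time_to_solution(1e-7)   # roughly one success in 10^7 runs
    tts_ao = time_to_solution(0.902)  # success probability with offsets
    print(f"baseline: {tts_bl:.3g} us, offsets: {tts_ao:.3g} us, "
          f"ratio: {tts_bl / tts_ao:.3g}")
```

Under these assumptions the two values come out near $9 \times 10^8$ µs and 40 µs respectively, consistent with the figures quoted above.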
We conclude this section by noting the following. A previous benchmark study of the D-Wave 2X quantum annealer (a 1000-qubit QA) on WSC problem instances reported [9] that the QA was very close to achieving a quantum advantage against a battery of state-of-the-art classical solvers across all tested problem sizes. A re-evaluation using the DW2000Q with anneal offsets is an interesting prospect for future study, especially since the WSC problems benchmarked in Ref. [9] were of the same subclass we show results for in Fig. 5 (B to H) (namely h weak = −0.44; h strong = 1; J ij = −1), and in this work we have found that substantial performance improvements can be achieved on this subclass.

III. DISCUSSION
The heuristic we have tested has a single parameter, |δ| max , which is freely chosen. We found that different instances benefit differently for the same value of |δ| max . In general, it appears that the more difficult instances of a problem class are more likely to benefit across a wider range of |δ| max values, as well as more likely to benefit to a larger degree, relative to the easier instances in that problem class. Of course, in practice, one does not in general know the difficulty of a particular instance a priori. It would be helpful to develop a method capable of predicting the optimal |δ| max value to use for a particular instance based on its h i , J ij values; this is an avenue for future work. For the present, our empirical results provide a guideline for choosing |δ| max : if one has no other information, begin with |δ| max = 0.05. This choice provides a good balance between improving the performance for difficult instances and not excessively decreasing the performance for easy instances; this was true across all four broad problem classes we explored. The scatter plots in Figures S1, S2, and S4 allow one to build intuition for the trade-offs obtained by various choices of |δ| max . Relatedly, in the Supplementary Materials, we present a hybrid strategy that mitigates the risk that a chosen |δ| max results in detrimental performance for a particular instance, and ensures that the strategy's TTS is at worst two times longer than the baseline TTS.
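One simple way a factor-of-two guarantee of this kind can arise is to interleave baseline runs and anneal-offset runs in equal numbers. The sketch below illustrates that idea only; it is an assumption and not necessarily the hybrid strategy of the Supplementary Materials. It also assumes the 99%-confidence time-to-solution definition, equal run times for both schedules, a minimum of one cycle, and hypothetical function names.

```python
import math

LOG_MISS = math.log(1.0 - 0.99)  # log of the allowed failure probability


def log_failure(p):
    """Log-probability that a single run with success probability p fails."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return -math.inf if p == 1.0 else math.log1p(-p)


def tts_from_log_failure(log_fail, cycle_time):
    """Time for a 99% chance of success, counting at least one cycle."""
    if log_fail == 0.0:
        return math.inf  # the cycle never succeeds
    return cycle_time * max(1.0, LOG_MISS / log_fail)


def single_schedule_tts(p, run_time=1.0):
    """TTS when every run uses the same schedule."""
    return tts_from_log_failure(log_failure(p), run_time)


def interleaved_tts(p_bl, p_ao, run_time=1.0):
    """TTS when each baseline run is followed by one anneal-offset run.

    A cycle costs two runs and fails only if both of its runs fail.
    """
    log_fail = log_failure(p_bl) + log_failure(p_ao)
    return tts_from_log_failure(log_fail, 2.0 * run_time)


if __name__ == "__main__":
    for p_bl, p_ao in [(1e-3, 0.5), (0.2, 1e-4)]:
        print(p_bl, p_ao,
              single_schedule_tts(p_bl),
              single_schedule_tts(p_ao),
              interleaved_tts(p_bl, p_ao))
```

Because a cycle fails only when both of its runs fail, the number of cycles needed never exceeds the number of baseline runs needed, so the interleaved time is at most twice the baseline time (and, by the same argument, at most twice the anneal-offset time).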
A quantitative understanding of how anneal offsets applied using the approach in this paper improve performance is currently lacking. The Alternating-Sectors-Chain problem instances are analytically tractable, which has made it possible to explore how anneal offsets change the minimum gap and the number of thermally accessible excited states, even for large problem sizes. In the Supplementary Materials we show that neither change explains the improvement in performance that we experimentally observed for this problem class. A more detailed analysis of the dynamics of the computation process might ultimately be necessary to develop a predictive model for the impact of anneal offsets; such a model could aid efforts to design optimal strategies for applying anneal offsets.
In conclusion, in this paper we demonstrated a heuristic strategy for tuning the anneal offsets on a DW2000Q quantum annealer that results in improved time-to-solution over baseline DW2000Q performance for a broad range of problems. For the most generic problem class we investigated, the Uniform-Range-k-Disorder problems, one can improve the performance for up to 74% of instances overall, and for 85% of the hardest 10% of instances, with speedups of up to $10^3 \times$ observed. For more structured problem classes, like Weak-Strong-Cluster problems, one can improve the performance for up to 94% of instances overall, and 100% of the hardest 10% of instances, with speed-ups of up to $10^7 \times$ observed. Furthermore, for certain parameter regimes of WSC and ASC problems we found that the speed-ups achieved by using anneal offsets increased exponentially as a function of problem size N , suggesting that speed-ups orders of magnitude larger than even $10^7 \times$ may be achievable. We anticipate that the strategy we described, and derivatives thereof, will form a useful part of the toolbox in experimental quantum annealing. The speedups we obtain for broad classes of problems naturally suggest this is a technique of potential practical relevance when attempting to optimize the performance of a quantum annealer. Furthermore, its dramatic impact on the success probabilities of Alternating-Sectors-Chain problem instances, for which we have detailed analytical results, may make this heuristic strategy a useful control knob for studies attempting to elucidate the working mechanisms of experimental quantum annealers.
Note added: A key component of the anneal-offsets heuristic presented in this paper is that a qubit's average effective field is obtained by summing over all possible neighboring spin configurations, each configuration having equal weight. This procedure can naturally be generalized to a weighted average for the effective field. During preparation of this manuscript we became aware of related unpublished work along this line by D-Wave Systems Inc., wherein it is proposed that these weights be chosen according to the frequency with which they appear in the solutions returned by running the problem instance with the baseline schedule [33].
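To make the difference between the equal-weight and weighted averages concrete, the sketch below computes an average effective-field magnitude for one qubit. It assumes that the averaged quantity is |h i + Σ j J ij s j | over the configurations of the neighboring spins (with equal weights, the plain signed average would simply reduce to h i ); that choice, the function name, and the format of the weights are illustrative assumptions rather than the definition used in this paper or in Ref. [33].

```python
from itertools import product


def average_effective_field(h_i, couplings, weights=None):
    """Average of |h_i + sum_j J_ij s_j| over neighbor configurations.

    couplings: list of J_ij values, one per neighbor of qubit i.
    weights:   optional dict mapping a neighbor configuration (tuple of
               +1/-1 values) to a non-negative weight; configurations that
               are missing get weight 0. With weights=None every
               configuration counts equally.
    """
    total, norm = 0.0, 0.0
    for config in product((-1, 1), repeat=len(couplings)):
        w = 1.0 if weights is None else weights.get(config, 0.0)
        field = h_i + sum(J * s for J, s in zip(couplings, config))
        total += w * abs(field)
        norm += w
    if norm == 0.0:
        raise ValueError("weights must not all be zero")
    return total / norm


if __name__ == "__main__":
    # Qubit with no local field and two ferromagnetic neighbors.
    print(average_effective_field(0.0, [-1.0, -1.0]))
    # Same qubit, with weights favoring aligned neighbors (hypothetical).
    print(average_effective_field(0.0, [-1.0, -1.0],
                                  weights={(1, 1): 3.0, (1, -1): 1.0}))
```

In the weighted case, the weights could for instance be the frequencies with which each neighbor configuration appears in samples returned by the baseline schedule, as the note describes.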

D-Wave run protocol
Each call to the DW2000Q is limited to $10^4$ annealing runs. The success probabilities were estimated by executing up to $10^3$ calls, i.e., $10^7$ annealing runs, and stopping the calls after the instance was solved a minimum of n success = 5 times. For some of the more difficult problem instances, $10^7$ runs were not enough to solve the problem 5 times, but in nearly all cases it was enough to solve the problem at least once; the instances that were unsolved after $10^7$ runs are explicitly denoted in our results. A different gauge was used every $10^3$ runs.
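The stopping rule described above can be summarized in a short sketch. Here run_call is a hypothetical stand-in for one call to the annealer that returns the number of runs in which a ground state was found; gauge changes are not modeled, and returning None for an unsolved instance is an assumption about how such cases might be recorded.

```python
import random


def estimate_success_probability(run_call, runs_per_call=10_000,
                                 max_calls=1_000, min_successes=5):
    """Issue calls until `min_successes` ground states have been seen or
    the call budget is exhausted.

    Returns (estimate, total_runs); the estimate is None if the instance
    was never solved within the budget.
    """
    successes = 0
    runs = 0
    for _ in range(max_calls):
        successes += run_call(runs_per_call)
        runs += runs_per_call
        if successes >= min_successes:
            break
    estimate = successes / runs if successes > 0 else None
    return estimate, runs


def simulated_call(p, rng):
    """Hypothetical annealer stub: each run succeeds with probability p."""
    def run_call(num_runs):
        return sum(rng.random() < p for _ in range(num_runs))
    return run_call


if __name__ == "__main__":
    rng = random.Random(0)
    print(estimate_success_probability(simulated_call(1e-3, rng)))
```

Because the loop stops at the end of the call in which the fifth success occurs, the estimate uses all runs of that final call rather than only the runs up to the fifth success.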

V. ACKNOWLEDGEMENTS
We would like to thank M. Amin, E. Hoskinson, T. Lanting, J. Dunn, and A. Mishra for useful discussions. We would also like to thank B. Fuller and J. Brahm for a thorough reading of a draft of this paper. Finally, we gratefully acknowledge USRA and NASA for providing us with D-Wave 2000Q machine time on the system installed at NASA Ames, and in particular Davide Venturelli.
Figure S1 shows an instance-by-instance comparison of the observed success probability when using anneal offsets, p AO , versus the observed success probability with the baseline schedule, p BL , for the UR8D problem class of problem size N = 400, for every value of |δ| max tested. We can see that there is a considerable amount of variance both in the baseline success probabilities and in the success probabilities when using anneal offsets. Furthermore, there is a significant amount of variance in the degree to which the use of anneal offsets either improves or worsens the success probability, with the variance becoming increasingly pronounced with larger |δ| max . This high-variance nature of the data makes it difficult to capture in just a few summary statistics.
Fig. S1. Instance-by-instance comparison of the observed success probability when using anneal offsets, p AO , versus the observed success probability with the baseline schedule, p BL , for the UR8D problem class for every value of |δ| max tested; problem size N = 400. In general, the variance in the ratio p BL /p AO increases with |δ| max .
Figure S2 shows an instance-by-instance comparison of the observed success probability when using anneal offsets, p AO , versus the observed success probability with the baseline schedule, p BL , for the 4MR8D problem class of problem size N = 392, for every value of |δ| max tested. We can see a very similar situation here as with the results in Fig. S1, discussed in the previous paragraph: the high-variance nature of the data makes it difficult to succinctly summarize.
Figure S3 shows additional results for the alternating sectors chain problem class of problem size N ≈ 175. In Fig. S3A we can see that the minimum gap at the critical point when using the anneal offsets heuristic, which we'll denote ∆ * AO , is larger than the baseline minimum gap at the critical point, which we'll denote ∆ * BL , for sector sizes n ≥ 5 (we use h = 1 units throughout). While both ∆ * AO and ∆ * BL decrease monotonically with n, ∆ * AO appears to decrease more slowly than ∆ * BL . One might naively expect from Fig. S3A that the largest speed-up observed when using anneal offsets would occur for n = 20, where ∆ * AO /∆ * BL is maximized, but we can see in Fig. S3B that this is not the case. Instead, the speed-up increases with sector size for 1 ≤ n ≤ 4, and then generally decreases for n > 4 (with very minor exceptions at n = 16, 19). Indeed, there does not appear to exist any clear correlation between the ratio ∆ * AO /∆ * BL and the observed speed-ups. Note that the energy scale set by the operating temperature is much larger than both ∆ * AO and ∆ * BL for every sector size. In Fig. S3C we can see how the minimum gap when using anneal offsets, which we'll denote ∆ AO , differs from the baseline minimum gap, which we'll denote ∆ BL , as a function of normalized annealing time s ∈ [0, 1], when the sector size n = 4, which is the value of n at which the greatest time-to-solution speed-up is observed. Given the substantial difference in the times-to-solution, it is somewhat surprising that the only notable difference between ∆ AO and ∆ BL is the shift in the critical points. A similar shift is observed for all sector sizes (Fig. S3D), independent of the time-to-solution ratio observed for that value of the sector size.

Alternating-Sectors-Chain (ASC) problems
In Ref. [26], it is argued that a key quantity in predicting the success probability of the quantum annealer on this problem class is the number of single-fermion states that lie below the energy scale set by the temperature at the critical point, which we'll denote k * . In general, in Ref. [26] it is shown that the success probability decreases when k * increases. It is therefore interesting to check what effect, if any, anneal offsets have on this quantity; one could conjecture that the speed-ups observed when using anneal offsets are a consequence of anneal offsets decreasing k * . In Fig. S3E we indeed see that k * AO ≤ k * BL for all sector sizes tested; here k * AO is the number of single-fermion states that lie below the energy scale set by the temperature at the critical point using the anneal offsets heuristic, and k * BL is the same quantity but using the baseline schedule. Interestingly, however, there is no clear correlation between the observed speed-up and the ratio k * AO /k * BL . For example, the greatest speed-up is observed for sector size n = 4, but k * AO = k * BL for this sector size. It is interesting to note that the values of k * BL (Fig. S3E) differ considerably from those in [26]. This can be attributed in large part to the fact that the annealing schedules of the quantum annealer used in this study are different from the annealing schedules of the quantum annealer in [26]. In fact, by rescaling the energy (in simulation) of the annealing schedules used in this study to match the energy scales of [26], we get exact agreement (Fig. S3F) for nearly all sector sizes (with only a minimal disagreement for sector sizes n ∈ {2, 3, 4}).
It is still the case at these different energy scales that there is no clear correlation between the time-to-solution ratio and the ratio k * AO /k * BL ; here k * AO is the number of single-fermion states that lie below the energy scale set by the temperature at the critical point using the anneal offsets heuristic on a DW2000Q with the energy rescaled to match the energy of the quantum annealer used in [26], and k * BL denotes the analogous quantity with homogeneous annealing schedules. For example, while the greatest speed-up is observed for sector size n = 4, k * AO > k * BL for this sector size. Similarly, k * AO (n) < k * BL (n) for n > 13, even though the performance of the quantum annealer with the anneal offsets heuristic is in fact worse than with the baseline schedule for those sector sizes.
Fig. S3. [...] [26], and using the annealing schedules of the quantum annealer used in this study with the energy scales rescaled to match the energy scales used by the aforementioned D-Wave 2X.
Figure S4 shows an instance-by-instance comparison of the observed success probability when using anneal offsets, p AO , versus the observed success probability with the baseline schedule, p BL , for the WSC problem class of problem size N = 966 and weak local fields h weak = −0.44, for every value of |δ| max tested. Compared to both the UR8D problem instances and the 4MR8D disorder problem instances of Fig. S1 and Fig. S2 respectively, we can see that the WSC problem instances in Fig. S4 are more robust to anneal offsets: they benefit both to a larger degree and across a wider range of |δ| max values. Note that 10/80 instances were not solved with the baseline schedule after $10^7$ runs. Depending on the value of |δ| max , a different fraction of these instances were solved using anneal offsets, which allows one to derive at least a lower bound on the improvement when using anneal offsets.
For all values of |δ| max , the instances that were not solved with the anneal offsets heuristic were a strict subset of the instances not solved with the baseline schedule (i.e., we did not observe any instance solved with the baseline schedule that was not also solved using the anneal offsets heuristic, independent of |δ| max ). For these particularly difficult instances, it would be interesting in future work to perform more runs to obtain more concrete estimates of the success probabilities. It is unclear how many runs this would require, or if it would be feasible.
Fig. S4. Instance-by-instance comparison of the observed success probability when using anneal offsets, p AO , versus the observed success probability with the baseline schedule, p BL , for the WSC problem class for every value of |δ| max tested; problem size N = 966. In general, the variance in the ratio p BL /p AO increases with |δ| max . Interestingly, in contrast to the UR8D (Fig. S1) and 4MR8D (Fig. S2) problem classes, the increased variance is primarily a consequence of instances benefiting to a larger degree, as opposed to a mix of some instances improving to a larger degree, and others being negatively impacted to a larger degree. Data points with numbers inscribed indicate overlapping data points.
Figure S5 shows an instance-by-instance comparison of the observed success probability when using anneal offsets, p AO , versus the observed success probability with the baseline schedule, p BL , for all 8 instances of the WSC problem class of problem size N = 72, and weak local fields h weak = −8/15. For each instance, p AO indicates the observed success probability using the anneal offsets heuristic with the value of |δ| max which resulted in the highest observed success probability; the optimal value of |δ| max for each instance is shown.
Despite the relatively small problem size, these instances turn out to be surprisingly difficult when using the baseline schedule, with half of the instances (namely, the instances in Fig. S5 A, B, F, and G) unsolved after at least $10^7$ runs with the baseline schedule. In particular, the instance in Fig. S5A was not solved with the baseline schedule after $1.12 \times 10^7$ runs, implying an upper bound on the baseline success probability of p BL < $9 \times 10^{-8}$. In contrast, using the anneal offsets heuristic we observed a success probability of up to p AO = 0.902 (with |δ| max = 0.1), from which we can deduce the lower bound p AO /p BL ≥ $10^7$, which in turn implies a time-to-solution speed-up on the order of $10^7 \times$, with TTS BL /TTS AO ≥ $2 \times 10^7$.
Fig. S5. Optimal |δ| max for each of the WSC problem instances of problem size N = 72 used in this study, and the corresponding success probabilities and TTS speed-ups. Instances for which only an upper bound on p BL is reported are those that were not solved in at least $10^7$ runs with the baseline schedule. For these instances, we can only report a lower bound on the time-to-solution speed-up.