Zero-temperature quantum annealing bottlenecks in the spin-glass phase

A promising approach to solving hard binary optimization problems is quantum adiabatic annealing in a transverse magnetic field. An instantaneous ground state—initially a symmetric superposition of all possible assignments of N qubits—is closely tracked as it becomes more and more localized near the global minimum of the classical energy. Regions where the energy gap to excited states is small (for instance at the phase transition) are the algorithm's bottlenecks. Here I show how for large problems the complexity becomes dominated by O(log N) bottlenecks inside the spin-glass phase, where the gap scales as a stretched exponential. For smaller N, only the gap at the critical point is relevant, where it scales polynomially, as long as the phase transition is second order. This phenomenon is demonstrated rigorously for the two-pattern Gaussian Hopfield model. Qualitative comparison with the Sherrington-Kirkpatrick model leads to similar conclusions.

Here the first two terms, diagonal in z-basis, encode the objective function. The last term represents the magnetic field in the transverse direction, which is decreased from G(0)c1 to G(T ann ) ¼ 0. The time T ann needed by the algorithm is determined by a condition that the annealing rate is sufficiently low to inhibit non-adiabatic transitions: These are most likely near points where the instantaneous gap to excited states DE attains a minimum as a function of G; further, DG is defined as the width of the region where the gap remains comparable to its minimum value. QA offers no worst-case guarantees on time complexity 7 , but initial assessments of typical case complexity were optimistic. Both experimental 8 and theoretical 9 evidence hinted at performance improvement over simulated annealing for finitedimensional glasses; however, some empirical evidence in support of the theory has recently been called into question 10 . Early exact diagonalization studies also observed polynomially small gaps in the constraint satisfaction problem (CSP) on random hypergraphs 11 , but that finding had been challenged by quantum Monte Carlo studies involving larger sizes 12 . Benchmarking of a hardware implementation of QA, courtesy of D-Wave Systems, shows no improvement in the scaling of the performance 13,14 . Whether that might be attributable to a finite temperature at which the device operates or its intrinsic noise is unclear at present [15][16][17] .
Statistical physics offers some intuitive guidance: Small gaps develop at the quantum phase transition point and become exponentially small when the transition is first-order [18][19][20][21] or when different parts of the system become critical at different times for strong-disorder continuous phase transitions 22 . The most promising candidates for QA are thus problems with bona fide second-order phase transitions, where the disorder is irrelevant at the quantum critical point (QCP).
The scaling analysis described here suggests a polynomially small gap at the critical point of the archetypal spin glass: the Sherrington-Kirkpatrick (SK) model [23][24][25] . It has been pointed out 9,26 that QA may still be doomed by the bottlenecks in the spin-glass phase. Exponentially small gaps away from the critical point have been observed in simulations 27 , but adequate theoretical description of this phenomenon has proven challenging. A perturbative argument in support of this qualitative picture has been considered in ref. 26. However, the results were not borne out by more accurate analysis that took into account the extreme value statistics of energy levels 28 .
The present manuscript sheds light on the mechanism of tunnelling bottlenecks in the spin-glass phase. Using exact, nonperturbative, methods, this is illustrated for a simple model, but the main findings are expected to be valid for quantum annealing of more realistic spin glasses. During annealing, the system must undergo a cascade of tunnellings at some specific values of G 1 ,G 2 ,y in an approximate geometric progression. For a finite system size, these bottlenecks are few, O (log N), and may not even appear until N is sufficiently large, highlighting the challenge of interpreting the results of numerical studies. Bottlenecks also become increasingly easier as G-0, counter to expectations that tunnellings are inhibited as the model becomes more classical. A related finding is that the time complexity of QA is exponential only in some fractional power of problem size: a mild improvement over more pessimistic estimates 26 .

Results
Summary. The spin-glass phase, which is entered below some critical value of the transverse field G c , is characterized by a large number of valleys. Often, this transition is abrupt, driven by competition between an extended state and a valley (localized state) with the lowest energy, as occurs in the random energy model 19,20 . The exponentially small overlap between the two states then determines the gap at the phase transition. However, even if new valleys develop in a continuous manner as G decreases, small changes in the transverse field may result in a chaotic reordering of associated energy levels, leading to Landau-Zener avoided crossings and attendant exponentially small gaps.
Nonetheless, attempts to make this intuition exact are fraught with potential pitfalls. For increasing G, two randomly chosen valleys are equally likely to come either closer together or further apart in energy. In the case of the former-and further, if the sensitivity of energy levels to changes in the transverse field is so large that the levels 'collide' before either valley disappearsavoided crossing will occur. This may not be necessarily the case when one considers 'collisions' with the ground state, which are of particular concern to QA. The ground state corresponds to a valley with the lowest energy; this and other low-lying valleys obey fundamentally different statistics of the extremes.
A case in point is the analysis of ref. 26, which develops perturbation theory in G. The classical limit (G ¼ 0) is used as a starting point; how that analysis might be extended to G40 has also been discussed 29 . A type of CSP has been considered: classical energy levels are discrete non-negative integers (number of violated constraints) but have exponential degeneracy. 'Zeeman splitting' for G40 scales as ffiffiffiffi N p , which, for large problems, may be sufficient to overcome the O(1) classical gap and cause avoided crossings of levels associated with different classical energies. Yet this trend disappears if only levels with the smallest energies (after splitting) are considered; these are relevant for avoided crossings with the ground state. This about-face is not immediately apparent, only coming into play for N\100, when the exponential degeneracy of the classical ground state sets in. It has, however, been confirmed with analytical argument and numerics 28 . Consequently, arguments based on perturbation theory cannot be used to establish the phenomenon.
This manuscript offers a fresh perspective, illustrated by studying quantum annealing of the Hopfield model. Mean-field analysis correctly describes thermodynamics if the number of random 'patterns' is small. The method is further extended to extract information about exact quantum energy levels. Importantly, the classical energy landscape is made much more complex by insisting that the distribution of disorder is Gaussian. Figure 1 sketches a 'phase diagram' obtained for this model. For decreasing G the gap changes as follows: (1) it is finite (does not scale with N) in the paramagnetic phase, G4G c ; (2) scales as 1/N 1/3 in the narrow region of width 1/N 2/3 around G ¼ G c ; (3) increases slightly, with typical values scaling as 1/N 1/4 for GoG c . In addition, there are avoided crossings at isolated points G 1 ,G 2 ,y.
The first non-trivial example requires two Gaussian patterns. In this case the 'energy landscape' is effectively one-dimensional, which greatly simplifies the analysis. The most important element of this analysis is accounting for the extreme value statistics associated with the valleys (local minima) having the lowest energies. To this end, the distribution of energies of the classical landscape must be conditioned so that they are never below the energy of the global minimum. This becomes feasible when reformulated as a continuous random process, in the limit N-N.
This particular model is not as interesting from the computer science perspective, not least because it affords an efficient classical algorithm. It is sufficiently simple so that a complete quantitative analysis presented later on in the manuscript has been possible. Yet, the model captures the essential properties of the spin glass: its qualitative features directly apply to much more general models, including Sherrington-Kirkpatrick. The most important feature of the classical energy landscape is that it exhibits fractal properties, which both ensures that hard bottlenecks are present in the spin-glass phase and also governs their distribution. The role of the transverse field is to approximately coarse-grain it on scales determined by G, eliminating small barriers; thus the number of valleys decreases as G grows. A specific random process, corresponding to the energy landscape of the 'infinite'-size instance, will contain every possible realization of itself at some 'length scale'. Some realizations will contain high barriers that cannot be easily overcome; these will lead to tunnelling bottlenecks.
This intuition can be used to immediately establish the scaling of the number of tunnelling bottlenecks. Since the model contains no inherent length scale in the limit N-N, it can be argued that the expected number of tunnelling bottlenecks in a finite interval [G 1 ;G 2 ] should be a function of the ratio G 2 /G 1 . The logarithm is the only function that respects additivity, that is, To obtain the total number of bottlenecks, one considers the interval [G min ;G c ]. Here G c B1 is the boundary of the spin-glass phase. The lower cutoff, G min , corresponds to the lowest energy scale of the classical model, which scales as an inverse power of system size, for example, as 1/N for the Gaussian Hopfield model. In a sense, tunnelling bottlenecks are connected to the G ¼ 0 'fixed point' (note that the classical gap vanishes asymptotically). To summarize, the number of tunnelling bottlenecks will grow as N tunn % a ln N: Locations (in G) will depend on specific disorder realization, but self-similarity ensures that the successive ratios G n /G n þ 1 converge to a universal distribution. This logarithmic rise is far weaker than a power law seen in some phenomenological models of temperature chaos 30 and, as has been argued above, likely to be a universal feature. The prefactor is model-dependent; its numerical value can be used to estimate the minimum problem size for which the mechanism becomes relevant, via N tunn % 1. A value of aE0.15 is obtained for the problem at hand, so that additional bottlenecks become an issue for N\1,000. Prior numerical studies similarly required large sizes before the exponentially small minimum gap was observed 27 , and so far there has been no evidence of two or more exponentially small gaps coexisting. The slow, logarithmic increase of the expected number of bottlenecks is the most likely culprit.
A notable feature of these results is that tunnellings become progressively 'easier' as G-0 despite the fact that the model becomes more classical. Tunnelling gaps increase as Notice that they cease to be exponentially small for GtG min ; at that point the ground state is already localized near the correct global minimum. The power law exponent for this stretched exponential is model-dependent, related to the scaling of barrier heights. These scale as N 1/2 for the Gaussian Hopfield model, which, together with O(N) scaling of the effective mass, gives rise to the N 3/4 term in the exponent. The finding that the gaps increase for smaller G deserves explanation. Typically, valleys with similar energies differ by up to N/2 spin flips. This changes once lowest energies are considered: All spin configurations with energies less than E above the global minimum are contained in a neighbourhood of radius O½ðNEÞ 2=3 , using Hamming distance as a metric. The problem is not rendered easy by the mere fact that the global minimum is so pronounced (although theoretical analysis inspired an efficient classical algorithm for the p ¼ 2 Hopfield network, briefly described later on). It does imply, however, that the ground state wavefunction does not jump chaotically: Every subsequent tunnelling involves shorter distances, with O(GN) spin flips, and achieves progressively better approximation to the true global minimum. Absent such a trend, annealing would be most difficult towards the end of the algorithm, when GB1/N, and the minimum gap would exhibit less favourable exponential scaling 26 .
In what follows, the model and its solution are described in greater detail. First, finite-size scaling of the 'easy' QCP bottleneck is linked to the thermodynamics of the phase transition. The next part goes beyond thermodynamics, considering small corrections to the extensive contribution to the free energy. The entire lowenergy spectrum, which depends on disorder realization, is mapped onto that of a simple quantum mechanical particle in a random potential. Finally, extreme value statistics is applied to investigate the properties of that random potential near its global minimum by mapping it to a Langevin process. This yields the distribution of hard bottlenecks in a universal regime (Go o1).
i are taken to be independent and identically distributed (i.i.d.) random variables of unit variance. The thermodynamics of this quantum Hopfield model has been worked out in great detail by Nishimori and Nonomura 32 . In fact, that study motivated the development of QA 4 .
When p is small (J ik B1/N), it is appropriate to replace local longitudinal fields with their mean values q is used to close this system of equations. For a transverse field below the critical value, GoG c ¼ 1, there appears a non-trivial (ma0) solution to the self-consistency equation for the macroscopic order parameter, a vector with p components: Here, the disorder variables are also written using vector notation: i Þ. This model is equivalent to the Curie-Weiss (quantum) ferromagnet, which has a continuous phase transition characterized by a set of mean-field critical exponents. Two of these are particularly useful in the analysis of the annealing complexity: the one for the singular component of the ground state energy ðE sing 0 =N / g j j a Þ as well as the dynamical exponent for the gap (E 1 À E 0 pg b ), where g ¼ G À G c is the 'distance' from the critical point. These exponents are defined for the infinite system, yet fairly general heuristic analysis (see the Methods section) predicts finite-size scaling for the QCP bottleneck: Substituting values a ¼ 2 and b ¼ 1/2 for the problem at hand, one may estimate the gap at the critical point and the width of the critical region to be O (N À 1/3 ) and O (N À 2/3 ), respectively. Worse-than-any-polynomial complexity of quantum annealing might be expected for the first order phase transition, which exhibits no critical scaling (but see ref. 33). Another possibility is for the dynamical exponent to diverge at the infinite randomness QCP: the finite-size gap scales as expð À c ffiffiffiffi N p Þ in a random Ising chain 22 . For the Hopfield model, however, this scaling is polynomial, as the disorder is irrelevant at the critical point. More intriguing is the fact that these pessimistic scenarios are not found in SK spin glass either: the model is characterized by the same set of critical exponents, albeit with logarithmic corrections [23][24][25] . These corrections to scaling increase the gap and, respectively, decrease the width of the critical region by a factor of log 2/9 N. Thus, as long as T ann \N, nonadiabatic transitions at the critical point should be suppressed. This presents a conundrum as SK model is known to be an NP-hard problem; finding a polynomial-time (even in typical case) quantum algorithm would be a surprising development. The heuristic analysis is clearly insufficient, but 'digging' deeper into a problem would require a more 'microscopic' analysis. In the following, the problem is mapped to ordinary quantum mechanics to uncover its lowenergy spectrum that explicitly depends on a particular realization of disorder, {n i }.
Exact low-energy spectrum. Mean-field theory can be derived in a more systematic manner via Hubbard-Stratonovich transformation. General overview is presented below; additional details can be found in Methods and the Supplementary Note 1. Finitetemperature partition function Z(b) ¼ Tr(e À bH ) can be written as a path integral over m(t), which now acquires a dependence on the imaginary time 0ptpb, with periodic boundary conditions: m(b) ¼ m(0). The value of the integral is dominated by stationary paths corresponding to the minimum of an effective potential VðmÞ. While the discussion above has been deliberately equivocal on the distribution of disorder variables, it is now instructive to contrast bimodal ðx ðmÞ i ¼ AE 1Þ and Gaussian choices. The shapes of the effective potential for both scenarios are depicted in Fig. 2.
The conventional bimodal choice defines the model of associative memory: In the limit G ¼ 0 the 'patterns' can be perfectly recalled (s i ¼ ± x (m) ) when p is small. For the Gaussian choice, the global minimum corresponds to a mixture 34 rendering memory useless. In the bimodal case, such 'spurious' states only become stable once the number of patterns scales with the problem size 35 : p40.05N. The BC p (bimodal) or O(p) (Gaussian) symmetry of the effective potential is only approximate, to leading order in N. The degeneracy of the ground state is 2 (due to global spin inversion symmetry) for almost all disorder realizations when pX3 or pX2 in the bimodal and Gaussian scenarios, respectively (note that that the p ¼ 2 bimodal case possesses an additional symmetry, which makes the ground state 4-fold degenerate). The system is in a symmetric superposition of the degenerate global minima at the end of QA. It evolves entirely in the symmetric subspace since the time- dependent Hamiltonian commutes withÛ ¼ expðpiŜ x Þ. Thus, it should be noted that small gaps between symmetric and antisymmetric wavefunctions are irrelevant to QA and can be ignored.
Disorder fluctuations 'nudge' QA towards the 'correct' pattern as it passes the critical point in the bimodal Hopfield model. No further bottlenecks are encountered; gaps for GoG c as well as the 'classical' (G ¼ 0) gap are O(1). By contrast, the classical gap scales as O(1/N) in the Gaussian Hopfield model, alerting to a 'danger' posed by the G ¼ 0 'fixed point'. To find the low-energy spectrum when GoG c , note that the dominant contribution to the path integral is from paths where the magnitude of the 'magnetization' vector remains approximately constant, close to its saddle-point value, while the angle is a slow function of time: mðtÞ % m G À sinWðtÞ cosWðtÞ for p ¼ 2. Integrating out the amplitude degrees of freedom, the partition function is rewritten as which describes a quantum-mechanical particle of mass M ¼ O(N) moving on a ring, subjected to a random potential where x i x i cos y i sin y i and the last term, written in shorthand, adds a constant offset so that hV(W)i n ¼ 0. Notice that V G ðWÞ / ffiffiffiffi N p via central limit theorem, thereby representing a higherorder correction to the extensive part of the free energy.
Since the partition function ZðbÞ ¼ P n e À bEn encodes information about the spectrum, low-energy (Goldstone) excitations of the many-body problem are in one-to-one correspondence with the energy levels of a quantum mechanical particle, up to a constant shift. The next step is to find the properties of this potential near a global minimum. These are relevant in a regime away from the critical point, Go oG c , where the universal behaviour characterized by the appearance of 'hard' bottlenecks sets in.
Evolution of the random potential. Scaling of the gap for GoG c can be obtained via semiclassical analysis. Small level splitting due to tunnelling between wells at the two degenerate global minima (this degeneracy is a consequence of the global Z 2 symmetry: is not relevant to QA. Higher degeneracies are statistically unlikely; quantization rules predict O(N À 1/4 ) gaps between energy levels within the symmetric subspace. But this refers to the typical gap, obtained for fixed G for almost all realizations of disorder. As quantum annealing sweeps the transverse field for a fixed realization of disorder, V G (W) might undergo global bifurcation. This would result in a small tunnelling gap for a specific value of G when the competing minima are in resonance.
Such a scenario is impossible near the QCP. Coefficients in the Fourier expansion of the random potential, S k (a k cos 2kW þ b k sin 2kW), decrease as m 2k /G 2k À 1 so that the first harmonic dominates for GEG c . Semiclassical analysis confirms a O(N À 1/3 ) gap at the critical point (where the curvature of the effective potential vanishes, leaving only the quartic part). As G decreases, the random potential becomes more rugged (see Fig. 3a), smooth only on scales DWBG, which makes global bifurcations more likely. Furthermore, it exhibits properties that allow one to make important predictions without detailed calculations. Rescaling the potential in the vicinity of either global minimum V G ðW Ã Þ ¼ V Ã G , W À W Ã ! 'ðW À W Ã Þ; describes the same model but for the rescaled G ! 'G and a different realization of disorder. This scale invariance is responsible for the logarithmic scaling of the number of tunnelling bottlenecks as has been explained earlier in the text. However, it still remains necessary to demonstrate that the density of bottlenecks is non-zero, which entails an examination of the properties of the random potential in the limit G ¼ 0.
The classical optimization problem corresponds to maximizing the magnitude of P i x i s i . A necessary condition for a local minimum is that two sets of vectors, {n i |s i ¼ þ 1} and {n i |s i ¼ À 1}, can be separated by a line. As the angle of this line changes, fluctuations of the amplitude give rise to a random potential V 0 (W) (see Fig. 4). This suggests a linear-time algorithm for finding a global minimum: Sort all vectors by angle (this may introduce an extra log N factor due to sorting overhead) and exhaustively check all possible angles. Of course, QA algorithm is too generic to exploit the specific structure of the problem; moreover ad hoc efficient algorithms are unlikely to exist for more general spin-glass problems.
On short intervals, the random process is described as an undamped Langevin process 36,37 in the continuous (N-N) limit (hence the exponent of 3/2 in equation (11), corresponding to its fractal dimension). Properly taking into account the statistics of the extremal properties, the process must be conditioned on the fact that V 0 ðW Ã Þ4V 0 ðW Ã 0 Þ ¼ V Ã 0 away from the global minimum. As described in Methods and the Supplementary Notes 2 and 3, such a conditioned process consists of two uncorrelated branches, defines w þ (W) and w À (W) parametrically, in terms of random processes n AE (t) that correspond to a Brownian motion in a nonlinear potential depicted in Fig. 3b. This potential is biased towards positive 'velocities' n so that V 0 À V Ã 0 $ ffiffiffiffi N p ðW À W Ã Þ 3=2 from equation (12). It will, however, experience arbitrary percentage drops due to subpaths with no0 (albeit with decreasing probability).
For small but finite G, the 'separating line' becomes blurred. The random potential adds a 'quantum correction' (see Methods and the Supplementary Note 2 For two minima to come into resonance, they cannot be more than DWBG apart (that is, O (NG) spin flips). The tunnelling exponent is given by under-the-barrier action A $ ffiffiffiffiffiffiffiffiffiffiffi ffi MDV p DW, where the effective mass MBN/G 2 and DV $ ffiffiffiffi N p G 3=2 , leading to equation (4). Numerical results for the universal distribution of tunnelling jumps and exponents are exhibited in Fig. 5.

Discussion
Poor scalability of classical annealing of spin-glass models had been linked to the phenomenon of temperature chaos 38 . Interestingly, its existence in mean-field glasses had been debated [39][40][41] , although it is uncontroversial in finitedimensional models 42,43 . Similarly, failures of quantum annealing might be attributed to transverse field chaos. The phenomenon described in this manuscript represents a much stronger finding. The mere fact that ground states at G and G þ DG will have vanishingly small overlap as N-N is not inconsistent with the continuously evolving ground state and poses no 'threat' to QA. By contrast, 'hard' bottlenecks correspond to isolated discontinuities that persist as DG-0.
To dwell upon the generality of these results, first note that scaling of the tunnelling exponent will depend on the universality class of the model. The SK model, for instance, exhibits different scaling of barrier heights, believed to be pN 1/3 (see for example, ref. 44). In the model studied, the decrease in the number of spins involved in tunnellings offsets the divergence of the effective mass in the classical limit G-0. As the height of the barriers also decreases, the tunnelling gaps widen towards the end of the algorithm. One might expect qualitatively similar behaviour in realistic spin glasses.
The logarithmic scaling of the number of bottlenecks is due to self-similar properties of the free energy landscape in the interval [G min , G c ]. The lower cutoff should correspond to the smallest energy scale in the classical limit, which for the SK model is also a negative power of N, namely 1= ffiffiffiffi N p . This is related to the linear vanishing of the density of distribution of effective fields as h-0 at zero temperature 45   less clear for CSPs, where the energies are constrained to be nonnegative integers; that is, the classical gap is O(1). These energy levels have exponential degeneracy, which is lifted by the transverse field. A value sufficient to make the spectrum effectively quasicontinuous might serve as an appropriate lower cutoff G min in problems of this type. Lack of 'hard' bottlenecks in the Hopfield model with the bimodal distribution of disorder and p ¼ O (1) could be attributed to the fact that the number of valleys is finite, which is not representative of a true spin glass. An interesting observation is that since 'hard' bottlenecks correspond to Landau-Zener crossings, annealing times need not be exponentially small. The probability that QA fails to follow the ground state every single time in n repeated experiments is which exactly matches the probability of failure for the annealing rate that is n times slower. Using shorter annealing cycles with many repetitions can minimize the effects of decoherence. It suffices that non-adiabatic transitions are suppressed at the critical point only, dG dt t1=N. Even with polynomial annealing rates, coherent evolution would require much better isolation from the environment than what is currently feasible. The only commercial implementation of QA (D-Wave) must contend with a fairly strong coupling to a thermal bath. On the positive side, it accommodates faster annealing cycles, acting as a 'safety valve' to dissipate any heat generated during the non-adiabatic process. At the same time, it all but ensures that the system is always in thermal equilibrium with the environment.
The theory presented here breaks down when GoT so that quantum bottlenecks described here may not be a limiting factor if ln (G c /T)t1/a. Main source of errors will be from exponentially many thermally occupied excited states. If the annealing profile were adjusted so that the energy spacing increased beyond T towards the end, this would effectively implement classical annealing. An intricate relationship between temperature, problem size, and the properties of the spin-glass model might determine which mechanism (quantum or classical annealing) will be dominant.
Yet another tradeoff in the design of D-Wave chip is a quasitwo-dimensional (2D) topology of interactions J ik due to fabrication constraints, which incurs significant performance penalty when mean-field models are 'embedded' into a 'Chimera' graph 46 . So-called 'native' problems corresponding to uniformly random instances on this quasi-2D lattice have been argued to be poor candidates for QA due to the lack of a finite-temperature classical phase transition 47 ; at the same time, a quantum phase transition at G c 40 is expected in 2D glasses 48 .
Whereas first-order phase transition immediately implies exponential complexity, even for small sizes, problems having a continuous phase transition may remain tractable up to a threshold, N c , beyond which tunnelling bottlenecks become dominant (a ln N c B1). This 'tractability threshold' serves as a silver lining fot this otherwise negative result. Moreover, the picture of 'hard' bottlenecks may be equally applicable to classical annealing. A recent study demonstrated that classically 'hard' instances that exhibit temperature chaos also take much longer time to solve on a D-Wave machine 49 . While in some crafted examples classical annealing is at a unique disadvantage due to first-order phase transition 50 , for most interesting problems both classical and quantum transitions are second-order. In such a scenario, the density of bottlenecks becomes a tie-breaker for evaluating relative performance. Whether quantum annealing can be advantageous in terms of this metric and determining which models will benefit is a practically important question for followup work.
There remains a question whether the failure mechanism described here can be somehow circumvented. Such a feat is feasible, for example, for a disordered Ising chain, where an exponentially small gap develops at the critical point, which is a manifestation of Griffiths singularities 22 . Modification of QA that requires controlling the transverse field for each spin individually can suppress these singularities and restore a polynomial gap.
Based on comparison with another exactly solvable model, it seems that frustration-in addition to disorder-is essential for the appearance of small gaps in the spin-glass phase proper. The seemingly random profile of the energy landscape for finite G heralds difficulties in avoiding these bottlenecks in generic spin glasses. Although any prospects of exponential speedup should be met with skepticism, heuristics inspired by spin-glass theory revolutionalized branch-and-bound algorithms for CSPs 51 . One can remain hopeful that theoretical advances can similarly aid quantum optimization.

Methods
Scaling analysis. Finite-size scaling of the gap at QCP is best understood using an example of a finite-dimensional system. In thermodynamic limit, both correlation length and characteristic time diverge near the phase transition as In a finite system this divergence is smoothed out as soon as the correlation length becomes comparable to lattice size (x c BN 1/d ). The minimum gap (the reciprocal of t) and the width of the critical region should scale as N À z/d and N À 1/(dn) , respectively. In this paper, the product zn has been labelled as exponent b. Singular behaviour of normalized ground state energy (free energy) is related to the specific heat exponent (a ¼ 2 À a). Dimensionality d can be eliminated with the aid of hyperscaling relation 2 À a ¼ (d þ z)n to yield equation (7) of the main text. Independent estimates of the specific heat exponent can be obtained from the exponents for the order parameter and the susceptibility (2 À a ¼ 2b þ g).
Mapping to quantum mechanics. Finite-temperature partition function can be written as a sum over a set of paths [s i (t)] with 0ptpb, where s i (t) alternates between the values ± 1. Hubbard-Stratonovich transformation can be used to rewrite it as a path integral The 'kinetic' term K G ½sðtÞ in the first equation penalises kinks, representing G-dependent ferromagnetic couplings between Trotter slices. As the interaction term is decoupled, the problem reduces to that of evaluating the single-site partition function Z i for a spin subjected to a magnetic field with the transverse component G and the 'time-dependent' longitudinal component h i (t) ¼ n i m (t). To leading order in N, the saddle-point of the path integral (15) is a solution of mean-field equations. This becomes a degenerate manifold for Gaussian disorder distribution; to determine higher-order contribution all paths such that |m(t)| ¼ m G are considered. Evaluating Z i is best performed in the adiabatic basis that diagonalizes the 2-level HamiltonianĤ i ¼ À h i ðtÞŝ z À Gŝ x , HereÊ i ðtÞ is diagonal with eigenvalues AE ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi q . Its fluctuations around the mean give rise to the random potential V G (W). Non-adiabatic terms due to rotation of the basisÛ ¼ e À iViðtÞ are treated using second order perturbation theory, giving rise to a kinetic term p(dW/dt) 2 . Note a simple form of the perturbative term in equation (16) owing to the fact thatV i ðtÞ commute for all t. The details of this calculation are given in Supplementary Note 1.

Continuous limit.
In the limit N-N, the random potential V G (W) defined in equation (10) in the main text converges to a Gaussian process that can be specified completely by its covariance matrix hV G (W)V G 0 (W 0 )i. This can be 'diagonalized', alternatively expressing the random potential as a linear combination of whitenoise processes {z n (W)}. One representation, as a convolution series P 1 n¼0 ðf ðnÞ G Ãz n ÞðWÞ with kernels relies on orthogonal properties of associated Laguerre polynomials. The choice NATURE COMMUNICATIONS | DOI: 10.1038/ncomms12370 ARTICLE a ¼ 1 ensures that only n ¼ 0 term survives in the limit G ¼ 0. The random potential should satisfy a stochastic equation As a side note observe that a similar equation is obtained by taking a continuous limit in the identities that follow from elementary geometry (see Fig. 4 in the main text, V k ¼ N 2 m 2 k ): For finite G, the form of the potential is modified as follows: It is convolved with a smoothing kernel of width DWBG, which raises the global minimum by O ffiffiffiffi N p G 3=2 À Á . Additional random contributions (from nX1) have similar scaling. This derivation is presented in greater detail in Supplementary Note 2.
Extreme value statistics. In the vicinity of the global minimum, the statistics of the classical potential is fundamentally different. The 'returning force' in equation (18) can be neglected; additionally the process should be conditioned on the fact that it stays above its value at W ¼ 0 (no generality is lost by choosing the global minimum as the origin). This problem has been a subject of a considerable body of work 36,37 , although important aspects ought to be revisited. Here, I present a particularly simple self-contained derivation.
A pair (w,u)-where w / V 0 À V Ã 0 is interpreted as the 'coordinate' and u ¼ dw/dW as the 'velocity' (W being 'time')-forms a Markov process. The probability distribution p (W; w, u) satisfies, for W40, @p @W ¼ À u @p @w þ 1 2 @ 2 p @u 2 ; while pðW; þ 0; uÞ ¼ 0ð8u40Þ ð 20Þ serves as a boundary condition for the absorbing boundary. This probability is 'renormalized' to condition on the fact that it survives until some YcW. It becomes a conserved quantity, but the diffusion equation adds a drift, À q(log P Y )/qu, repelling from the boundary. The 'survival' probability P Y in the limit Y-N is, up to 'time-reversal', the universal asymptotic solution of (20), reduced to ordinary differential equation using scaling ansatz: This exploits a fact that fractal dimensions of 'velocity' and 'coordinate' are [u] ¼ [W] 1/2 and [w] ¼ [W] 3/2 . The asymptotics is dominated by solutions with the smallest possible exponent, a ¼ 1/4 out of the infinite set of eigenvalues for the ordinary differential equation. This matches a known value obtained with a different method 36 . The next step performs a change of variables, introducing 'dimensionless' velocity n ¼ u/w 1/3 , and 'logarithmic' coordinate m ¼ ln w. With the 'time' variable redefined via dt ¼ dW/w 2/3 , Markov process is described by a tuple (W, m, n). Marginalizing out m and W in the equation for p (t; W, m, n) produces Fokker-Planck equation for a stochastic motion of particle in a potential Given a particular realization of n(t), the full process (m, n, W) is determined deterministically, by integration (see equation (12) in the main text). The construction of a realization of a random process is performed independently for W40 and Wo0. More detailed analysis is presented in Supplementary Note 3. Numerical simulations rescale this random potential instead of evolving the transverse field: W7 !e À DG G W and w7 !e À 3 2 DG G w. The process is extended to larger values of t as needed (details of the process for small t, where they fall below the numerical precision, are 'forgotten'). A fairly large range of t is required to gather the sufficient statistics.
Data availability. The data that support the findings of this study are available from the corresponding author upon request.