Spectral estimation for Hamiltonians: a comparison between classical imaginary-time evolution and quantum real-time evolution

We present a classical Monte Carlo (MC) scheme which efficiently estimates an imaginary-time, decaying signal for stoquastic (i.e. sign-problem-free) local Hamiltonians. The decay rates in this signal correspond to Hamiltonian eigenvalues (with associated eigenstates present in an input state) and can be extracted using a classical signal-processing method such as ESPRIT. We compare the efficiency of this MC scheme to its quantum counterpart, in which one extracts eigenvalues of a general local Hamiltonian from a real-time, oscillatory signal obtained through quantum phase estimation circuits, again using the ESPRIT method. We prove that the ESPRIT method can resolve S = poly(n) eigenvalues, assuming a 1/poly(n) gap between them, with poly(n) quantum and classical effort through the quantum phase estimation circuits, assuming efficient preparation of the input state. We prove that our Monte Carlo scheme plus the ESPRIT method can resolve S = O(1) eigenvalues, assuming a 1/poly(n) gap between them, with poly(n) purely classical effort for stoquastic Hamiltonians, requiring some access structure to the input state. However, we also show that under the same assumptions, i.e. S = O(1) eigenvalues with a 1/poly(n) gap between them and some access structure to the input state, one can achieve this with poly(n) purely classical effort for general local Hamiltonians. These results thus quantify some opportunities and limitations of classical Monte Carlo methods for spectral estimation of Hamiltonians. We numerically compare the MC eigenvalue estimation scheme (for stoquastic Hamiltonians) and the QPE eigenvalue estimation scheme by implementing them for an archetypal stoquastic Hamiltonian system: the transverse field Ising chain.


Contents
Discussion
Appendix A: Trotterization
Appendix B: Extension to non-Hermitian propagation operators
Appendix C: Median-of-means estimator
Appendix D: Performance of ESPRIT on the imaginary-time (decaying) signal

Introduction
In general, it is a computationally intractable task to obtain, by classical or quantum means, the eigenvalues of a Hamiltonian H associated with a many-body quantum system. However, more restricted tasks related to estimating the spectrum of H can be executed on a quantum computer by means of quantum phase estimation algorithms [29,41,30,36], using the ability to simulate the real-time dynamics e^{-iHt/ℏ} efficiently on a quantum computer via Trotterization [25]. Classical alternatives are provided by quantum Monte Carlo methods [14,10]. The efficiency of quantum Monte Carlo methods when used to simulate many-body systems is generally limited by the sign problem. This can cause the variance of the estimator in the Monte Carlo algorithm to grow exponentially in the system size n, necessitating an exponential number of runs of the Monte Carlo algorithm.
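To make the role of Trotterization concrete, here is a minimal numerical sketch (a toy two-qubit Hamiltonian of our own choosing, not a system from this paper) showing the first-order product-formula error shrinking with the number of steps r:

```python
import numpy as np

def expmh(H, s):
    # e^{s*H} for Hermitian H via eigendecomposition (avoids scipy.linalg.expm)
    w, V = np.linalg.eigh(H)
    return (V * np.exp(s * w)) @ V.conj().T

# Toy Hamiltonian H = H1 + H2 with two non-commuting local terms.
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])
H1 = np.kron(X, np.eye(2))
H2 = np.kron(Z, Z)
H = H1 + H2

t = 1.0
exact = expmh(H, -1j * t)

def trotter_error(r):
    # first-order Lie-Trotter formula (e^{-iH1 t/r} e^{-iH2 t/r})^r
    step = expmh(H1, -1j * t / r) @ expmh(H2, -1j * t / r)
    return np.linalg.norm(np.linalg.matrix_power(step, r) - exact, 2)

errors = [trotter_error(r) for r in (1, 10, 100)]   # shrinks roughly as 1/r
```

The first-order error is governed by the commutator [H1, H2] and decreases as O(1/r); higher-order formulas, as discussed in Appendix A, improve this scaling.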
A (ubiquitous) class of Hamiltonians that is sign-problem-free has been formalized under the name of stoquastic Hamiltonians [6]. Roughly speaking, a (real-valued) Hamiltonian is stoquastic (in a particular basis B) if its off-diagonal elements are non-positive: ⟨x|H|y⟩ ≤ 0 for x ≠ y (with |x⟩, |y⟩ being elements of B). As a consequence, its associated Gibbs density matrix e^{-τH} is an elementwise non-negative matrix (for τ ∈ R₊). This property makes it particularly suitable for Monte Carlo sampling, as complexity results [6,21,1] and various algorithmic results [5,4,8,9] have demonstrated.
Since stoquastic Hamiltonians are sign-problem-free, it is of interest to see whether one can indeed prove that (part of) their spectrum can be efficiently estimated through classical Monte Carlo methods. Conversely, can quantum algorithms, even for stoquastic Hamiltonians, provide an advantage over Monte Carlo algorithms in carrying out this task? In this work, we address these questions by making a direct comparison between the task of estimating the spectral content of a stoquastic local Hamiltonian in an input state via a quantum circuit versus via a classical Monte Carlo scheme. In addition, we investigate to what extent this task can be efficiently carried out classically for a general local Hamiltonian.
Central to our study is, first of all, the real-time signal

g_R(k) = ⟨Φ| e^{-iHk∆t} |Φ⟩ = Σ_j |⟨ψ_j|Φ⟩|² e^{-iE_j k∆t},   (1)

for k = 0, 1, . . . , K, where |Φ⟩ is some pre-specified n-qubit input state. The estimation of g_R(k) for various k is a crucial step in the quantum phase estimation algorithm (QPE). In what follows we will fix ∆t so that the eigenstates |ψ_j⟩ with nonzero or substantial overlap |⟨ψ_j|Φ⟩|² > 0, showing up in the signal, have the property that E_j ∆t ∈ [0, 2π). Thus, from now on, we assume that these E_j are shifted and rescaled to lie in [0, 2π). We will assume that there are at most S eigenvectors with nonzero |⟨ψ_j|Φ⟩|², where S is desired to be poly(n) or less for overall efficiency. Identifying a state |Φ⟩ which has nonzero overlap with only a few (S = poly(n) or S = O(1)) eigenstates and which obeys the assumptions in the following Theorems is not so simple, and can be considered one of the bottlenecks in using quantum phase estimation or other Monte Carlo methods to determine spectral information of the Hamiltonian. Besides the real-time signal, one can define the imaginary-time signal

g_I(k) = ⟨Φ| e^{-kH} |Φ⟩ = Σ_j |⟨ψ_j|Φ⟩|² e^{-E_j k},   (2)

where again we can assume that E_j ∈ [0, 2π). We will prove, for local stoquastic Hamiltonians, that the quantum cost of estimating g_R(k) and the classical Monte Carlo cost of estimating g_I(k) within error ε are approximately identical, although the assumptions on our knowledge/preparation costs of |Φ⟩ are slightly different in the two cases. The two statements are as follows:

Theorem 1.1. For a local Hamiltonian acting on n qubits, one can estimate g_R(k) in Eq. (1) with probability at least 1 − δ with sampling error ε and Trotter error ε_trot (and total error ε_tot = ε + ε_trot), using quantum circuits acting on n + 1 qubits, where the depth of the quantum circuit scales as O(k^{1+o(1)} ε_trot^{-o(1)}) × poly(n) and the number of times one executes the circuit is Θ(ε^{-2} log(4δ^{-1})), under the assumption that |Φ⟩ is a state of n qubits which can be generated by a poly(n)-size quantum circuit.
Hence to obtain g_R(k) for k = 0, . . . , K, with error at most ε_tot = ε + ε_trot for all k, with probability 1 − δ, requires using quantum circuits for k = 0, . . . , K, each acting on n + 1 qubits, where the depth of the quantum circuit scales as O(k^{1+o(1)} ε_trot^{-o(1)}) × poly(n) and each circuit is repeated Θ(ε^{-2}(log(4δ^{-1}) + log K)) times.

Theorem 1.2. For a local stoquastic Hamiltonian acting on n qubits, one can estimate g_I(k) in Eq. (2) with probability at least 1 − δ with total error ε_tot = ε + ε_trot, using a classical MC algorithm on n-bit strings, where the depth of the algorithm scales as O(k^{1+o(1)} ε_trot^{-o(1)}) × poly(n) and the number of times one runs the algorithm is Θ(ε^{-2} log(δ^{-1})), under the assumption that |Φ⟩ = Σ_{x∈{0,1}^n} Φ(x)|x⟩ is a normalized state of n qubits such that (1) the ratio Φ(y)/Φ(x) can be efficiently (poly(n)) calculated for given x and y, and (2) we can efficiently draw samples from the probability distribution P(x) = |Φ(x)|².

Hence to obtain g_I(k) for all k = 0, . . . , K, with error at most ε_tot for each k, with probability 1 − δ, requires using a classical MC algorithm on n-bit strings for k = 0, . . . , K, where the depth of each algorithm scales as O(k^{1+o(1)} ε_trot^{-o(1)}) × poly(n) and the number of times one runs the algorithm (for each k) is Θ(ε^{-2}(log(δ^{-1}) + log K)).
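The structure of the two signals can be checked numerically. The following sketch (a random 4-dimensional toy Hamiltonian of our own choosing, with ∆t = 1) verifies that g_R(k) and g_I(k) decompose into oscillating resp. decaying components with rates E_j, as in Eqs. (1) and (2):

```python
import numpy as np

rng = np.random.default_rng(2)

def expmh(H, s):
    # e^{s*H} for Hermitian H via eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(s * w)) @ V.conj().T

dim = 4
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2
w = np.linalg.eigvalsh(H)
H = (H - w.min() * np.eye(dim)) * (0.9 * 2 * np.pi / (w.max() - w.min()))  # E_j in [0, 2*pi)
E, V = np.linalg.eigh(H)

Phi = rng.normal(size=dim)
Phi /= np.linalg.norm(Phi)
p = (V.T @ Phi) ** 2          # p_j = |<psi_j|Phi>|^2

# g_R(k) = <Phi| e^{-iHk} |Phi>,  g_I(k) = <Phi| e^{-kH} |Phi>   (Delta t = 1)
gR = [Phi @ expmh(H, -1j * k) @ Phi for k in range(5)]
gI = [Phi @ expmh(H, -k) @ Phi for k in range(5)]
```

Both lists agree term-by-term with the spectral sums Σ_j p_j e^{-iE_j k} and Σ_j p_j e^{-E_j k}, which is exactly the component structure ESPRIT later exploits.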
We then ask, given knowledge of either the real-time signal g_R(k) or the imaginary-time signal g_I(k), what can be learnt about those eigenvalues E_j whose associated eigenstates have nonzero overlap with the input state |Φ⟩? The signals g_I(k) and g_R(k) respectively correspond to a probabilistic sum of decaying components and a sum of oscillating components, with decay rates and oscillation frequencies E_j, as a function of the discrete 'time' k = 0, . . . , K. Hence a method is needed which extracts those decay and oscillation rates from knowing g_I(k) or g_R(k) at various k. A method of choice which has already been used in quantum information theory is the matrix pencil method [35,20,33] (with equivalent methods known as ESPRIT and MUSIC). This method has been used for processing randomized benchmarking data [31,17], quantum phase estimation [30], spectral tomography of superoperators [16], for processing experimental time-series data to identify Hamiltonian parameters [15], and generally in processing discretely-sampled decaying Ramsey signals.
Using this method, it is known that if either g_R(k) or g_I(k) is known exactly for k = 0, . . . , K with K + 1 ≥ 2S, one can learn the eigenvalues E_j and probabilities |⟨ψ_j|Φ⟩|² exactly. However, in the presence of sampling and Trotter noise, the resolving power also depends on the gap between the eigenvalues E_j, the number S of eigenvalues, and whether we extract them from an oscillating or decaying signal. Our work is thus focused on understanding whether there are fundamental advantages in learning g_R(k) with noise versus learning g_I(k) with noise, as this quantifies the benefit of a quantum algorithm versus a classical algorithm for spectral estimation of (stoquastic) Hamiltonians.

Not surprisingly, there are drawbacks to processing data from the imaginary-time evolution. As the signal decays exponentially, k cannot be chosen too large otherwise the signal becomes smaller than the noise. Our goal is to quantify this precisely and show that, at least theoretically, a regime exists in which the Monte Carlo method may be competitive.
The first statement we make can be viewed as a summary of previous work, namely it combines Lemma 2.1, via Theorem 1.1, and the performance of the ESPRIT method in Theorem 3.1 in the presence of a gap:

Theorem 1.3. Given a local Hamiltonian on n qubits. Let the number of eigenvectors supported in some (efficient-to-prepare) input state |Φ⟩ be S = p₁(n) (with p₁(n) some polynomial in n), and let each occur with nonzero probability at least 1/poly(n). Furthermore, assume that the S eigenvalues {E_i} with E_i ∈ [0, 2π) are sufficiently well-separated, i.e. at least by a gap ∆ ≥ C/K with constant C and K = Θ(p₁(n)). Then using Hadamard test (QPE) quantum circuits plus signal post-processing via ESPRIT, each requiring a poly(n) effort, one can resolve the eigenvalues {E_i}.

For local stoquastic Hamiltonians, the combination of Lemma 2.3, via Theorem 1.2, and the performance of the ESPRIT method in Theorem 3.2 in the presence of a gap leads to:

Theorem 1.4. Given a local stoquastic Hamiltonian on n qubits. Let the number of eigenvectors supported in some efficient-to-sample (i.e. with poly(n) effort) input state |Φ⟩ be S = O(1), and let each occur with nonzero probability at least 1/poly(n). In addition, assume that for fixed x, y it is efficient to compute the ratio Φ(y)/Φ(x). Furthermore, assume that the S eigenvalues {E_i}, E_i ∈ [0, 2π), are sufficiently well-separated, i.e. at least by ∆ ≥ 1/poly(n) for some poly(n). Then using a Monte Carlo algorithm plus signal post-processing via ESPRIT, each requiring (some) poly(n) effort, one can resolve the eigenvalues {E_i}.

Theorem 1.4 immediately raises the question of whether such a result could hold for general local Hamiltonians as well: the assumption that there are only S = O(1) eigenstates in the initial state, as well as the assumption of efficient access to the initial state, appear rather strong.
To address this question, we define another real-valued, decaying signal as

g_D(k) = ⟨Φ| (I − H/2π)^k |Φ⟩ = Σ_j |⟨ψ_j|Φ⟩|² (1 − E_j/2π)^k.   (3)

If S = O(1) and if g_D(k) can be estimated with some accuracy for k = 1, . . . , K = O(1), we can also apply the ESPRIT method to extract these S eigenvalues. We note that this requires that the eigenvalues E_j are bounded away from 2π. Hence if we use g_D(k), we assume that we have shifted and rescaled the eigenvalues so that, say, the E_j's lie in [0, π]. One can prove that for general local Hamiltonians, assuming S = O(1) eigenvalues in |Φ⟩, one can estimate g_D(k) with accuracy ε, under an assumption about the access to |Φ⟩ which is identical to the Monte Carlo case for stoquastic Hamiltonians (Theorem 1.4). In fact, this result shows that Theorem 1.4 is not particular to local stoquastic Hamiltonians at all, if we only care about 'nominally poly(n)' algorithms. However, the computational cost of estimating g_D(k) for general local Hamiltonians is significantly higher in practice compared to the Monte Carlo method for stoquastic Hamiltonians. The result expressed in Lemma 2.4 can be viewed as 'dequantization', as it is similar in spirit to the Singular Value Transformation (SVT) tool (Theorem 3 in [12]). Theorem 3 in [12] is used to construct an algorithm that estimates the ground state energy of a Hamiltonian to O(1) (in n) precision, given an initial state with only some constant overlap with the ground state.
Applying the ESPRIT analysis to Lemma 2.4, we will obtain the following Theorem:

Theorem 1.5. Given a local Hamiltonian on n qubits. Let the number of eigenvectors supported in some efficient-to-sample input state |Φ⟩ be S = O(1), and let each occur with nonzero probability at least 1/poly(n). In addition, assume that for fixed x, y it is efficient to compute the ratio Φ(y)/Φ(x). Furthermore, assume that the S eigenvalues {E_i}, E_i ∈ [0, π], are sufficiently well-separated, i.e. at least by ∆ ≥ 1/poly(n) for some poly(n). Then using Lemma 2.4 plus signal post-processing via ESPRIT, each requiring (some) poly(n) classical effort, one can resolve the eigenvalues {E_i}.

To investigate practical aspects of the MC scheme for stoquastic Hamiltonians and compare it to the quantum scheme, we numerically study the one-dimensional Ising chain in a transverse field g [34] in a proof-of-principle setting. We numerically study, amongst several other aspects, the recovery of the ground-state and first-excited-state eigenvalues in the (g > 1)-regime from the signals g_R(k) and g_I(k) (in the presence of sampling noise and Trotter error) using the ESPRIT method.
An overview of the paper is as follows. In Section 2, we review the Hadamard or overlap quantum subroutine (Lemma 2.1) and we present the Monte Carlo algorithm (Lemma 2.3) for stoquastic Hamiltonians with its proof, as well as stating a straightforward Lemma 2.4 on 'dequantization'. Section 3 reviews the ESPRIT method and has an extensive Appendix D in which we prove the performance of the ESPRIT method for imaginary-time decaying signals using many lemmas also needed in the real-time signal case. The arguments for Theorem 1.5 are presented in Section 3 as well. In Section 4, we numerically compare the quantum scheme and the Monte Carlo scheme (for stoquastic Hamiltonians) for determining part of the spectrum of a transverse field Ising chain. In Section 5, we discuss our work and propose some directions for future study. Several appendices give additional background information and details.
We note that a very extensive literature exists on the Monte Carlo power method [14], in which one applies a sequence of steps which gradually project an initial input state onto the ground state. In this method, unlike in our MC scheme of Lemma 2.3, one renormalizes the state after each iteration, so that the signal does not die out. In our approach, we do not renormalize, but study the decay rates themselves. In terms of other previous work, we note that in [4] the ground state energy of a stoquastic Hamiltonian was efficiently estimated by means of a projector Monte Carlo scheme, under an additional 'guiding state' promise. In [28] the authors consider the implementation of the imaginary-time evolution exp(−τH) on a quantum computer in order to prepare a ground state of any local Hamiltonian. Note that our goal is not to prepare any ground or excited state but rather only to learn some eigenvalues.
In the remainder of this section, we review a few definitions used in this paper.

Definition 1 (Stoquastic Hamiltonians). A (real-valued) Hamiltonian H is (globally) stoquastic [6] in a basis B if all its off-diagonal elements are non-positive: ⟨x|H|y⟩ ≤ 0 for x ≠ y (with states |x⟩, |y⟩ being elements of basis B).
In this work, we are interested in Hamiltonians that are local and stoquastic:

Definition 2 (Local Hamiltonians). A Hamiltonian H associated with a system consisting of n degrees of freedom (e.g. spins/qubits) is local if it admits a decomposition into a set of Hermitian operators {H_i}, i.e. H = Σ_{i=1}^N H_i, such that each H_i acts non-trivially on O(1) (not growing with n) degrees of freedom of the system.
We denote the maximum number of degrees of freedom on which each H_i acts non-trivially (i.e. its locality) by k, and note that the number of terms in a local Hamiltonian is N = O(n^k).
For local Hamiltonians there is a slightly stronger notion of stoquasticity, called termwise stoquasticity, which can differ from the definition of stoquasticity given above; see [6,21].

Definition 3 (Termwise stoquastic Hamiltonians). A (real-valued) k-local Hamiltonian H is m-termwise stoquastic in a basis B if it admits a decomposition into (real-valued) m(≥ k)-local terms {H_a} such that each H_a is stoquastic: ∀a, ⟨x|H_a|y⟩ ≤ 0 for x ≠ y (with states |x⟩, |y⟩ being elements of basis B).
Most many-body Hamiltonians considered in physics which are stoquastic are O(1)-termwise stoquastic. The results in this paper apply both to termwise stoquastic and to globally stoquastic Hamiltonians (using some small adaptations employing results in [21]), and we will refer to them simply as 'stoquastic'.
For a matrix X we will use the operator or spectral norm ‖X‖ = √(λ_max(X†X)) = σ_max(X), where σ_max(X) is the largest singular value of X. We also refer to the Frobenius norm ‖X‖_F = √(Tr(X†X)) and the induced-∞ norm ‖X‖_∞ = max_i Σ_j |X_ij|. For an m × n matrix X, we use ‖X‖ ≤ √m ‖X‖_∞ and ‖X‖ ≤ ‖X‖_F.
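The two norm inequalities just quoted are easy to sanity-check numerically (a random rectangular matrix of our own choosing, using numpy's built-in norms):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 7
X = rng.normal(size=(m, n))

spec = np.linalg.norm(X, 2)              # operator (spectral) norm, sigma_max(X)
frob = np.linalg.norm(X, 'fro')          # Frobenius norm
inf_norm = np.abs(X).sum(axis=1).max()   # induced-infinity norm: max row sum

# ||X|| <= sqrt(m) ||X||_inf  and  ||X|| <= ||X||_F
```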

Quantum scheme versus Monte Carlo scheme for spectral estimation
In this section we show how to estimate g R (k) on a quantum computer, and g I (k) for stoquastic Hamiltonians via a Monte Carlo algorithm, as well as how to estimate g D (k) inefficiently (in k) via a classical algorithm for general local Hamiltonians. Lemma 2.1 states a well-known quantum subroutine, namely the Hadamard or overlap test, while a new result, a Monte Carlo version of the routine, is proved in Lemma 2.3. After these Lemmas, the proofs of Theorems 1.1 and 1.2 are given. Then we give Lemma 2.4 for general local Hamiltonians, using similar tools as in Lemma 2.3.
We note that the overlap test is used in versions of quantum phase estimation which do not aim at preparing an energy eigenstate of the Hamiltonian, but rather only learn the spectral content in its input state, as in Refs. [30,36,24]. Here we basically follow this approach for the real-time quantum evolution, which can in addition be randomized to save on implementation costs, see [42].
Lemma 2.1. Let F(t) = ⟨Φ| G_L ⋯ G_1 |Φ⟩, where we assume that: (i) |Φ⟩ = Σ_x Φ(x)|x⟩ is a state of n qubits which can be generated by a poly(n)-size quantum circuit; (ii) each G_l is a k-local unitary matrix. Then F can be estimated within error ε with probability at least 1 − δ with a quantum circuit with Θ(ε^{-2} log(4δ^{-1})) × [Θ(L) + poly(n)] single- and two-qubit gates.
Proof. Figure 1 depicts the quantum circuit which is used. It involves an n-qubit register and a single ancillary qubit. The state of the composite system can be tracked through the circuit, and the final state can be found to be (where R(θ) ≡ e^{-iθZ/2}, writing U ≡ G_L ⋯ G_1, and up to a global phase):

½ |0⟩ ⊗ (|Φ⟩ + e^{iθ} U|Φ⟩) + ½ |1⟩ ⊗ (|Φ⟩ − e^{iθ} U|Φ⟩).

A Z-measurement is now performed on the ancillary qubit, measuring either |0⟩ or |1⟩ with associated outcomes m = 0 resp. m = 1. The probability to measure state |0⟩ (m = 0) on the ancillary qubit after application of the depicted gates is then given by

p_{m=0}(θ) = (1 + Re(e^{iθ} F(t)))/2.

In the final expression, we restrict ourselves to θ = 0 and θ = π/2, which are the θ-values of interest: they yield Re F(t) resp. −Im F(t). Suppose that for θ = 0 and θ = π/2 the quantum circuits are repeated |Σ| times, to obtain a set of 2|Σ| independent realizations of the ancillary-qubit state to be measured, and let Σ_0^{θ=0} resp. Σ_0^{θ=π/2} be the number of times the ancilla measurement returns 0, so that

F̂(t) = (2Σ_0^{θ=0}/|Σ| − 1) − i (2Σ_0^{θ=π/2}/|Σ| − 1)

is our (unbiased) estimator, i.e. E(F̂(t)) = F(t). Then by means of the Chernoff bound we have Pr(|F̂(t) − F(t)| ≥ ε) ≤ δ when the number of samples is chosen as |Σ| = Θ(ε^{-2} log(4δ^{-1})).

Figure 1: Basic circuit with a single ancillary qubit and an n-qubit register (initialized in state |Φ⟩).
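The sampling step of this overlap test can be simulated classically for a toy instance. In the sketch below (a random 4-dimensional unitary of our own choosing; the sign conventions match the reconstruction above), the two ancilla-outcome probabilities are sampled as binomials and combined into an estimate of F:

```python
import numpy as np

rng = np.random.default_rng(4)

def expmh(H, s):
    # e^{s*H} for Hermitian H via eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(s * w)) @ V.conj().T

dim = 4
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2
U = expmh(H, -1j)                       # toy propagator standing in for G_L...G_1
Phi = rng.normal(size=dim)
Phi /= np.linalg.norm(Phi)
F = Phi.conj() @ U @ Phi                # overlap to be estimated

shots = 100_000
est = 0.0 + 0.0j
for theta, unit in [(0.0, 1.0), (np.pi / 2, -1j)]:
    p0 = (1 + np.real(np.exp(1j * theta) * F)) / 2   # P(ancilla outcome m = 0)
    n0 = rng.binomial(shots, p0)                     # simulate |Sigma| repetitions
    est += unit * (2 * n0 / shots - 1)               # theta=0 -> Re F, theta=pi/2 -> -Im F
```

The statistical error of `est` shrinks as O(1/√shots), in line with the Chernoff-bound sample count |Σ| = Θ(ε^{-2} log(4δ^{-1})).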
Next, we will consider a classical Monte Carlo version of the quantum routine given above. The key result here is Lemma 2.3. However, before we can state it, we collect a few facts about the matrices G_i which will be useful in the proof of Lemma 2.3. The matrices G_i that we will consider now can be seen as analogous to the (unitary) local real-time propagation operators G_i considered earlier in Lemma 2.1, but are now local imaginary-time propagation operators. The G_i are no longer unitary but, for local stoquastic Hamiltonians, are elementwise nonnegative. They are of the form G_i = e^{-(a_i/M) H_i}, where a_i/M is a positive parameter set by the Trotterization scheme and k = 0, 1, . . . , K denotes the imaginary-time coordinate. We have the following proposition on further properties of these operators:

• Each G_i is a real symmetric, elementwise nonnegative matrix which (after an appropriate shift of H_i) has its spectrum in (0, 1]; it can be brought into block-diagonal form G_i = ⊕_b G_i^b, where each block G_i^b is an irreducible nonnegative matrix acting on a subset S_i^b of basis states.

• From the Perron-Frobenius Theorem (Theorem 8.4.4 in [18]) it follows that for each nonnegative and irreducible sub-matrix there exists a unique and strictly positive eigenstate associated with its largest eigenvalue, i.e.

G_i^b |φ_i^b⟩ = λ_i^b |φ_i^b⟩,   (8)
where φ_i^b(x) > 0 for all x ∈ S_i^b. Since the spectrum of G_i is the union of the spectra of the submatrices G_i^b, the spectrum of each G_i^b also lies in the interval (0, 1], and one of the blocks b will contain the largest eigenvalue of G_i, equal to 1. In case G_i is itself irreducible, there is a largest nonnegative eigenvector as in Eq. (8) which has support φ_i(x) > 0 for all x. In this case, the corresponding eigenvalue will be λ_i = 1.
• Naturally, since G_i acts nontrivially only on a subset of O(1) qubits (and acts as I on the other qubits), one can efficiently compute the blocks G_i^b, the largest eigenvalue λ_i^b and the associated eigenstate φ_i^b in each block b.
We prove the following:

Lemma 2.3. Let F(τ) = ⟨Φ| G_L ⋯ G_1 |Φ⟩, where each G_l is a Hermitian, elementwise-nonnegative k-local matrix as above, and |Φ⟩ = Σ_x Φ(x)|x⟩ is a normalized state of n qubits where Φ(x) ∈ C (∀x) and Σ_x |Φ(x)|² = 1. We assume that (1) the ratio Φ(y)/Φ(x) can be efficiently (poly(n)) calculated for given x and y, and (2) we can efficiently draw samples from the probability distribution P(x) = |Φ(x)|². Then F(τ) can be estimated within error ε with probability at least 1 − δ by a classical algorithm which draws |Σ| = Θ(ε^{-2} log(δ^{-1})) samples, where obtaining each sample takes a number of operations scaling linearly in L and poly(n).
Proof. The proof of Lemma 2.3 consists of two steps: constructing an estimator for F(τ), and showing that the error of this estimator can be bounded according to the Lemma. We rewrite the quantity of interest F as follows (where L − 1 complete sets of basis states are inserted in between the G_l operators in the final equality):

F(τ) = ⟨Φ| G_L ⋯ G_1 |Φ⟩ = Σ_{x_0, x_1, ..., x_L} Φ*(x_L) ⟨x_L|G_L|x_{L-1}⟩ ⋯ ⟨x_1|G_1|x_0⟩ Φ(x_0),

where we have set |x⟩ = |x_0⟩ and |y⟩ = |x_L⟩. F thus corresponds to the sum of an exponential number of products of (non-negative) matrix elements of G_1, ..., G_L, weighted by amplitudes in the state |Φ⟩. Evidently, only terms for which all the matrix elements in the product are non-zero contribute to the sum. We now consider the string of basis states |x_0⟩, ..., |x_L⟩ and associate with each step |x_{l-1}⟩ to |x_l⟩ in this string a probability

P_l(x_{l-1} → x_l) = ⟨x_l| G_l |x_{l-1}⟩ φ_l^b(x_l) / (λ_l^b φ_l^b(x_{l-1})),

where b labels the sub-block in G_l = ⊕_b G_l^b which contains the strings x_{l-1} and x_l. The probability distribution P_l is thus non-negative, as λ_l^b ∈ (0, 1], G_l is elementwise non-negative, and φ_l^b(x_l) > 0 and φ_l^b(x_{l-1}) > 0. It can be shown to be normalized, using the Hermiticity of G_l: Σ_{x_l} P_l(x_{l-1} → x_l) = (G_l φ_l^b)(x_{l-1}) / (λ_l^b φ_l^b(x_{l-1})) = 1. We use P_l(x_{l-1} → x_l) to rewrite F(τ) as

F(τ) = Σ_x Π(x) R(x),  Π(x) = |Φ(x_0)|² Π_{l=1}^L P_l(x_{l-1} → x_l),  R(x) = (Φ(x_L)/Φ(x_0))* Π_{l=1}^L λ_l^b φ_l^b(x_{l-1}) / φ_l^b(x_l),

where x ≡ (x_0, x_1, ..., x_L) and we have defined the quantities Π(x) and R(x). Since |Φ(x_0)|² and each P_l are probability distributions, Π(x) is a probability distribution as well, i.e.
Σ_{x_0, x_1, ..., x_L} Π(x) = 1. Clearly, one can sample from Π(x) by first sampling x_0 from |Φ(x_0)|², then sampling from P_1(x_0 → x_1) to generate x_1, etc., until x_L. By thus sampling from the probability distribution Π(x) and obtaining a mean estimator for F(τ) using the samples R(x), we can estimate F(τ). We note that F(τ) = E(R(x)). Since R(x) ∈ C, a mean estimator over a finite number of samples will generally be complex-valued. Since F(τ) ∈ R, we will instead obtain a mean estimator using samples Re(R(x)). The mean estimator that we shall use to estimate F(τ) is the median-of-means estimator [26]. Using a set Σ of samples {x} (distributed according to Π(x)), the median-of-means estimator is defined as follows: divide the set Σ into q subsets s_1, . . . , s_q of size approximately |Σ|/q; calculate the empirical mean of Re(R(x)) over the samples in each subset, f_j = (1/|s_j|) Σ_{x∈s_j} Re(R(x)) for j ∈ {1, . . . , q} (each f_j is an unbiased estimator of F(τ)); the median-of-means estimator is then given by the median of these empirical means, F̂ = M(f_1, . . . , f_q). See Appendix C for more details.
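A minimal sketch of the median-of-means estimator M(f_1, . . . , f_q) described above (the function name and the toy data are ours):

```python
import numpy as np

def median_of_means(samples, q):
    """Split the samples into q groups and return the median of the group means."""
    groups = np.array_split(np.asarray(samples, dtype=float), q)
    return np.median([g.mean() for g in groups])

vals = np.array([1., 2., 3., 4., 100.])   # one gross outlier
mom = median_of_means(vals, 5)
```

With q = 5 each group is a single sample, so the outlier shifts only one of the five group means and the median stays at 3.0; this robustness to heavy-tailed samples is what underlies the exponentially better δ-dependence compared to the plain empirical mean.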
The algorithm that efficiently produces F̂ = M(f_1, . . . , f_q) is explicitly given in Algorithm 1 below. Note that when Φ(x_0) is small for some x_0, the probability of drawing this x_0, namely |Φ(x_0)|², is very small, but the ratio Φ(x_L)/Φ(x_0) in the estimator could get very large.

Algorithm 1: Efficiently obtaining a median-of-means estimate of F(τ) through sampling of the probability distribution Π(x).
1. For each of the |Σ| samples: sample an initial basis state |x_0⟩ from the probability distribution |Φ(x_0)|²; for l = 1, . . . , L, sample |x_l⟩ from the transition probability P_l(x_{l-1} → x_l); record Re(R(x)) for the resulting path x = (x_0, . . . , x_L).
2. Divide the |Σ| recorded values into q subsets s_1, . . . , s_q, compute their empirical means f_1, . . . , f_q, and output F̂ = M(f_1, . . . , f_q).
Algorithm 1 thus efficiently provides an estimate of F(τ ) (albeit biased). To complete the proof, we will show that the variance of Re(R(x)) ∈ R can be bounded which in turn is used to bound the number of samples to get an estimate close to the mean, leading to Lemma 2.3.
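As a toy illustration of the path-sampling step of Algorithm 1 (our own two-qubit transverse-field Ising example; the parameters and variable names are ours, and for simplicity a single irreducible propagator G is applied L times), the following sketch samples paths from Π(x) and averages R(x), reproducing ⟨Φ|G^L|Φ⟩:

```python
import numpy as np

rng = np.random.default_rng(0)

def expmh(H, s):
    # e^{s*H} for Hermitian H via eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(s * w)) @ V.conj().T

# Irreducible stoquastic Hamiltonian on 2 qubits (transverse-field Ising),
# shifted so its ground energy is 0 and hence the top eigenvalue of G is 1.
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])
I2 = np.eye(2)
H = -np.kron(X, I2) - np.kron(I2, X) - np.kron(Z, Z)
H -= np.linalg.eigvalsh(H).min() * np.eye(4)

G = expmh(H, -0.3)                         # elementwise positive, spectrum in (0, 1]
lam, vecs = np.linalg.eigh(G)
lmax, phi = lam[-1], np.abs(vecs[:, -1])   # Perron eigenpair, phi(x) > 0

Phi = np.array([0.6, 0.4, 0.5, 0.48])
Phi /= np.linalg.norm(Phi)
L = 5
exact = Phi @ np.linalg.matrix_power(G, L) @ Phi

# Transition kernel P(x -> y) = <y|G|x> phi(y) / (lmax phi(x)); columns sum to 1.
P = G * phi[:, None] / (lmax * phi[None, :])
P = P / P.sum(axis=0, keepdims=True)       # guard against rounding

def one_sample():
    x0 = rng.choice(4, p=Phi**2)           # sample x_0 from |Phi(x_0)|^2
    x, weight = x0, 1.0
    for _ in range(L):
        y = rng.choice(4, p=P[:, x])       # sample the next string in the path
        weight *= lmax * phi[x] / phi[y]   # factor lambda * phi(x_{l-1}) / phi(x_l)
        x = y
    return (Phi[x] / Phi[x0]) * weight     # estimator R(x)

est = np.mean([one_sample() for _ in range(20000)])
```

Since the variance of R(x) is bounded by 1 (Eq. (17)), 2 × 10⁴ samples suffice here for percent-level agreement with the exact value; the cost per sample is linear in L, as in the Lemma.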
For a complex random variable Z = R(x), E(Z) ≡ E(Re(Z)) + iE(Im(Z)) and Var(Z) = Var(Re(Z)) + Var(Im(Z)) ≥ Var(Re(Z)). Hence we can bound the variance of the random variable Re(R(x)) by bounding the variance of the random variable R(x). This variance is given by

Var(R(x)) = E(|R(x)|²) − F² ≤ E(|R(x)|²),   (14)

where the inequality holds because F² ≥ 0 (since F ∈ R). To obtain an upper bound on the variance, we investigate E(|R(x)|²) in more detail in Eq. (15), in the last equality of which we define the non-negative quantity Q_l(x, y). Exploiting the Hermiticity of the G_l^b, Q_l(x, y) can be shown to obey the property in Eq. (16); in particular, Q_l(x, y) satisfies 0 ≤ Q_l(x, y) ≤ 1, ∀x, y and ∀l ∈ {1, 2, ..., L}. By consecutively exploiting the property in Eq. (16) for all Q_l's and the normalization of the state |Φ⟩ in the expression in Eq. (15), we obtain

Var(Re(R(x))) ≤ E(|R(x)|²) ≤ 1.   (17)

If we take |Σ| samples and divide them into q subsets s_1, . . . , s_q of size approximately |Σ|/q, then (by means of Chebyshev's inequality) each f_j obeys |f_j − F| ≤ √(4q Var(Re(R(x)))/|Σ|) ≤ √(4q/|Σ|) with probability at least 3/4. Using Hoeffding's inequality and the definition of the median, one can show that (see Appendix C) Pr(|F̂ − F| ≥ √(4q/|Σ|)) ≤ e^{-q/8}. Hence F can be estimated with error ε with probability at least 1 − δ (with q = 8 log(δ^{-1})) for |Σ| = Θ(log(δ^{-1}) ε^{-2}), where obtaining each sample takes a number of operations that scales linearly in L and poly(n). This completes the proof of Lemma 2.3.
Remark. Note that if one had chosen the empirical mean F̂ = (1/|Σ|) Σ_{x∈Σ} Re(R(x)) as a mean estimator for F (instead of the median-of-means estimator), then using Eq. (17) and Chebyshev's inequality, we obtain Pr(|F̂ − F| ≥ ε) ≤ 1/(|Σ| ε²). Hence F can be estimated using F̂ with error ε with probability at least 1 − δ for |Σ| = Θ(δ^{-1} ε^{-2}). Using the median-of-means estimator thus provides an exponential improvement in the required scaling of |Σ| with δ^{-1}.
Note that if we could upper- and lower-bound the range of Re(R(x)) by some constants, then we could have used a Chernoff-Hoeffding bound for the empirical mean F̂, which gives the aforementioned (exponentially) better dependence of the run-time of the algorithm on δ^{-1} (as in Lemma 2.1, where we do use a Chernoff-Hoeffding bound).
Remark. Note that the Lemma also applies to estimating ⟨x| G_1 G_2 ⋯ G_L |x⟩ (with 1/poly(n) accuracy), as one simply starts the process at x_0 = x and R(x) is only nonzero when the path ends at x_L = x. Similarly, one can estimate overlaps involving two states |Φ_1⟩ and |Φ_2⟩ when one can sample from |Φ_1(x)|² and compute, for given x and y, the ratio Φ_2(y)/Φ_1(x). In addition, one can extend the Lemma to the case where the local propagation operators G_l are not Hermitian but are still nonnegative matrices; see Appendix B.
We stress that Lemma 2.3 provides an efficient classical algorithm provided that, for a given x, y ∈ {0, 1}^n, one can efficiently determine the ratio Φ(y)/Φ(x), and that the state |Φ⟩ is such that one can efficiently draw samples from P(x) = |Φ(x)|². In many practical settings, |Φ⟩ is such that one can define a function f: {0, 1}^n → C which takes as input the n-bit string x and efficiently outputs the corresponding coefficient Φ(x). This is e.g. the case for (matrix) product states or for other ansatz classes of states. Then, given x and y, the fraction Φ(y)/Φ(x) can be efficiently obtained. Note that under this assumption one can set up a Monte Carlo scheme based on the Metropolis algorithm to sample from |Φ(x)|², although this scheme is only a heuristic strategy and its efficient convergence would have to be proved. A good class of states to which both Lemmas 2.3 and 2.1 apply are of course product states. Note that even when running the overlap test is too costly (as quantum circuits are noisy), but preparing the state |Φ⟩ is feasible, one could use this preparation to sample from |Φ(x)|² for the application of the MC method. Of course, the requirement of being able to compute Φ(y)/Φ(x) remains. For the transverse field Ising model, an example of a |Φ⟩ which obeys these conditions will be given in Section 4.
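For concreteness, here is a sketch of this access structure for a product state (a hypothetical single-qubit-factorized |Φ⟩ of our own choosing): both sampling from |Φ(x)|² and computing the ratio Φ(y)/Φ(x) cost O(n):

```python
import numpy as np

# Product state Phi(x) = prod_i phi_i(x_i): amplitudes factorize, so both
# sampling from |Phi(x)|^2 and computing ratios Phi(y)/Phi(x) are O(n).
rng = np.random.default_rng(5)
n = 8
phi = rng.uniform(0.2, 1.0, size=(n, 2))                 # per-qubit amplitudes
phi /= np.linalg.norm(phi, axis=1, keepdims=True)

def amplitude(x):
    # full coefficient Phi(x); shown only to check the ratio below
    return np.prod(phi[np.arange(n), x])

def ratio(y, x):
    # Phi(y)/Phi(x): only the qubits where x and y differ contribute
    d = np.nonzero(y != x)[0]
    return np.prod(phi[d, y[d]]) / np.prod(phi[d, x[d]])

def sample():
    # draw each bit independently from |phi_i|^2
    return np.array([rng.choice(2, p=phi[i] ** 2) for i in range(n)])

x, y = sample(), sample()
```

For matrix product states the same two operations remain efficient, but the per-call cost grows with the bond dimension.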
Proof of Theorems 1.1 and 1.2: We require the Trotterization of e^{-ikH} resp. e^{-kH} into a string of local propagation operators G_i which are unitary (in Lemma 2.1) resp. Hermitian and non-negative (in Lemma 2.3). This non-unique decomposition of e^{-ikH} and e^{-kH} into an ordered string of local propagation operators depends on the Trotterization scheme and is discussed in Appendix A (and more extensively in [7]). The Trotterization gives an error ε_trot (in addition to the sampling error ε in Lemmas 2.1 and 2.3), with a number of Trotter steps, and hence a circuit or algorithm depth, scaling as O(k^{1+o(1)} ε_trot^{-o(1)}) × poly(n). Then, if we wish to estimate g_R(k) and g_I(k) at multiple k = 0, . . . , K, we use that the probability that all K estimates are accurate up to uncertainty ε_tot equals unity minus the probability that at least one of the estimates deviates by more than ε_tot (which, by the union bound, is at most Kδ).
Finally, before we move on to extracting eigenenergy estimates from the (real-time and imaginary-time) signals using the ESPRIT method, we prove the Lemma related to the signal g_D(k) in Eq. (3).
Lemma 2.4. Let g_D(k) be defined as in Eq. (3) for a local n-qubit Hamiltonian H, with the E_j in [0, π], and assume that (1) one can efficiently (i.e. with poly(n) effort) sample from |Φ(x)|², and (2) given x and y, one can compute Φ(y)/Φ(x) efficiently. Then g_D(k) can be classically estimated within error ε with probability at least 1 − δ, using |Σ| = Θ(ε^{-2} log(δ^{-1})) samples and [poly(n)]^k computational effort per sample.

Proof. By definition of g_D(k), we can write

g_D(k) = ⟨Φ| (I − H/2π)^k |Φ⟩ = Σ_x |Φ(x)|² ⟨x| (I − H/2π)^k |Φ⟩ / Φ(x).

To estimate g_D(k), one first draws an x from P(x) = |Φ(x)|², and then one collects all y which are obtained after the application of (I − H/2π)^k to ⟨x|. Each application of I − H/2π maps an input string onto at most poly(n) new output strings; hence one obtains at most [poly(n)]^k such y's after k applications. Let x = (x_0 = x, x_1, . . . , x_k) be a particular path of strings, with contribution Π_{l=1}^k ⟨x_{l-1}| I − H/2π |x_l⟩ · Φ(x_k)/Φ(x_0). For each x that is sampled from P(x), one thus computes and outputs Re(R(x)), where R(x) is obtained by summing these contributions over all paths x that start at string x, so that R(x) = ⟨x| (I − H/2π)^k |Φ⟩ / Φ(x). As in Lemma 2.3, we need to establish how many samples |Σ| we need to draw from P(x) to obtain g_D(k) within error ε with probability at least 1 − δ. This analysis depends on the variance of the complex variable R(x) through Eq. (14), requiring us to upper bound

E(|R(x)|²) = Σ_x |⟨x| (I − H/2π)^k |Φ⟩|² = ‖(I − H/2π)^k |Φ⟩‖² ≤ 1,

where in the final line we have used that the eigenvalues of H lie in [0, π] (so that the eigenvalues of I − H/2π lie in [1/2, 1]). As in the proof of Lemma 2.3, this establishes that Var(Re(R(x))) ≤ 1. Then we can use the median-of-means estimator as in the proof of Lemma 2.3 and Appendix C to establish that, with probability at least 1 − δ, g_D(k) can be estimated with error at most ε, taking |Σ| = Θ(ε^{-2} log(δ^{-1})) samples from P(x) = |Φ(x)|², and with [poly(n)]^k computational effort per sample.
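The following sketch illustrates the Lemma 2.4 estimator for a small random (generically non-stoquastic) Hamiltonian of our own making; here the k applications of G = I − H/2π to ⟨x| are done densely, standing in for the [poly(n)]^k sparse path expansion:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

# Generic Hermitian H (mixed-sign off-diagonals), rescaled so spec(H) in [0, pi].
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2
w = np.linalg.eigvalsh(H)
H = (H - w.min() * np.eye(dim)) * (np.pi / (w.max() - w.min()))

G = np.eye(dim) - H / (2 * np.pi)       # eigenvalues of G lie in [1/2, 1]

Phi = np.abs(rng.normal(size=dim)) + 0.1
Phi /= np.linalg.norm(Phi)
k = 3
exact = Phi @ np.linalg.matrix_power(G, k) @ Phi   # g_D(k)

def R(x):
    # <x| G^k |Phi> / Phi(x): k applications of G to the row <x|; for a
    # local Hamiltonian the support grows by poly(n) strings per step,
    # giving the [poly(n)]^k effort per sample.
    row = np.zeros(dim)
    row[x] = 1.0
    for _ in range(k):
        row = row @ G
    return (row @ Phi) / Phi[x]

est = np.mean([R(rng.choice(dim, p=Phi**2)) for _ in range(20000)])
```

Only the initial string x is sampled here; everything downstream is computed deterministically inside R(x), which is exactly the structural difference from the path-sampling of Lemma 2.3 noted below.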
It is important to note that, unlike in Lemma 2.3, here we only sample x and compute the rest as the estimator R(x), while in Lemma 2.3 we sample the whole path of length L. This is why the computational effort in Lemma 2.3 is efficient (linear) in L and thus polynomial in k, while in Lemma 2.4 the computational effort is exponential in k. This is thus the difference between the stoquastic Hamiltonian case and the general Hamiltonian case. Note also that one can in principle take each G_i in Lemma 2.3 to be G = I − H/2π, as it obeys condition (ii) when H is stoquastic.
We note that in [12] the sampling-access assumption is formulated slightly differently, that is, one gets access to Φ(x) for a given x, which can be stronger than only knowing the ratio Φ(x)/Φ(y) for a given x and y. In addition, Ref. [12] allows an additional error in the sampling access whereas we gloss over this here and assume perfect sampling-access (similar to the exact assumptions in the other Lemmas).

Classically processing the signal: the ESPRIT method
We turn to discussing the ESPRIT method [23], which, like the matrix pencil method [35,20,19], processes a signal as in Eqs. (1) and (2) consisting of S components. Suppose one is given a set of values of the signal g(k) = Σ_{j=1}^{S} c_j z_j^k, with |z_j| ≤ 1, for k ∈ {0, 1, ..., K}, K even. The goal is to determine the z_j and the coefficients c_j > 0 ‡ using g(k) for sufficiently many k. In the case of the real-time signal g_R(k) we have z_j ≡ e^{−iE_j}; in the case of the purely-decaying imaginary-time signal g_I(k) we have z_j ≡ e^{−E_j} ∈ (e^{−2π}, 1]; and for the purely-decaying signal g_D(k) we have z_j ≡ 1 − E_j/2π. Due to sampling and Trotter noise, one is effectively given a noisy signal y(k) (for k ∈ {0, 1, ..., K}), related to the original signal by

y(k) = g(k) + η(k),

where η(k) denotes e.g. the sampling and Trotter noise; in Theorems 1.1 and 1.2 we have |η(k)| ≤ ε_tot with high probability. It is well known that for a noiseless signal (η(k) = 0) the z_j's and the c_j's can be resolved perfectly via ESPRIT and the matrix pencil method if we take K + 1 ≥ 2S. Importantly, this result does not depend on whether the signal is oscillatory or decaying. For illustration, Figure 2 depicts the results of applying the matrix pencil method to a noiseless signal. We consider separately a decaying signal and an oscillating signal, and for both cases we depict respectively the estimates of the decay rates and oscillation frequencies as a function of K. When K + 1 ≥ 2S, the eigenvalues are indeed resolved both for the decaying and the oscillating signal; when K + 1 < 2S, they are not. We will see, however, that in the presence of noise a decaying and an oscillatory signal fare very differently.

‡ Here we focus on determining the z_j, but given the z_j one can determine the c_j as well, and methods for analyzing the performance of this also exist [27].

Figure 2: Estimates of the decay rates (of a decaying signal) and oscillation frequencies (of an oscillating signal) as a function of K. The estimates are obtained by applying the matrix pencil method [35,20,19] to the noiseless signals g(k) = Σ_{j=1}^{S} c_j z_j^k, where z_j = e^{−E_j} (decaying signal) and z_j = e^{−iE_j} (oscillating signal), for k = 0, 1, ..., K, with E_j ∈ [0, 2π). All c_j are set equal to 1/S and the E_j have been randomly generated. The eigenvalues are recovered for K + 1 ≥ 2S.
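For concreteness, the noiseless-recovery claim can be checked with a minimal matrix pencil sketch (our own illustrative implementation, not the code released with this work): with exactly K + 1 = 2S + 1 noiseless samples, the decaying poles are recovered to numerical precision.

```python
import numpy as np

def matrix_pencil(y, S):
    """Matrix pencil method: the poles z_j of y(k) = sum_j c_j z_j^k are the
    nonzero eigenvalues of pinv(Y0) @ Y1, where Y0, Y1 are row-shifted
    Hankel matrices of the data."""
    K = len(y) - 1
    L = K // 2                                                 # pencil parameter
    Y = np.array([y[i:i + L + 1] for i in range(K - L + 1)])   # Hankel matrix
    Y0, Y1 = Y[:-1], Y[1:]                                     # shift by one time step
    ev = np.linalg.eigvals(np.linalg.pinv(Y0) @ Y1)
    return ev[np.argsort(-np.abs(ev))][:S]                     # keep the S largest poles

# Noiseless decaying signal with S = 3 components and K + 1 = 7 samples.
S, K = 3, 6
z_true = np.exp(-np.array([0.4, 1.0, 1.7]))
y = np.array([np.sum(z_true ** k) / S for k in range(K + 1)])
z_est = matrix_pencil(y, S)
```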
Let us consider in more detail the task of obtaining the z_j's from the signal y(k) in Eq. (24). The key object of study here is the Hankel matrix H(y) := H(g) + H(η), containing all K + 1 data points of the noisy signal y(k), together with a positive integer 'matrix pencil' parameter L; here H(η) is due purely to the noise and has norm ‖H(η)‖. We can decompose the Hankel matrix H(g) of the noiseless signal in terms of Vandermonde matrices V_L (defined in Eq. (28)); see Eq. (26). In general, methods such as ESPRIT (see the ESPRIT Algorithm 2) rely on the parameter L, and for convenience we keep it general in some of the analysis (specifically in Appendix D). Our results will, however, focus on the choice L = K/2. Making contact with the error bounds in the previous section for L = K/2 yields the bound on ‖H(η)‖ in Eq. (31). From the 'Vandermonde decomposition' in Eq. (26) of the Hankel matrix encoding a real-time or imaginary-time signal, one can develop numerical algorithms to extract the poles z_j. One such algorithm is ESPRIT (given in Algorithm 2), which specifically exploits the relation between the Vandermonde decomposition of H(y) and its singular value decomposition.
We will see that this algorithm comes with recovery guarantees on the parameters z_1, ..., z_S, in both the real-time and imaginary-time signal case, provided the noise vector η is small enough. The strength of these guarantees differs significantly between the two types of signal, and we will discuss them separately in the next sections.

Algorithm 2: ESPRIT algorithm.
Data: Time signal y, number of decay rates or oscillation frequencies S.
Result: List z̃_1, ..., z̃_S.
K ← length(y);  /* We will assume K is even for simplicity. */
L ← K/2;  /* Not the most general choice, but it works well in practice. */
H(y) ← Hankel matrix built from y;
Ũ, Σ̃, W̃ ← SVD(H(y));  /* Make sure Σ̃ is decreasingly ordered. */
Ũ_S ← first S columns of Ũ;  /* Recall that Ũ is an (L + 1) × (L + 1) unitary matrix. */
Ũ_0 ← first L rows of Ũ_S;
Ũ_1 ← last L rows of Ũ_S;
Ψ̃ ← Ũ_0^+ Ũ_1;  /* The S × S signal matrix Ψ̃; ^+ denotes the Moore-Penrose inverse. */
z̃_1, ..., z̃_S ← eigenvalues of the signal matrix Ψ̃.

From the z̃_j's we can then (for both the real-time and imaginary-time signal) extract Ẽ_j's, which denote the S estimates for {E_i ∈ [0, 2π)}_{i=1}^{S} returned by the classical post-processing algorithm. The error in the energy estimates is defined as the optimal matching distance [3], i.e. the returned list is optimally matched with the actual eigenvalues and the error is set by the largest mismatch.
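A direct transcription of Algorithm 2 into Python (an illustrative sketch of ours; the test signal and its frequencies are arbitrary) might look as follows:

```python
import numpy as np

def esprit(y, S):
    """ESPRIT as in Algorithm 2: estimate the poles z_1, ..., z_S of the
    signal y(k) = sum_j c_j z_j^k, with K = len(y) - 1 even and L = K/2."""
    K = len(y) - 1
    L = K // 2
    # (L + 1) x (L + 1) Hankel matrix of the data.
    Hy = np.array([y[i:i + L + 1] for i in range(K - L + 1)])
    U, s, Wh = np.linalg.svd(Hy)      # singular values come decreasingly ordered
    US = U[:, :S]                     # first S columns: signal subspace
    U0 = US[:-1]                      # first L rows
    U1 = US[1:]                       # last L rows
    Psi = np.linalg.pinv(U0) @ U1     # S x S signal matrix
    return np.linalg.eigvals(Psi)

# Noiseless oscillatory test signal with S = 3 frequencies.
E_true = np.array([0.5, 1.3, 2.2])
K = 12
y = np.array([np.sum(np.exp(-1j * E_true * k)) / 3 for k in range(K + 1)])
E_est = np.sort(np.mod(-np.angle(esprit(y, 3)), 2 * np.pi))
```

The eigenvalues of Ψ̃ inherit the poles because the signal subspace is shift-invariant: dropping the last (resp. first) row of the Vandermonde matrix relates the two blocks by a diagonal matrix of poles.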

Real-time (oscillatory) signal
In this section we discuss the performance of ESPRIT on real-time (oscillatory) signals. This performance has been well studied in the signal-processing literature. Here we follow the analysis of [23], which provides Theorem 3.1, relating ‖H(η)‖ in Eq. (31) to the energy matching error defined in Eq. (32). The performance of ESPRIT in the oscillatory-signal case relies on lower bounding the smallest nonzero singular value of the Vandermonde matrix V_{L=K/2} in Eq. (28) (or, equivalently, upper bounding the condition number κ(V_{K/2}) = σ_max(V_{K/2})/σ_min(V_{K/2})). The smallest nonzero singular value of V_{K/2} depends on K, S and the location of the poles z_j. For the real-time signal the z_j lie on the unit circle, whereas for the imaginary-time signal the z_j lie in the interval (e^{−2π}, 1]. Let the minimal gap ∆ between the E_i be defined as in Eq. (33); a lower bound on σ_min(V_{K/2}) for z_j = e^{−iE_j} has been proved in [27]. Theorem 3.1 ([23]). Let (g + η)(k) be a real-time signal with k = 0, ..., K, and with g(k) = Σ_{i=1}^{S} c_i z_i^k, c_i > 0 ∀i, c_min = min_i c_i and η(k) a small noise vector. Let z_j = e^{−iE_j} with j = 1, ..., S and E_j ∈ [0, 2π) ∀j, let K ≥ 2C/∆ for some constant C > 2 with gap ∆, and let K + 1 ≥ 2S. If ‖H(η)‖ is small enough (satisfying Eq. (35)), then the ESPRIT algorithm outputs energy estimates {Ẽ_j} whose matching distance obeys the bound in Eq. (36). By Eq. (31) we have ‖H(η)‖ ≤ K ε_tot, and if we choose K ∼ S, ε_tot can be chosen sufficiently small, inversely polynomial in S, such that at least Eq. (35) holds. Then d({E_i}, {Ẽ_j}) will be Θ(‖H(η)‖ S), hence decreasing like S² ε_tot. If we combine this Theorem with the quantum results of Theorem 1.1, we obtain Theorem 1.3. These results thus form the theoretical underpinning of the ideas and numerical work in [30], in which quantum phase estimation was replaced by the repeated execution of a circuit applying controlled-U^k (conditioned on an ancilla qubit state), which gets Trotterized to the overlap-test circuit in Fig. 1.
Remark. It is noteworthy that even when the eigenvalues E j are not well-separated but occur in 'clumps', results exist [23] which bound the performance of ESPRIT.

Imaginary-time (decaying) signal
Let us now discuss what information can be extracted from the imaginary-time signal in the presence of sampling and Trotter noise and compare this to the known Theorem 3.1 for the real-time signal.
In Appendix D we discuss in detail the recovery guarantees for ESPRIT for imaginary-time signals. This analysis is an adaptation of the work done in [23] for real-time signals, with the only true novelty being Lemma D.7. However, since no rigorous analysis for imaginary-time signals exists in the literature we go through all the steps in considerable detail. The analysis will again depend on the condition number of the Vandermonde matrix V L=K/2 in Eq. (28).
This condition number is much worse behaved, i.e. much larger, when the z_i's all lie on the real axis (which is the case for the imaginary-time signal), but bounds on this condition number do exist [2]. Based on the work of Gautschi [11], we derive our own upper bounds on this condition number, which are asymptotically sub-optimal but have a clearer dependence on the choice of K and the given S than the previous bounds in [2]. We then use the gap ∆ to complete the upper bound.
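The contrast can be checked numerically; the following sketch (with arbitrarily chosen energies and pencil length, purely for illustration) compares the condition number of a Vandermonde matrix with nodes on the unit circle against one with the corresponding real nodes e^{−E_j}:

```python
import numpy as np

def vandermonde_cond(z, L):
    """Condition number of the (L + 1) x S Vandermonde matrix V[k, j] = z_j^k."""
    V = np.array([[zj ** k for zj in z] for k in range(L + 1)])
    s = np.linalg.svd(V, compute_uv=False)
    return s[0] / s[-1]

E = np.linspace(0.5, 2.5, 6)     # six energies with gap 0.4
L = 40
kappa_circle = vandermonde_cond(np.exp(-1j * E), L)   # real-time poles
kappa_real = vandermonde_cond(np.exp(-E), L)          # imaginary-time poles
# kappa_real exceeds kappa_circle by orders of magnitude.
```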
In analogy to Theorem 3.1, we then obtain the following: Theorem 3.2. Let (g + η)(k) be an imaginary-time decaying signal with k = 0, ..., K, and with g(k) = Σ_{i=1}^{S} c_i z_i^k, c_i > 0 ∀i, c_min = min_i c_i, and η(k) a small noise vector. Let z_i = e^{−E_i} with E_i ∈ [0, 2π), let the eigenvalue gap ∆ < 1 be given as in Eq. (33), and let {Ẽ_i} be the energy estimates of ESPRIT with L = K/2. Let K + 1 ≥ 2S, K even and K = TS for some positive integer T. If ‖H(η)‖ is small enough (satisfying Eq. (39)), then the matching distance obeys the bound in Eq. (41), with

g_2(S, ∆) = e^{2π} 640 √2 S^{5.5} (e^{−2π} π∆)^{−5(S−1)}.
Since the dependence on S is exponential in Eq. (41), one cannot make the distance d({Ẽ i }, {E j }) small when the number of eigenvalues S = poly(n), no matter what the gap. This is a crucial difference with the oscillatory real-time case. However, for S = O(1), with sufficient, poly(n), effort one can make H(η) sufficiently small to obey Eq. (39) and then reduce the error on the found eigenvalues to 1/poly(n). This assumes that the gap between the O(1) rescaled eigenvalues present in the initial state is at least 1/poly(n) (and not exponentially small in n).
Furthermore, given that ‖H(η)‖ should decrease at least as ∼ 1/√K through Eq. (39), while the upper bound in Eq. (41) scales as ‖H(η)‖ K^{3/2}, one obtains the optimal bound by choosing the minimal K, namely K = 2S, so that L = K/2 = S. In this case the Vandermonde matrix V_{L−1} = V_{S−1} is square §. This expresses the intuitive fact that increasing K will not help beyond a point, as for larger K the signal simply dies out. This is unlike the oscillatory case of Theorem 3.1, in which the optimal K is required to grow with 1/∆; here the bound does not require that K grows with 1/∆, so there is no 'super-resolution'. We note that the upper bounds may have a sub-optimal dependence on K and S, owing to the proof techniques. Practically (roughly) speaking, whenever choosing a larger K does not make the condition number of the Vandermonde matrix V_{L=K/2} grow, choosing that larger K can be beneficial.
For the other decaying signal (g_D(k)), a rather small change from z_i = exp(−E_i) to z_i = 1 − E_i/2π gives Theorem 3.3, with a prefactor g̃_2(S, ∆) (with leading constant 640) analogous to g_2(S, ∆). Now, to argue Theorem 1.5 from Theorem 3.3, we simply choose the minimal K = 2S; since S = O(1), the classical algorithm which estimates g_D(k) for k = 0, ..., K (= O(1)) within error ε using Lemma 2.4 requires poly(n) effort.

Spectral estimation for a transverse-field Ising chain
In this section, we numerically investigate the methods described thus far by applying them to an archetypal stoquastic Hamiltonian: the transverse-field Ising chain. This system has been studied extensively [34] and will serve as a proof-of-principle test. The system consists of qubits on a one-dimensional lattice, which interact via an Ising interaction and are exposed to an external magnetic field in the transverse direction. The Hamiltonian associated with this system is:

H = −J Σ_i Z_i Z_{i+1} − g J Σ_i X_i,

where X, Y, Z denote the Pauli matrices, J > 0 (for a ferromagnetic interaction) and g ≥ 0, so that H is term-wise stoquastic in the standard basis. We take the field to point in the x-direction without loss of generality ∥.

§ Hence, strictly speaking, Lemma D.11 is not of much help.
∥ The Hamiltonian can be transformed to H̃ = U H U† by the unitary transformation U = ⊗_i exp(iθZ_i/2), which alters the direction of the field in the transverse plane while preserving the spectrum.
The system exhibits an abrupt change in its ground state as a function of g at g = 1 (for n → ∞). On either side of the phase transition, one has:
• Strong-coupling limit (g ≫ 1): In this limit, the Hamiltonian is dominated by the magnetic-field terms and the ground state is given by |ψ_0⟩ ≈ |+⟩^⊗n. The p-particle excitations correspond to states |−⟩_{q_1} |−⟩_{q_2} ... |−⟩_{q_p} ⊗_{i ≠ q_1,...,q_p} |+⟩_i, i.e., the ground state with spin flips at p sites q_1, ..., q_p along the chain. These p-particle excited states are (n choose p)-fold degenerate.
• Weak-coupling limit (g ≪ 1): In this limit, the Hamiltonian is dominated by the Ising interaction terms and the (degenerate) ground state is given by either |ψ_0⟩ ≈ |0⟩^⊗n or |ψ_0⟩ ≈ |1⟩^⊗n (ferromagnetic phase). The excitations w.r.t. the ground state correspond to domain walls separating ferromagnetic regions of opposite spin.
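A minimal sketch (assuming the standard convention H = −J Σ Z_i Z_{i+1} − gJ Σ X_i with open boundary conditions; the paper's exact conventions may differ) that builds the chain Hamiltonian and lets one check the term-wise stoquasticity and the strong-coupling ground state numerically:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def embed(paulis, sites, n):
    """Tensor product placing the given Paulis at the given sites of n qubits."""
    mats = [I2] * n
    for P, s in zip(paulis, sites):
        mats[s] = P
    return reduce(np.kron, mats)

def tfim(n, J=1.0, g=2.0):
    """Open transverse-field Ising chain: H = -J sum_i Z_i Z_{i+1} - gJ sum_i X_i."""
    H = sum(-J * embed([Z, Z], [i, i + 1], n) for i in range(n - 1))
    H = H + sum(-g * J * embed([X], [i], n) for i in range(n))
    return H

H = tfim(4)
offdiag = H - np.diag(np.diag(H))   # every entry is -gJ or 0: stoquastic
```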
To run the Monte Carlo scheme described in Lemma 2.3, the imaginary-time propagation operator e^{−kH} must be decomposed (by means of Trotterization) into the local propagation operators e^{−(a_l k/M) H_i} (where the a_l and M are set by the Trotterization scheme) ¶. The local propagation operator acting on a pair of neighbouring qubits on the chain is given in Eq. (47), with λ = J√(1 + g²) and k̃ = a_l k/M. This operator is element-wise non-negative and can be efficiently brought to block-diagonal form (with each block being irreducible).
Since the choice of |Φ⟩ directly governs which eigenvalues can be obtained from the real-time and imaginary-time evolution signals, it is a point of particular importance. In addition, the ability of ESPRIT to extract eigenvalues from the imaginary-time and real-time signals depends strongly on the spectral gap between the eigenvalues in the signal. We consider a state |Φ⟩ which has considerable overlap with the ground state and the (n-fold degenerate) first excited state in the (g > 1)-regime. Since the gap between their associated eigenvalues increases monotonically as a function of g in this regime, this allows us to probe the aforementioned gap dependence numerically. We call this state |Φ_optimal⟩, since in the (g ≫ 1)-regime it optimally overlaps with the eigenstates of interest, i.e. |⟨Φ_optimal|ψ_{p=0}⟩|² = Σ_{q=1}^{n} |⟨Φ_optimal|ψ_{p=1,q}⟩|² = 1/2. This state is given in Eq. (49), where Σ_{x∈{0,1}^{n−1}} |x⟩ denotes an equal superposition of (n−1)-bit strings that excludes the bit in register q.
We note that for |Φ_optimal⟩, one can efficiently obtain Φ(y)/Φ(x) for given x, y ∈ {0,1}^n, and one can efficiently sample from |Φ(x)|². From Eq. (49), one can infer a function Φ(x) ({0,1}^n → R) that (efficiently) gives the coefficient of |Φ_optimal⟩ associated with an n-bit string x:

Φ(x) = (1/2^{(n+1)/2}) [ (1/n + 1/√n)(n − |x|) + (1/n − 1/√n)|x| ],

so Φ(x) depends only on the Hamming weight |x| of the bit string x, and hence the ratio Φ(y)/Φ(x) can be efficiently determined. Furthermore, since Φ(x) depends only on n and |x|, the distribution |Φ(x)|² also depends solely on these quantities. This implies that one can indeed efficiently sample from this distribution: first, one draws a Hamming weight |x| with probability proportional to (n choose |x|) |Φ(|x|)|². Then, given |x|, one constructs at random an n-bit string with this Hamming weight. This latter step can be efficiently implemented by starting from some n-bit string with Hamming weight |x| (such as 1^{|x|} 0^{n−|x|}) and then applying a random permutation.

¶ We note that the numerical results presented in this section are obtained using a first-order Trotter decomposition.
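The two-step sampling procedure can be sketched as follows (using our reading of the weight-dependent amplitude above, which should be treated as an assumption; the seed and n are arbitrary):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)

def phi(w, n):
    """Amplitude of any weight-w string (our reading of the formula above)."""
    return (1.0 + (n - 2 * w) / np.sqrt(n)) / 2 ** ((n + 1) / 2)

def sample_bitstring(n):
    """Draw x ~ |Phi(x)|^2 in two steps: first a Hamming weight w with
    probability binom(n, w) * phi(w)^2, then a uniformly random string of
    weight w (equivalent to a random permutation of 1^w 0^{n-w})."""
    pw = np.array([comb(n, w) * phi(w, n) ** 2 for w in range(n + 1)])
    pw = pw / pw.sum()                           # guard against rounding
    w = rng.choice(n + 1, p=pw)
    x = np.zeros(n, dtype=int)
    x[rng.choice(n, size=w, replace=False)] = 1  # random placement of the ones
    return x

x = sample_bitstring(8)
```

A consistency check on this reading is that the weight distribution binom(n, w) phi(w)² sums to one without rescaling.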

Numerical method and results
We briefly discuss the details of the numerical analysis used to obtain the results presented in this section. We use the Monte Carlo and quantum algorithms (where the latter is inefficiently implemented on a classical computer), which are presented in Section 2 and summarized in Theorems 1.2 and 1.1, to obtain resp. the imaginary-time and real-time evolution signals for the transverse-field Ising chain. We note that here we estimate the imaginary-time evolution signal using the empirical-mean estimator, instead of the (asymptotically superior) median-of-means estimator. Having obtained these signals, we obtain estimates of the eigenvalues using the filtered ESPRIT method. This method corresponds to Algorithm 2 in combination with an additional filtering step, which is required since in principle the number of components S in the signal is not known a priori in the current setting. Therefore, we construct the matrix Ũ_S (in Algorithm 2) by taking the first S columns of Ũ, where S is now the number of singular values in the SVD of the Hankel matrix H(y) that exceed TF · σ_max; here TF denotes what we call a truncation factor, and σ_max denotes the largest singular value of H(y). In this way, the number of components in the signal emerges from the analysis of its Hankel matrix, rather than being a quantity that is known beforehand. By implementing the remainder of Algorithm 2 as usual, we obtain estimates of the z_j's, and from these we obtain the spectral estimates Ẽ_j for the quantum algorithm and for the Monte Carlo algorithm.
Note that this approach of including a filtering step (which often resembles more closely the practically encountered scenario when running the algorithms from Lemmas 2.1 and 2.3) differs from that considered in Theorems 3.1 and 3.2, where the number of components S in the signals is known beforehand. Here, S is a quantity emerging from the analysis, and it can even occur that components of the signal with very small coefficients (corresponding to eigenstates with very small overlap with |Φ⟩) are filtered out.
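The truncation step can be sketched as follows (a small illustration of ours, with TF = 0.02 as in the text and the same Hankel construction as Algorithm 2; the test signal is arbitrary):

```python
import numpy as np

def num_components(y, TF=0.02):
    """Filtered-ESPRIT preprocessing: S is the number of singular values of
    the Hankel matrix H(y) exceeding TF * sigma_max."""
    K = len(y) - 1
    L = K // 2
    Hy = np.array([y[i:i + L + 1] for i in range(K - L + 1)])
    s = np.linalg.svd(Hy, compute_uv=False)
    return int(np.sum(s > TF * s[0]))

# A clean 3-component oscillatory signal should be assigned S = 3.
E = np.array([0.5, 1.5, 2.5])
y = np.array([np.sum(np.exp(-1j * E * k)) / 3 for k in range(21)])
S_detected = num_components(y)
```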
In the results presented in this section, note that the real-time and imaginary-time increments have been chosen such that all E j that are present in the signals lie in [0, 2π). This does not mean that the whole spectrum of the Hamiltonian lies in [0, 2π), as the majority of its eigenvalues will not be present in the signals.
We note that for the quantum algorithm, the parameters {z_j} have unit norm. However, due to finite sampling, one determines a noisy version of the signal g_R(k), resulting in estimated eigenvalues of the Trotterized unitary having norms that deviate slightly from unity. To ensure that the estimates Ẽ_j are real-valued, we take Ẽ_j = Re(i log z̃_j).
The code that is used to obtain the numerical results presented in this work can be found at [39].
In Figure 3, the Monte Carlo signals ⟨Φ|e^{−kH}|Φ⟩ and the real and imaginary parts of the quantum-algorithm signals ⟨Φ|e^{−ikH}|Φ⟩ for |Φ⟩ = |+⟩^⊗n and |Φ_optimal⟩ are depicted.
The upper three figures correspond to |Φ⟩ = |+⟩^⊗n. For this choice of |Φ⟩, the signals are clearly dominated by a single eigenvalue (the ground-state eigenvalue): the Monte Carlo signal decays with a single decay rate and the quantum-algorithm signals oscillate with a single frequency. For the quantum-algorithm signals, higher-frequency components are also visible (since |+⟩^⊗n does not have overlap with only the ground state). For the lower three figures, we take |Φ⟩ = |Φ_optimal⟩. For this choice of |Φ⟩, two eigenvalues are present in the signals (the ground-state and first-excited-state eigenvalues). In the Monte Carlo signal, the excited-state component can be seen to die out within a few units of time, after which only the ground-state component is left. The quantum-algorithm signals can be seen to be composed of a high-frequency (excited-state) component superposed on the ground-state component, where the excited-state component now obviously does not die out.
We now consider the spectral estimates obtained by applying ESPRIT to the evolution signals produced by the quantum algorithm (from Theorem 1.1) and the Monte Carlo algorithm (from Theorem 1.2). In particular, we determine both time-evolution signals at a given total number of measurement points in real/imaginary time. We then determine the spectral estimates from both signals for increasing K, by including step by step more of the total number of measurement points in the analysis. The truncation factor TF is taken to be equal to 0.02 throughout. The top two plots in Figure 4 depict, for a given |Σ|, the eigenvalue estimates as a function of g and for several values of K. For both the quantum-algorithm and Monte Carlo-algorithm estimates, it is clear that a smaller spectral gap indeed requires a larger K for the eigenvalues to be obtained accurately. Furthermore, for a given |Σ| and K, it is clear that the error of the estimate for the excited-state eigenvalue obtained from the imaginary-time signal is larger than that obtained from the real-time signal. We conclude furthermore that, in line with Theorems 3.1 and 3.2, increasing K beyond a certain threshold does not necessarily reduce the error of the eigenvalue estimates.
It is apparent that as one approaches the g = 1 point, more higher-lying eigenvalues emerge from the ESPRIT analysis. This is especially true for the quantum algorithm (note that for the Monte Carlo signal, the larger the eigenvalues are, the quicker the associated components in the signal die out). The appearance of these higher-lying eigenvalues can be attributed to the fact that (for finite n) the state |Φ optimal starts to have significant overlap with states other than the two lowest-energy eigenstates in this regime.
The middle two and bottom two plots in Figure 4 depict the relative error of the spectral estimates, i.e. |Ẽ_j − E_j|/E_j, for resp. the ground-state eigenvalue and the excited-state eigenvalue (at fixed g = 4). We consider a range of values for |Σ|. For the ground-state eigenvalue, the scaling of the relative error as a function of |Σ| is similar for the quantum algorithm and the Monte Carlo algorithm. By contrast, the relative errors of the excited-state eigenvalue estimates for the quantum algorithm are clearly smaller than those for the Monte Carlo algorithm.
We have also implemented the matrix pencil method of [19,16] to estimate the eigenvalues from the real-time and imaginary-time signals. The only significant difference found between the estimates obtained through the ESPRIT method and through this matrix pencil method is that, in the (K < 2S)-regime, the matrix pencil method outputs estimates which resemble an average of the eigenvalues in the signal (as can be seen in Figure 2 in a noiseless setting), while this is generally not the case for the ESPRIT method.

Figure 4: Spectral estimates and their relative errors as a function of |Σ|. The truncation factor is taken to be TF = 0.02 throughout. The scaling of the error of the ground-state eigenvalue estimates is similar for both methods, while the error for the excited-state eigenvalue is larger for the MC algorithm than for the quantum algorithm. The excited-state eigenvalue estimates also converge more quickly as a function of K for the quantum algorithm.

Discussion
We have considered the problem of obtaining (some) eigenvalues of local stoquastic (i.e. sign-problem-free) Hamiltonians and general local Hamiltonians H by tracking the evolution of the system state, distinguishing between evolution in real time and in imaginary time. In both cases, we examine the use of the ESPRIT (matrix pencil) method for extracting eigenvalues of H from the state-evolution signal. The real-time (oscillating) evolution signal is obtained by running quantum circuits, while the imaginary-time (decaying) signal for local stoquastic Hamiltonians is obtained through a Monte Carlo scheme (developed in this work) that can be implemented classically in a computationally tractable manner. Another type of decaying evolution signal, from which the ESPRIT method can extract eigenvalues of H, is obtained through a classical method for general local Hamiltonians that is similar in spirit to 'dequantization'. We have invoked some known performance bounds of the ESPRIT method for the real-time signal, and applied and extended such bounds for the imaginary-time signal. Our bounds suggest that the ESPRIT method (or matrix pencil methods more generally) performs, not surprisingly, worse in extracting (multiple) eigenvalues from an imaginary-time decaying (MC algorithm) signal than from a real-time oscillating (quantum algorithm) signal in the presence of noise. However, we show that if the input state contains S = O(1) eigenstates, the spectral gap is at least 1/poly(n), and the right access to the input state is available, then the associated eigenvalues can be resolved efficiently (with poly(n) classical effort) for local stoquastic as well as for general local Hamiltonians.
Even though for S = O(1) the classical effort for stoquastic as well as general Hamiltonians is poly(n), the 'brute-force' algorithm for general Hamiltonians (in Lemma 2.4) incurs a cost exponential in k when estimating the signal g_D(k), while for stoquastic Hamiltonians the cost is polynomial in k. Despite this difference in cost, the error bounds for the eigenvalue estimates obtained here through our analysis of the ESPRIT method applied to a decaying signal (g_D(k) or g_I(k)) suggest that letting k grow as some function of n will generally not help.
Even though our results show that for these Hamiltonians, for an input state supported on S = O(1) eigenvalues (separated by an at least 1/poly(n) gap), these eigenvalues can be estimated with poly(n) classical effort, it remains to be better understood how practical this MC method for stoquastic Hamiltonians or the 'dequantization' method in Lemma 2.4 are. The upper bounds for the errors on the eigenvalue estimates in Theorem 3.2 grow rather fast with S (and the computational effort grows fast with k in Lemma 2.4 for general local Hamiltonians), and it is not clear how much one can improve, say, the ESPRIT bounds.
Indeed, it would be interesting to show that the current bounds for ESPRIT on the imaginary-time decaying signal cannot be improved upon. There are certainly known negative results on the condition number of Vandermonde matrices [32], but there might be signal-extraction algorithms that have better practical performance on decaying signals, or have looser requirements (such as the requirement that all data be evenly spaced). However, we suspect that the difficulty gap we observe between the real-time and imaginary-time signal is universal. One possible way to argue this is through the Cramér-Rao bound (which has been analysed for real-time signals [38] but not for imaginary-time signals), a question we leave for further research.
In terms of numerical results, we find that: For a given spectral gap and sample size, the ability to distinguish between two eigenvalues indeed depends on the number of measurement points K at which the real-time and imaginary-time evolution signals are evaluated. The MC algorithm for stoquastic Hamiltonians and the quantum algorithm (in combination with the ESPRIT method) lead to a similar scaling of the relative error of the ground-state eigenvalue as a function of the sample size. However, for an excited-state eigenvalue, the quantum algorithm leads to significantly smaller relative errors than the MC algorithm. More extensive numerical studies, also of models other than the transverse-field Ising chain, may shed further light on whether the Monte Carlo + ESPRIT method is useful in practice. For frustrated stoquastic Hamiltonians, even the smallest eigenvalue may lead to a fast decaying signal, requiring small sampling error and Trotter error in practice.
As for other directions of further research, one can ask whether a hybrid approach, in which imaginary-time data from an error-free Monte Carlo algorithm strengthens the use of real-time data from a noisy quantum circuit, is possible. This approach requires combining the data where the poles z_j = e^{−iE_j} on the unit circle each have a partner pole z_j = e^{−E_j} (or z_j = 1 − E_j/2π) on the real axis. If the effect of noise can be modeled as z_j = e^{−iE_j} → e^{−iE_j−γ} [30], then the imaginary-time data may help in extracting the values of E_j. It may also be of interest to consider sampling k at random (instead of taking k = 0, 1, ..., K) for both the quantum-circuit and Monte Carlo methods. Another direction of further research is the following: suppose the input state has overlap with S (here not necessarily O(1)) eigenstates of the Hamiltonian; one could then assess how well the ESPRIT method succeeds in extracting e.g. the ground-state eigenvalue by filtering out all other components in the real-time or imaginary-time evolution signals.

This work is supported by QuTech NWO funding 2020-2024 - Part I "Fundamental Research", project number 601.QT.001-1, financed by the Dutch Research Council (NWO). JH is supported by the Quantum Software Consortium (NWO Gravitation Grant, project number 024.003.037). MS and BMT developed the MC method based on unpublished results of Sergey Bravyi, MS implemented the numerics on the transverse-field Ising model, JH performed the analysis of the ESPRIT algorithm for decaying signals, BMT supervised the whole project, and all authors contributed to the writing. We thank Ingo Roth for pointing out the use of median-of-means estimators for observables whose higher-order moments cannot be upper bounded by a constant. We thank Sergey Bravyi for pointing out [12].

Appendix A. Trotterization
A Hamiltonian H = Σ_{i=1}^{N} H_i (with N = O(poly(n))) consisting of k-local terms {H_i}_{i=1}^{N} generally involves non-commuting terms, but the terms can be divided into subsets such that within each subset all terms commute. For a given set {H_i}_{i=1}^{N}, we denote the minimum possible number of such subsets by Γ. This number of subsets is at most N and equals 1 in the trivial case where all H_i commute with each other. The Hamiltonian H can thus be decomposed as H = Σ_{γ=1}^{Γ} H_γ, where the different H_γ do not commute with each other, but the terms of which each individual H_γ is composed do commute. Choosing a decomposition into the minimum number of subsets brings the additional advantage of parallelizability when implementing the evolution of the system in imaginary or real time.
The following Lemma (adaptation from [7]) upper bounds the errors of implementing imaginary-time and real-time state evolution through a first-order Trotter decomposition.
To obtain a better scaling of the errors as a function of the Trotter variable M, one can employ higher-order Trotter decompositions. We denote the pth-order approximants of e^{−itH/M} and e^{−τH/M} by T_M(p, t) and T_M(p, τ), respectively, and denote both |⟨Φ|e^{−itH}|Φ⟩ − ⟨Φ|T_M(p, t)^M|Φ⟩| and |⟨Φ|e^{−τH}|Φ⟩ − ⟨Φ|T_M(p, τ)^M|Φ⟩| by ε_trot. In [7], it was shown that, for general p, ε_trot is upper bounded as in Eqs. (A.2a) and (A.2b) (α_{1/p} is typically poly(n)); Eq. (A.2b) holds provided that 4τΥ Σ_γ ‖H_γ‖/M ≤ 1 (where Υ corresponds to the number of stages of the Trotter decomposition and typically scales exponentially in p) *. In [40], a widely used scheme for constructing pth-order approximants is discussed.
It is important to consider the total number of k-local propagation operators L required to simulate e^{−itH} and e^{−τH} (for a given order p and Trotter variable M) with the scheme of [40].

* In the remainder of this discussion it is assumed that this condition is satisfied.

In Figure A1, we depict the absolute error of noisy MC (imaginary-time) and QPE (real-time) signals at fixed τ = t as a function of M, obtained through first-order N-term and Γ(= 2)-term Trotterization schemes, together with the first-order Trotter error bounds. We note that the apparent drastic increase in noise magnitude as a function of M is primarily due to the fact that the absolute error decreases as a function of M and is plotted on a logarithmic scale.
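As a self-contained toy check of the first-order error scaling discussed above (our own example with an arbitrary pair of non-commuting single-qubit terms, not the paper's TFIM setup), one can verify numerically that the first-order Trotter error of the imaginary-time propagator decreases roughly as 1/M:

```python
import numpy as np

def expmh(A, t):
    """exp(t * A) for Hermitian A, via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(t * w)) @ V.conj().T

Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H1, H2 = Z, 0.7 * X                  # two non-commuting Hermitian terms
tau = 1.0
exact = expmh(H1 + H2, -tau)         # exact imaginary-time propagator

def trotter1(M):
    """First-order product formula (e^{-tau H1/M} e^{-tau H2/M})^M."""
    step = expmh(H1, -tau / M) @ expmh(H2, -tau / M)
    return np.linalg.matrix_power(step, M)

errs = [np.linalg.norm(trotter1(M) - exact, 2) for M in (8, 16, 32)]
# Doubling M roughly halves the error, consistent with O(1/M) scaling.
```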
Appendix B. Extension to non-Hermitian propagation operators

In this Appendix we prove the following Lemma, extending Lemma 2.3 to non-Hermitian propagation operators. As before, |Φ⟩ = Σ_x Φ(x)|x⟩ is a normalized state of n qubits, where Φ(x) ∈ C (∀x) and Σ_x |Φ(x)|² = 1. We assume that (1) Φ(y)/Φ(x) can be efficiently (poly(n)) calculated for given x and y, and (2) we can efficiently draw samples from the probability distribution P(x) = |Φ(x)|².
Proof. In addition to the n-qubit register, we exploit a single ancillary qubit. The matrices G_l are still element-wise non-negative, but in general no longer Hermitian. The state |a⟩ denotes the state of the single ancillary qubit. By making use of the ancillary qubit, the propagation operators can be symmetrized as in Eq. (B.1). In this form, F_l (the 'new' propagation operator) is element-wise non-negative, (k+1)-local and Hermitian, and hence one can apply Lemma 2.3 to ⟨Φ|F_1 F_2 ⋯ F_L|Φ⟩, provided that its eigenvalues lie in (0, 1]. The eigenvalues λ of F_l (for l odd) can be found from the singular values of G_l. What is left to prove is that estimating the signal for the string of F_l's is equivalent to estimating the signal for the string of G_l's. Specifically, we want to prove the following operator identity: G_1 G_2 ⋯ G_L = ⟨0|_a F_1 F_2 ⋯ F_L |L mod 2⟩_a, for L ∈ Z⁺. This is done below by means of induction.
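The operator identity can be checked numerically for one natural realization of the symmetrization: placing G_l and its transpose on the off-diagonal ancilla blocks, alternating with l. This explicit form is our assumption for the illustration and stands in for Eq. (B.1); the register dimension and string length are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4   # dimension of the n-qubit register (toy size)
L = 5   # length of the operator string

# Element-wise non-negative, generally non-Hermitian propagation operators G_l.
Gs = [rng.random((d, d)) for _ in range(L)]

# Hermitian, element-wise non-negative symmetrization with one ancilla qubit:
#   odd l:  F_l = |0><1| (x) G_l   + |1><0| (x) G_l^T
#   even l: F_l = |0><1| (x) G_l^T + |1><0| (x) G_l
P01 = np.array([[0., 1.], [0., 0.]])  # |0><1| on the ancilla

def symmetrize(G, l):
    if l % 2 == 1:
        return np.kron(P01, G) + np.kron(P01.T, G.T)
    return np.kron(P01, G.T) + np.kron(P01.T, G)

Fs = [symmetrize(G, l) for l, G in enumerate(Gs, start=1)]
for F in Fs:
    assert np.allclose(F, F.T)  # Hermitian
    assert (F >= 0).all()       # element-wise non-negative

# Check <0|_a F_1 ... F_L |L mod 2>_a = G_1 ... G_L on the register.
prod_F = np.linalg.multi_dot(Fs)
prod_G = np.linalg.multi_dot(Gs)
a = L % 2
block = prod_F[0:d, a * d:(a + 1) * d]  # <0|_a ... |a>_a block
print(np.allclose(block, prod_G))
```

The alternation between G_l and G_l^T is what makes the transposes cancel pairwise in the product, which mirrors the induction step in the proof.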
• For L = 1: ⟨0|_a F_1 |1⟩_a = G_1 follows directly from the definition in Eq. (B.1).
• Assuming G_1 G_2 ⋯ G_L = ⟨0|_a F_1 F_2 ⋯ F_L |L mod 2⟩_a holds for L, it holds for L + 1 as well: making use of the definition in Eq. (B.1), we write out F_{L+1}, and the quantity of interest (for operator-string length L + 1) can be rewritten as G_1 G_2 ⋯ G_{L+1} = ⟨0|_a F_1 F_2 ⋯ F_{L+1} |(L + 1) mod 2⟩_a, which finishes the proof.

Appendix C. Median-of-means estimator
The MC scheme described in Section 2 produces a set of |Σ| samples {x} which are distributed according to Π(x). For each sample, Re(R(x)) can be evaluated and subsequently an estimate of F can be obtained. Only the first and second moments of the random variable Re(R(x)) can be upper bounded in general. Therefore, if one were to use the empirical mean (1/|Σ|) Σ_{x∈Σ} Re(R(x)) as an estimator for F, then the best achievable scaling of |Σ| such that the estimate is ε-close to F with probability at least 1 − δ is |Σ| = Θ(ε⁻² δ⁻¹) (by means of Chebyshev's inequality).
Taking the median-of-means estimator [26] as estimator (instead of the empirical mean), one can obtain a more convenient scaling of |Σ| w.r.t. δ (despite the fact that only the first two moments of Re(R(x)) can be upper bounded). The median-of-means estimator can be constructed as follows. Partition the set of MC samples Σ into q groups s_1, …, s_q of size approximately |Σ|/q. One then computes the empirical mean of Re(R(x)) over the samples in each group separately (giving q unbiased estimators of F) and takes the median of these empirical means. We denote the empirical mean for each group by f_j = (1/|s_j|) Σ_{x∈s_j} Re(R(x)) (for j ∈ {1, …, q}) and denote the median of these empirical means by F̂ = M(f_1, …, f_q). The estimator F̂ is the median-of-means estimator.
We define the median of q real numbers a_1, …, a_q as M(a_1, …, a_q) = a_i, with a_i such that |{j : a_j ≤ a_i}| ≥ q/2 and |{j : a_j ≥ a_i}| ≥ q/2, where we take the smallest i if multiple i's obey this condition. The {Re(R(x))} are i.i.d. random variables with mean F and variance Var[Re(R(x))] ≤ 1. Let q and |Σ|/q be positive integers; then

Pr(|F̂ − F| ≥ 2√(q/|Σ|)) ≤ e^{−q/8}. (C.3)

So for q = 8 log(δ⁻¹) and |Σ| = 4qε⁻² = 32 log(δ⁻¹) ε⁻², we have

Pr(|F̂ − F| ≥ ε) ≤ δ. (C.4)

Note that the estimator F̂ = M(f_1, …, f_q) depends explicitly on the confidence, since q scales with δ. Given that indeed q = Θ(log(δ⁻¹)), the number of samples required to obtain Eq. (C.4) is |Σ| = Θ(log(δ⁻¹) ε⁻²) (which is an exponentially better scaling w.r.t. δ compared to that for the empirical mean estimator). To see why Eq. (C.3) is true, see [26]: one can apply Chebyshev's inequality to each of the empirical means f_j, so that with probability at least 3/4 we have |f_j − F| ≤ √(4q/|Σ|). If |F̂ − F| ≥ √(4q/|Σ|), then, by definition of F̂, at least q/2 of the empirical means f_j satisfy |f_j − F| ≥ √(4q/|Σ|). Hence the probability that |F̂ − F| ≥ √(4q/|Σ|) is upper bounded by the probability that a binomially distributed random variable with q draws and success probability 1/4 exceeds q/2:

Pr(Bin(q, 1/4) ≥ q/2) ≤ e^{−q/8},

where we have used E[Bin(q, 1/4)] = q/4 and Hoeffding's inequality.
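The estimator analysed above can be sketched in a few lines; the heavy-tailed test distribution (Student's t with 3 degrees of freedom, which has mean 0 and finite variance) is our own choice for illustration:

```python
import numpy as np

def median_of_means(samples, delta):
    """Median-of-means estimate of the mean with failure probability ~ delta.

    Partitions the samples into q = ceil(8*log(1/delta)) groups of roughly
    equal size, averages each group, and returns the median of these means.
    """
    q = int(np.ceil(8 * np.log(1.0 / delta)))
    groups = np.array_split(np.asarray(samples, dtype=float), q)
    return float(np.median([g.mean() for g in groups]))

# Demonstration on a heavy-tailed distribution with true mean 0.
rng = np.random.default_rng(1)
samples = rng.standard_t(df=3, size=20000)
estimate = median_of_means(samples, delta=0.01)
print(estimate)
```

The single large outliers that occasionally corrupt the empirical mean land in only one of the q groups, and the median discards that group, which is the mechanism behind the exponentially improved dependence on δ.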

Appendix D. Performance of ESPRIT on the imaginary-time (decaying) signal
In this section we prove a series of Lemmas that characterize the behaviour of the ESPRIT algorithm (Algorithm 2) on an imaginary-time signal obtained with finite error. They are direct generalisations of the work done in [23], which leads up to Theorem 3.1 for oscillatory signals, to signals composed of real exponential decays. We will see that the guarantees on the algorithm will be substantially weaker in this case. The end goal of this section is Theorem 3.2 in the main text.
The argument decomposes roughly into two halves. In the first half we argue that the behaviour of ESPRIT is controlled by the smallest non-zero singular value of the Vandermonde matrix V_L. In the second half we argue that this smallest non-zero singular value can be controlled in terms of a gap condition on the energy eigenvalues of the imaginary-time signal.
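For reference, the pipeline analysed below (Hankel matrix, truncated SVD, signal matrix, eigenvalues) can be sketched as follows; this is a minimal stand-in for Algorithm 2, and the decaying signal parameters are illustrative:

```python
import numpy as np

def esprit(y, S):
    """Minimal ESPRIT: recover S poles z_i from y(k) ~ sum_i c_i z_i^k."""
    K = len(y) - 1
    L = K // 2
    # Hankel matrix H(y) of shape (L+1) x (K-L+1).
    H = np.array([[y[i + j] for j in range(K - L + 1)] for i in range(L + 1)])
    U, s, Wh = np.linalg.svd(H)
    US = U[:, :S]                      # first S left singular vectors
    U0, U1 = US[:-1, :], US[1:, :]     # drop last / first row
    Psi = np.linalg.pinv(U0) @ U1      # signal matrix
    return np.linalg.eigvals(Psi)      # its eigenvalues are the poles

# Decaying (imaginary-time) signal with two poles z_i = e^{-E_i * dtau}.
E = np.array([0.3, 1.1])
c = np.array([0.6, 0.4])
dtau = 0.2
k = np.arange(41)
y = (c[None, :] * np.exp(-np.outer(k * dtau, E))).sum(axis=1)
poles = np.sort(esprit(y, S=2).real)
print(poles, np.sort(np.exp(-E * dtau)))
```

In the noiseless case the recovery is exact; the Lemmas below quantify how far the eigenvalues of the signal matrix can move when a noise vector η is added to y.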
We start by proving a short result on the smallest nonzero singular values of products of matrices.
Lemma D.1. Denote by σ_min(X) the smallest non-zero singular value of a matrix X. For any matrix X we have σ_min(X) = ‖X⁺‖⁻¹, where X⁺ is the Moore-Penrose pseudo-inverse of X, i.e. through the SVD we have σ_min⁻¹(X) = ‖X⁺‖, where ‖X‖ is the operator norm (the largest singular value). Let A, B be (non-square) matrices such that (AB)⁺ = B⁺A⁺. Then we have that

σ_min(AB) ≥ σ_min(A) σ_min(B). (D.1)

Proof. By sub-multiplicativity of the operator norm, we have that ‖(AB)⁺‖ = ‖B⁺A⁺‖ ≤ ‖B⁺‖ ‖A⁺‖, and hence σ_min(AB) = ‖(AB)⁺‖⁻¹ ≥ ‖A⁺‖⁻¹ ‖B⁺‖⁻¹ = σ_min(A) σ_min(B).

We note that the product property of the Moore-Penrose pseudo-inverse does not hold for all matrices (unlike for the regular inverse). We will make use of the following sufficient condition:

Lemma D.2 (see e.g. [13]). Let A, B be matrices, let A have full column rank and let B have full row rank. Then we have (AB)⁺ = B⁺A⁺.
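A quick numerical sanity check of Lemmas D.1 and D.2 (the random matrix sizes are arbitrary; a random 6 x 4 matrix has full column rank and a random 4 x 7 matrix full row rank with probability one):

```python
import numpy as np

def sigma_min(X):
    """Smallest non-zero singular value of X (equals 1/||X^+||)."""
    s = np.linalg.svd(X, compute_uv=False)
    return s[s > 1e-12].min()

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))   # full column rank (generically)
B = rng.standard_normal((4, 7))   # full row rank (generically)

# Under these rank conditions (Lemma D.2), (AB)^+ = B^+ A^+ holds ...
assert np.allclose(np.linalg.pinv(A @ B), np.linalg.pinv(B) @ np.linalg.pinv(A))
# ... and therefore sigma_min(AB) >= sigma_min(A) * sigma_min(B) (Lemma D.1).
print(sigma_min(A @ B), sigma_min(A) * sigma_min(B))
```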
Next, we argue that a small perturbation in the imaginary-time signal does not impact the space spanned by the first S left singular vectors of the Hankel matrix H(g) too strongly, see the ESPRIT Algorithm 2. The following Lemma is a compressed version of Lemmas 4 and 5 in [23] (which are formulated for real-time signals only, but hold more generally). To state this Lemma we need to consider a freedom of choice in U_S and Ũ_S, with Ũ_S as defined in the ESPRIT Algorithm 2 and U_S its noise-free version. It is possible that U_S and Ũ_S are far apart as operators, even if the spaces they span are close together.
We solve this by not considering U_S proper, but rather a rotated version of U_S. As we will see, this rotation will not impact the actual output of ESPRIT, which consists of the eigenvalues of the signal matrix Ψ̃. The rotated version of U_S is given through the S × S unitary operator (O_2O_1)†, which is defined via the singular value decomposition of U_S†Ũ_S, i.e.

U_S† Ũ_S = O_1† D O_2†, (D.3)

with I_S ≥ D ≥ 0 and D diagonal. The diagonal elements of the matrix D are cosines of the so-called canonical angles. We note that this internal rotation is performed implicitly in [23], whereas we have chosen to make it explicit at all times.
Lemma D.3. Let (g + η)(k) be an imaginary-time signal with g(k) = Σ_{i=1}^S c_i z_i^k and η(k) a small noise vector. Consider the associated Hankel matrices H(g) and H(g + η), with singular value decompositions H(g) = UΣW and H(g + η) = ŨΣ̃W̃, and label the matrix of the first S columns of U (resp. Ũ) as U_S (resp. Ũ_S). Finally, assume that ‖H(η)‖ ≤ σ_min(H(g))/2. Then

‖U_S(O_2O_1)† − Ũ_S‖ ≤ 2√2 ‖H(η)‖ / σ_min(H(g)). (D.5)

Proof. First, we can observe that indeed σ_min(H(g + η)) > 0. The proof follows from Wedin's sin Θ theorem for perturbations of singular subspaces as well as Weyl's perturbation theorem for singular values, see e.g. [3, 37]. From the latter theorem we know that |σ_i(H(g + η)) − σ_i(H(g))| ≤ ‖H(η)‖ ≤ σ_min(H(g))/2, where σ_i is the ith singular value (in decreasing order; some singular values can be zero). Let σ_min(H(g + η)) > 0 be the kth singular value; then

σ_min(H(g + η)) ≥ σ_k(H(g)) − σ_min(H(g))/2 ≥ σ_min(H(g))/2, (D.6)

where the last inequality holds as σ_k(H(g)) > 0 and hence is at least σ_min(H(g)). Hence we can use Wedin's theorem on singular subspaces (Theorem 3.4 in [22], setting δ = α = σ_min(H(g))/2) to conclude that

‖(U_S^⊥)† Ũ_S‖ ≤ 2‖H(η)‖ / σ_min(H(g)), (D.7)

where U_S^⊥ is the matrix formed from the L + 1 − S other (besides U_S) columns of the noiseless U. To connect this to ‖U_S(O_2O_1)† − Ũ_S‖ we can make a long but elementary calculation based on the singular value decomposition in Eq. (D.3). In the second inequality of that calculation we use that tr U_S U_S† = tr Ũ_S Ũ_S† = S, since U_S as well as Ũ_S consist of S orthonormal columns, and that D ≥ D² = DD†, since I ≥ D ≥ 0.

The next step is to bound the deviation of the ESPRIT signal matrix Ψ̃ = Ũ_0⁺Ũ_1 from the rotated version of its noiseless variant, (O_2O_1)Ψ(O_2O_1)†. Recall that U_0 (resp. U_1) is constructed by removing the last (resp. first) row from the matrix U_S. Note also that only the eigenvalues of the signal matrix Ψ matter in the ESPRIT Algorithm 2, and the additional unitary rotations O_2O_1 do not alter these eigenvalues. We first establish an intermediate result:

Lemma D.4. Let A, B be matrices such that rank(A) = rank(B). If ‖A − B‖ ≤ σ_min(A)/2 then

‖A⁺ − B⁺‖ ≤ ((1 + √5)/2) ‖A⁺‖ ‖B⁺‖ ‖A − B‖ ≤ (1 + √5) ‖A − B‖ / σ_min(A)². (D.9)

Proof. From Theorem 4.1 in [43] we get that ‖A⁺ − B⁺‖ ≤ ((1 + √5)/2) ‖A⁺‖ ‖B⁺‖ ‖A − B‖, leading to the first inequality in Eq. (D.9). Moreover, by Weyl's perturbation theorem, σ_min(B) ≥ σ_min(A) − ‖A − B‖ ≥ σ_min(A)/2, so that ‖B⁺‖ ≤ 2/σ_min(A), and the Lemma follows.
The next Lemma establishes that if a (non-square) matrix is full rank, a sufficiently small perturbation does not decrease the rank (and hence the rank is preserved). Note that full-rankness is really required, as an arbitrarily small perturbation can always increase the rank.

Lemma D.5. Let A be a full-rank matrix and let B be a matrix of the same dimensions with ‖A − B‖ ≤ σ_min(A)/2. Then B is also full rank, and rank(A) = rank(B).

Proof. Since A is full rank, rank(A) equals the smaller of the two matrix dimensions, so rank(A) ≥ rank(B). Moreover, the smallest singular value of B is at least σ_min(A) − ‖A − B‖, by Weyl's singular value perturbation theorem [37]. Thus by assumption the smallest singular value of B is at least σ_min(A)/2, which is strictly larger than 0 as A is full rank. Hence B is also full rank, and thus rank(A) = rank(B).
Finally, we will require a bound on the smallest non-zero singular value of U_0. This is the only Lemma where we deviate significantly from the work done in [23]: the corresponding result, Lemma 3 in [23], makes explicit use of the fact that in their scenario all poles z_j lie on the unit circle (what we call the real-time, oscillatory signal). The bound we present here is simpler and more general, and thus applies both to imaginary-time (decaying) as well as real-time (oscillatory) signals, but leads to a suboptimal dependence on the condition number of the Vandermonde matrix V_L defined in Eq. (28) (in particular σ_min(V_L)). However, it is sufficient for our purpose. The Lemma will use some essential properties of how the ESPRIT Algorithm 2 works, which we review first. Key to the functioning of ESPRIT is the fact that H(g) has two decompositions:

H(g) = UΣW = V_L C V_{K−L}^T,

where V_L is the (L + 1) × S Vandermonde matrix defined in Eq. (28) and the coefficient matrix C is given in Eq. (27). When S ≤ L ≤ K + 1 − S (requiring K + 1 ≥ 2S), V_L and V_{K−L} are full rank. Then V_L and U_S have an image of the same dimension, which means there exists an invertible matrix A such that

U_S = V_L A,

and thus

U_0 = V_{L−1} A,   U_1 = V_{L−1} Z A,

with Z = diag(z_1, …, z_S). This implies that

Ψ = U_0⁺ U_1 = A⁻¹ V_{L−1}⁺ V_{L−1} Z A = A⁻¹ Z A,

and hence the eigenvalues of Ψ are the poles z_i.
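The two decompositions of H(g) and the resulting diagonalization of Ψ can be checked numerically (the poles and coefficients below are chosen arbitrarily):

```python
import numpy as np

# Poles and coefficients of a decaying signal g(k) = sum_i c_i z_i^k.
z = np.array([0.9, 0.7, 0.5])
c = np.array([0.5, 0.3, 0.2])
S, K = len(z), 12
L = 6  # satisfies S <= L <= K + 1 - S
g = np.array([(c * z**k).sum() for k in range(K + 1)])

# Hankel matrix H(g) and its Vandermonde factorization H(g) = V_L C V_{K-L}^T.
H = np.array([[g[i + j] for j in range(K - L + 1)] for i in range(L + 1)])
VL = np.array([[zi**k for zi in z] for k in range(L + 1)])        # (L+1) x S
VKL = np.array([[zi**k for zi in z] for k in range(K - L + 1)])   # (K-L+1) x S
C = np.diag(c)
assert np.allclose(H, VL @ C @ VKL.T)

# U_S spans the same column space as V_L, so U_S = V_L A for invertible A,
# and the signal matrix Psi = U_0^+ U_1 = A^{-1} Z A has eigenvalues z_i.
U, s, Wh = np.linalg.svd(H)
US = U[:, :S]
Psi = np.linalg.pinv(US[:-1, :]) @ US[1:, :]
poles = np.sort(np.linalg.eigvals(Psi).real)
print(poles)
```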
Lemma D.6. Let U_0 be the L × S matrix obtained from U_S by removing the last row. If the associated Vandermonde matrix V_{L−1} is of (full) rank S, then so is U_0, and moreover

σ_min(U_0) ≥ σ_min(V_{L−1}) / ‖V_L‖. (D.16)

Proof. We have U_S = V_L A with U_S an isometry, so that

A† V_L† V_L A = I_S, (D.17)

which means that the singular values of A are precisely inverse to those of V_L, or equivalently that A⁺ has the same singular spectrum as V_L. Moreover, by assumption V_{L−1} has full column rank, and A is invertible, so by Lemmas D.1 and D.2,

σ_min(U_0) = σ_min(V_{L−1} A) ≥ σ_min(V_{L−1}) σ_min(A) = σ_min(V_{L−1}) / ‖V_L‖,

which is the Lemma statement.
With Lemmas D.4, D.5 and D.6 in hand, we can give a perturbation bound for the signal matrix Ψ̃. We will show that if Ũ_0 does not deviate strongly from the rotated version of U_0, namely U_0(O_2O_1)†, then the noisy signal matrix is also close to the ideal (rotated) version.
Lemma D.7. Let Ψ = U_0⁺U_1 and Ψ̃ = Ũ_0⁺Ũ_1 be the ideal and perturbed versions of the signal matrix, respectively. Assume that ‖U_0(O_2O_1)† − Ũ_0‖ ≤ σ_min(U_0)/2. With this assumption we have

‖(O_2O_1)Ψ(O_2O_1)† − Ψ̃‖ ≤ (3 + √5) ‖U_S(O_2O_1)† − Ũ_S‖ / σ_min(U_0)². (D.20)

Proof. Write A_0 = U_0(O_2O_1)† and A_1 = U_1(O_2O_1)†, and note that σ_min(A_0) = σ_min(U_0) and (O_2O_1)Ψ(O_2O_1)† = A_0⁺A_1. Now note that by our initial assumption ‖A_0 − Ũ_0‖ ≤ σ_min(A_0)/2, and rank(A_0) = rank(Ũ_0) by Lemma D.5. This means that we can use Lemma D.4 to conclude that

‖A_0⁺ − Ũ_0⁺‖ ≤ (1 + √5) ‖A_0 − Ũ_0‖ / σ_min(U_0)². (D.21)
Hence we get

‖(O_2O_1)Ψ(O_2O_1)† − Ψ̃‖ = ‖A_0⁺A_1 − Ũ_0⁺Ũ_1‖ ≤ ‖A_0⁺ − Ũ_0⁺‖ ‖A_1‖ + ‖Ũ_0⁺‖ ‖A_1 − Ũ_1‖ ≤ (3 + √5) ‖U_S(O_2O_1)† − Ũ_S‖ / σ_min(U_0)²,

where we used σ_min(U_0) ≤ ‖U_0‖ ≤ 1. Plugging in the lower bound on σ_min(U_0) (Lemma D.6), and noting that ‖U_S(O_2O_1)† − Ũ_S‖ is controlled by Lemma D.3, combining all of this we get the following general theorem. From now on we restrict ourselves to the case L = K/2.

Theorem D.8. Let (g + η)(k) be the signal with g(k) = Σ_{i=1}^S c_i z_i^k and η(k) a small noise vector, to which we apply the ESPRIT Algorithm 2. Consider then the associated Hankel matrices H(g) and H(g + η), with singular value decompositions H(g) = UΣW and H(g + η) = ŨΣ̃W̃, and label the matrix of the first S columns of U (resp. Ũ) as U_S (resp. Ũ_S). Denote by U_0 (resp. U_1) the submatrix of U_S with the last row (resp. first row) removed, and define the signal matrix Ψ = U_0⁺U_1 (similarly for Ψ̃). Let L = K/2 and K + 1 ≥ 2S. Then, for sufficiently small ‖H(η)‖,

‖(O_2O_1)Ψ(O_2O_1)† − Ψ̃‖ ≤ 2√2 (3 + √5) (‖V_{K/2}‖² / σ_min(V_{K/2−1})²) ‖H(η)‖ / σ_min(H(g)),
where O_2O_1 is defined through the singular value decomposition of U_S†Ũ_S, Eq. (D.3).
Proof. We start from the requirement in Lemma D.7 and verify that it is satisfied for sufficiently small ‖H(η)‖ by combining Lemmas D.3 and D.6; here we also use the general fact about Vandermonde matrices [2, Theorem 1] that σ_min(V_{K/2}) ≥ σ_min(V_{K/2−1}) (i.e. the smallest non-zero singular value of V_L grows monotonically with L).
We wish to translate the bound in Theorem D.8 into a theorem on the distance between the inferred eigenvalues z_i and z̃_i. The argument is as follows. From the Bauer-Fike theorem and the fact that Ψ = A⁻¹ZA is diagonalized by A, the matching distance between the eigenvalues of Ψ and Ψ̃ is controlled by the condition number κ(A) = ‖A‖ ‖A⁻¹‖ times ‖(O_2O_1)Ψ(O_2O_1)† − Ψ̃‖. We have A = V_{K/2}⁺ U_S, and since U_S is an isometry we know that κ(A) ≤ ‖V_{K/2}‖ / σ_min(V_{K/2}), and hence we get a bound on the matching distance of the eigenvalues in terms of known quantities, as expressed in the following Theorem.

Theorem D.9. Let y(k) = (g + η)(k) (k = 0, …, K) be the signal with g(k) = Σ_{i=1}^S c_i z_i^k and η(k) a small noise vector, and let z̃_i be the eigenvalues returned by ESPRIT. Then, for sufficiently small ‖H(η)‖, the matching distance between {z_i} and {z̃_i} is upper bounded, up to a combinatorial factor depending only on S, by (‖V_{K/2}‖ / σ_min(V_{K/2})) ‖(O_2O_1)Ψ(O_2O_1)† − Ψ̃‖, with the latter norm bounded as in Theorem D.8. This Theorem thus holds for both the decaying as well as the oscillatory signal.
In order to use the Theorem, one has to lower bound σ_min(V_{L−1}) (and, more simply, upper bound ‖V_L‖) for L = K/2. For complex poles z_i on the unit circle one can get very good bounds, assuming a gap, see Eq. (34).
To lower bound σ_min(V_{L−1}) for a purely decaying signal, we start with the following characterization of square Vandermonde matrices with real poles due to Gautschi:

Theorem D.10 (Theorem 1 in [11]). Let V_{S−1} be a square S × S Vandermonde matrix with S distinct real positive poles z_1, …, z_S. Then the ∞-norm of V_{S−1}⁻¹ is

‖V_{S−1}⁻¹‖_∞ = max_{i∈{1,…,S}} Π_{j≠i} (1 + z_j)/|z_i − z_j|,

using a geometric series. Hence ‖V_S⁺ T⁻¹‖_∞ can be calculated to be the corresponding maximum over i ∈ {1, …, S}, which gives the Lemma statement.
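Gautschi's characterization can be verified numerically for a small set of positive poles (chosen arbitrarily); for positive poles the coefficients of each Lagrange basis polynomial alternate in sign, which is why the formula holds with equality:

```python
import numpy as np

# Square Vandermonde matrix with distinct positive real poles.
z = np.array([0.15, 0.4, 0.65, 0.9])
S = len(z)
V = np.array([[zi**k for zi in z] for k in range(S)])  # V[k, i] = z_i^k

# Gautschi: for positive poles, the infinity-norm of V^{-1} equals
#   max_i prod_{j != i} (1 + z_j) / |z_i - z_j|.
lhs = np.linalg.norm(np.linalg.inv(V), np.inf)
rhs = max(
    np.prod([(1 + zj) / abs(zi - zj) for zj in z if zj != zi])
    for zi in z
)
print(lhs, rhs)
```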