Fighting Noise with Noise: A Stochastic Projective Quantum Eigensolver

In the current noisy intermediate scale quantum era of quantum computation, available hardware is severely limited by both qubit count and noise levels, precluding the application of many current hybrid quantum-classical algorithms to nontrivial quantum chemistry problems. In this paper we propose applying some of the fundamental ideas of conventional Quantum Monte Carlo algorithms—stochastic sampling of both the wave function and the Hamiltonian—to quantum algorithms in order to significantly decrease quantum resource costs. In the context of an imaginary-time propagation based projective quantum eigensolver, we present a novel approach to estimating physical observables which can lead to an order of magnitude reduction in the required sampling of the quantum state to converge the ground state energy of a system relative to current state-of-the-art eigensolvers. The method can be equally applied to excited-state calculations and, combined with stochastic approximations of the system Hamiltonian, provides a promising near-term approach to Hamiltonian simulation for general chemistry on quantum devices.


I. INTRODUCTION
Electronic structure problems are hard for conventional computers, due to the exponential scaling of the Hilbert space with the physical size of the studied system.Quantum computers have been heralded as a solution to this problem, [1,2] mainly due to their ability to use the quantum entanglement of their constituent qubits to encode even highly complex electronic wavefunctions in a linearly scaling qubit register.This promise has led to the development of a variety of quantum algorithms with applications in quantum chemistry, such as Hamiltonian simulation [3][4][5][6] and Quantum Phase Estimation.[7,8] However, while these techniques promise general solutions to many problems in quantum chemistry, they require many-qubit, fully fault-tolerant quantum computers to run.Currently we are in an era of Noisy Intermediate Scale Quantum (NISQ) hardware, [9] which is characterised by moderate numbers (10 -1000) of qubits prone to high error rates due to short coherence times and noisy gate implementations.[10] While significant work is being carried out in the fields of quantum error correction [11][12][13][14][15] and mitigation, [16][17][18][19] the fully fault-tolerant quantum computer remains a thing of the future.
In order to make the most of currently available quantum resources, hybrid quantum-classical algorithms [20][21][22][23][24][25][26] have been developed, which shift most of the computational overhead onto a classical processor, while requiring specific, highly efficient operations to be carried out by the quantum processor.The most well known of these is the variational quantum eigensolver (VQE), [21] in which some parametrised wavefunction is encoded on a quantum device and used to compute a cost functionusually the expectation value of a Hamiltonian operator.This is used in the classical optimisation of the parameters.In quantum chemistry, this approach has been used * maf63@cam.ac.uk to calculate ground-state energies of molecules and model systems.[21,27,28] Since methods like VQE require estimating expectation values of quantum observables, despite making use of only moderately sized quantum circuits they require a large number of repeated quantum measurements to obtain sufficiently accurate estimators.This is due to the intrinsic noise associated with the measurement of a quantum observable.Significant work has gone into devising more efficient measurement schemes, [29][30][31][32] which obtain lower variances for the same number of measurements.
In this paper, we approach the issue of decreasing quantum computational overhead of electronic structure problems by taking inspiration from conventional Quantum Monte Carlo (QMC) algorithms, [33][34][35] which use noisy representations of the wavefunction and stochastic propagation techniques to obtain very high-accuracy results for challenging systems.In particular, we consider a projective quantum eigensolver (PQE) [36] approach and propose an algorithm to estimate the energy during the propagation, which is found to require significantly fewer samples for the same accuracy as its conventional counterpart.
In a perfectly noiseless regime, the PQE algorithm converges smoothly to the ground state of a given Ansatz.However, in practice, it processes data obtained from finitely many samples of a distribution stored in a quantum device, which is intrinsically stochastic even when ignoring NISQ-era device noise.Achieving a low enough uncertainty in this data to use it as if it were exact requires a large number of measurements.It is therefore compelling to employ instead an algorithm intended to operate on noisy measurements, such as QMC.
First, we establish a methodology to obtain accurate estimators for the ground state energy of a quantum system using relatively high-variance samples from an ongoing PQE propagation.We refer to this method as Monte Carlo PQE (MC-PQE).We then introduce additional stochastic approximations to further decrease quantum resource requirements.Stochastically rounding the wavefunction allows gates to be removed from the state preparation circuit, reducing its depth.The observables may also be estimated using a sampled subset of the Hamiltonian operator, lowering the number of independent measurements required.These techniques introduce additional noise into the algorithm, but we find this can be efficiently averaged out and, for a set of second row hydrides, show that the accuracy of the original algorithm can be recovered at significantly lower cost.Finally, we investigate the applicability of MC-PQE for excited states, using the folded-spectrum method, [21,37,38] and find that, while convergence is somewhat more challenging, it can reduce the cost of an excited state calculation to that of the ground-state method.Given that the main drawback of the folded-spectrum method is its unfavourable scaling with system size, this is a very promising alternative.
In Sec.II we present the underlying theory of quantum eigensolvers, conventional QMC methods and the folded-spectrum approach which allows these algorithms to be expanded to excited state calculations.In Sec.III we describe the quantum algorithm employed in this paper, together with circuit implementations and possible stochastic variations.Finally, in Sec.IV we apply these methods to a range of molecular systems and discuss their performance.We draw our final conclusions in Sec.V.

II. BACKGROUND THEORY A. Variational Quantum Eigensolver
One of the principal NISQ algorithms for quantum chemistry is the variational quantum eigensolver (VQE).[21] According to the variational principle, for a parametrised wavefunction Ψ(θ) that satisfies the boundary conditions of the problem, the expectation value of the energy is greater than or equal to the ground state energy of the system, Therefore, by minimising ⟨E(θ)⟩ with respect to the set of parameters θ, one can obtain the best possible approximation of a given form to the ground state wavefunction.
In VQE, the expectation value of the energy and any gradients used are computed by a quantum processor before being passed on to a classical optimisation algorithm.There are various considerations to be taken into account when choosing parametrised wavefunction Ansätze for VQE.Heuristic hardware efficient Ansätze (HEA) [39] attempt to introduce many degrees of freedom while maintaining low circuit depths.However, optimisation on the resulting landscapes is often difficult, as they are plagued by barren plateaus [40,41] -regions of parameter space in which the gradient vanishes exponentially with system size -that can prevent the optimisation algorithm from converging to a true minimum.The propensity for a problem to display barren plateaus is dependent on a variety of system and algorithm properties [42] and has been shown to be worse for more expressive Ansätze [43] and higher degrees of entanglement.[44] However, the intrinsic noise resulting from finite sampling of the quantum state can be helpful in improving optimisation outcomes.[45] Alternatively, one can use physically motivated Ansätze such as unitary coupled cluster (UCC), [46][47][48][49] in which the wavefunction is expressed as where |ϕ 0 ⟩ is typically the Hartree-Fock wavefunction, i is an index running over all determinants in the Hilbert space and âi and â † i are fermionic excitation and de-excitation operators respectively, such that âi |ϕ 0 ⟩ = ± |ϕ i ⟩ and â † i |ϕ i ⟩ = ± |ϕ 0 ⟩, with signs derived from the anti-commutation relations of creation and annihilation operators.The cluster amplitudes t i are optimised to give an approximation of the ground state wavefunction.In principle, this wavefunction form can include up to all-electron (de)excitations, but it is common to truncate it to some low excitation order, most commonly including only up to two-electron operators, which gives the unitary coupled cluster singles and doubles (UCCSD) Ansatz.This type of Ansatz is generally found to be easier to optimise than HEAs but has its own challenges.In order to express the UCC operator in the set of gates available on most quantum computers, one generally has to perform a Trotter decomposition [50,51] of the cluster operator, where τi = t i (a i − a † i ) and we have rewritten the exponential in Eq.

as exp( T − T
. Usually, rather than working in the large ρ limit, ρ is taken to be unity, leading to a disentangled form of the UCC Ansatz, which has been found to be as expressive as the original algorithm in most circumstances.[52] Because of this decomposition, UCC-based circuits often tend to be prohibitively deep for NISQ devices.Various alternative algorithms have been devised, that attempt to enforce some of the known physical properties of the wavefunction to generate easier to optimise energy functions, while maintaining a reduced circuit depth relative to UCC.Examples include symmetry preserving Ansätze, [53] adaptive Ansätze [54] and methods like qubit coupled cluster (QCC) [55,56] and Givens-rotationbased Ansätze [57] which avoid some of the complexities of fermionic excitation operators.

B. Projective Quantum Eigensolver
As an alternative to variational algorithms, one can solve electronic structure problems using projective approaches.In classical settings, this technique is most commonly employed in Projector Monte Carlo (PMC) algorithms, which will be discussed in more detail in section II C.
Consider a trial wavefunction expressed as |Ψ 0 ⟩ = Û (θ) |ϕ 0 ⟩, where |ϕ 0 ⟩ is a simple reference wavefunction such as a Hartree-Fock determinant and Û (θ) is some parametrised unitary operator.If this were an exact solution to the Schrödinger equation, then and Projecting Eq. 7 onto the reference wavefunction |ϕ 0 ⟩ gives while projecting onto the complete set of |ϕ i ⟩ orthogonal to |ϕ 0 ⟩ gives a set of quantities referred to as residuals If Û |ϕ 0 ⟩ is an exact eigenfunction of Ĥ, then substituting Eq. 7 into Eq.9 gives for any ϕ i in the Hilbert space.Therefore, the idea behind projective algorithms is to use the condition given by Eq. 10 to obtain a set of equations that can be solved for the values of the parameters in Û (θ), in order to obtain an approximation to the true ground state.This is done by enforcing Eq. 10 for a subset of states |ϕ i ⟩, corresponding to the parameters in Û (θ).Solving Eq. 8 and 10 for this subset leads to a Projective Quantum Eigensolver (PQE) [36] algorithm.If Û is a UCC cluster operator and |ϕ 0 ⟩ is the Hartree-Fock determinant, Eq. 8 and 10 can be solved using a quasi-Newton iterative scheme such as [36] θ If |ϕ i ⟩ labels |ϕ ab... ij... ⟩, with indices i, j, ... ( a, b, ...) denoting orbitals occupied (unoccupied) in the HF determinant, from (to) which electrons are excited, then ∆ i = ϵ i + ϵ j + ... − ϵ a − ϵ b − ..., where ϵ i are Hartree-Fock orbital energies and the indices correspond to the same orbitals as above.This algorithm is appealing for NISQ device use because, unlike energy gradients calculated in many VQE algorithms, the residuals r l i can be computed as expectation values at the same cost as the overall energy, since where As mentioned above, the main use of projective methods in conventional quantum chemistry is in PMC algorithms, which are based on the imaginary-time Schrödinger equation, where β is imaginary time and Ĥ is the Hamiltonian of the system of interest.The lowest-energy solution to Eq. 13, |Ψ 0 ⟩, is given by where |Ψ(0)⟩ is some initial trial wavefunction such that ⟨Ψ 0 |Ψ(0)⟩ ̸ = 0 and E 0 is the lowest eigenvalue of Ĥ.
The time-evolution operator can also undergo a Trotter expansion, giving where ∆β = β/n.For small ∆β, the exponential can be approximated by a linear expansion, leading to the form of the imaginary time propagator most commonly used in projector methods, Considering a single time-step ∆β, one can write which describes the step-wise imaginary time evolution of the trial wavefunction |Ψ⟩.As in PQE, this equation can be projected onto the various Slater determinants in the Hilbert space to give In the original Full Configuration Interaction Quantum Monte Carlo (FCIQMC) algorithm, [33] the wavefunction is parametrised as a linear expansion in all determinants in the Hilbert space, |Ψ⟩ = i c i |ϕ i ⟩.Eq. 18 can then be approximated as or where We note that this is equivalent to where In FCIQMC, Eq. 20 is interpreted as governing the population dynamics of a set of particles ("walkers", "psips" or "excips") living in the Hilbert space, such that on each determinant there exists a particle population proportional to its configuration interaction (CI) coefficient, c i .Rather than iterating these equations exactly, their terms are sampled stochastically and can be described by three processes: [33] • Spawning of particles from determinant |ϕ j ⟩ to |ϕ i ⟩ corresponds to the off-diagonal action of the Hamiltonian.
where p select (ϕ j ) is the probability of selecting ϕ j as the source of the spawn, p gen (i|j) is the conditional probability of selecting ϕ i as the target of a spawn given it originates from ϕ j and n attempts is the number of spawning attempts.k is the index of the current attempt and s denotes that this is a spawning event.
• Death or cloning from |ϕ i ⟩ to itself corresponds to the diagonal action of the Hamiltonian.
where d denotes that this is now a death event.
• Annihilation ensures that the two equivalent oppositely signed versions of the wavefunction do not proliferate simultaneously and corresponds to collecting all contributions to c i into a single value.
We note that in the death step the unknown true ground state energy E 0 has been replaced by the shift S.This acts as a population control parameter and in standard QMC procedures eventually converges to a stochastic estimator of E 0 .[33] The projected energy is normally used as a second way to estimate E 0 .By employing a stochastic representation of the wavefunction, algorithms like FCIQMC take advantage of the sparsity of the Hamiltonian, and are therefore able to tackle problems far beyond those tractable for a conventional FCI solvers.[60] This behaviour can be further enhanced by developments such as the initiator approximation [61] and the adaptive-shift method, [62,63] which artificially stochastically increase the sparsity of the Hamiltonian.
Alternative projective Monte Carlo formulations have been developed using Coupled Cluster (CC) [34] and UCC [35] parametrisations of the wavefunction.These are based on the fact that for an exponential Ansatz, if a determinant |ϕ i ⟩ is reachable by an excitation included in the cluster operator, then ⟨ϕ i |Ψ⟩ = t i + O(t 2 ), where t i are the cluster amplitudes.Since these methods are only valid in the regime in which the cluster amplitudes are small, a similar equation to Eq. 20 can be written down as Therefore, similar stochastic propagation techinques can be used as in the case of FCIQMC, however one only has access to the cluster amplitudes t i .The corresponding CI coefficients must also be sampled stochastically, leading to additional complexity in the selection routine, which is responsible for a significant portion of the computational cost of these algorithms.
In FCIQMC it is easy to cycle through all determinants present in the wavefunction or sample them uniformly.In contrast, in Coupled Cluster Monte Carlo (CCMC) a doubly excited determinant |ϕ i ⟩ = |ϕ ab ij ⟩ will have a CI coefficient given by where t a i , t ab ij are the amplitudes of the corresponding single and double excitation operators âa i and âab ij in the cluster operator T .Therefore, when selecting a determinant for spawning or death, one must consider both composite (e.g.t a i t b j ) and non-composite (t ab ij ) contributions ("clusters") to it.The standard way to do this in CCMC [34,64] is to: 1. Select a cluster size s with some probability p s ; 2. (Optionally) select a particular type of cluster of size s by imposing constraints on the excitation levels of the individual excitors involved; [64] |0⟩ H H FIG. 1: Quantum circuits for measuring the overlap si = ⟨ϕ i | Û ϕ0⟩.The first is a modified Hadamard test, [58] which uses a single ancilla qubit and a controlled version of the unitary Û .The second was proposed by Huggins et al. [59] and needs (n qubits + 1) ancilla but no controlled Û .
3. Select s excitors e to form the cluster, each with probability p e ∝ t e ; 4. "Collapse" the excitors to determine the corresponding determinant (e.g.âa A variety of selection algorithms have been developed for CCMC to better importance sample the underlying wavefunction.[64] Nevertheless, this remains one of its main challenges, particularly when expanded to high truncation levels [65] or multi-reference formulations.[66] To further complicate matters, in unitary CCMC (UCCMC) [35] the maximum cluster size is formally infinite, due to alternating excitation and de-excitation operators in the clusters.While results converge relatively quickly with maximum cluster size, the inclusion of large composite clusters leads to instabilities in the Monte Carlo propagation.

D. Folded-spectrum
Methods like VQE, PQE and standard PMC are all intended to find the ground state of a system of interest.The energy of some excited states can be found with these techniques by enforcing orthogonality between the state of interest and the ground state, for example by considering different symmetry sectors of the Hilbert space.[24] However, obtaining general excited states is not-trivial.
Using the FCIQMC approach, it is possible to obtain multiple excited states simultaneously by instantaneously orthogonalising the stochastic wavefunction representations of multiple independent FCIQMC calculations.[67] A similar idea forms the basis for the Variational Quantum Deflation (VQD) [23] algorithm, although this requires sequential optimisation of the different excited states.Alternative hybrid excited state algorithms include the quantum subsbace expansion (QSE) method, [22,68,69] in which the Hamiltonian is diagonalised in some subset of the Hilbert space to obtain estimates of low-lying eigenstates; the multistatecontracted VQE method, [70] which optimises an entanglement matrix between some approximate excited states obtained from a conventional quantum chemistry calculation; or the witness-assisted variational eigenspectra solver (WAVES), [71] which uses the von Neumann entropy of states to ascertain whether they are eigenstates of the Hamiltonian.
Another approach to directly target excited states is the folded-spectrum method, [21,37,38] in which the system Hamiltonian Ĥ is replaced by which has the same eigenstates as the original Ĥ.However, the eigenvalues become (E i − ω) 2 so the ground state of Ĥfs (ω) becomes the eigenstate with true energy E i closest to ω.Therefore, by changing ω one can target any stationary state of the system with conventional variational or projective techniques.This method has been previously used in conjunction with FCIQMC [72] and similar approaches have been used in the conventional Variational Monte Carlo (VMC) community under the name of state-specific variance optimisation.[73][74][75][76][77] In variational folded-spectrum approaches, one minimises the quantity while in projective approaches one updates the parameters using the residuals The algorithms employed are equivalent to the standard ground-state approaches, but the folded-spectrum Hamiltonian is more complex than the original Hamiltonian.Consider a standard chemistry Hamiltonian where p and p † are fermionic creation and annihilation operators respectively for spin-orbital p and f pq and g pqrs are the one-and two-body contributions to the Hamiltonian.The corresponding folded-spectrum Hamiltonian will include up to four-body terms, which will be significantly more expensive to compute than the one-and two-body terms of the original Hamiltonian.The total number of terms in the operator will therefore scale as N 8 in the number of one-electron basis functions, rather than the N 4 scaling of the original Hamiltonian.Due to this significant cost increase, the folded-spectrum technique is currently limited to only small systems.

III. QUANTUM ALGORITHMS A. Unlinked Residual Measurement Implementation
We begin by discussing the implementation of quantum circuits to compute the unlinked residual and the overlap required by the QMC methods discussed in Sec.II C. As detailed in Sec.II C, exponential Ansatz-based QMC algorithms require complex selection algorithms to estimate r u i , as computing it directly would require storing the full CI wavefunction, which is not memoryefficient.For large cluster sizes, this often becomes the limiting factor on the speed and rate of convergence of a calculation.Therefore, using a quantum device to compute these terms without the memory overhead would be a significant boon to these methods.This has been shown in the context of auxilliary-field QMC (AFQMC) [78], for which overlaps with complex trial wavefunctions can be efficiently calculated using a quantum computer.[79] The standard implementation of the UCC Ansatz in the quantum circuit representation is to map Fermionic creation and annihilation operators onto strings of Pauli X, Y and Z matrices (or gates).There are multiple standard mapping schemes, [80][81][82][83] of which we employ here the Jordan-Wigner approach, [80] where the product of Z matrices encodes the required parity to preserve fermionic anti-commutation relations and σ ± p = X p ± iY p .Using this mapping leads to the following forms for single and double excitation operators: The exponential of each of these strings of Pauli matrices can be computed using a set of gates known as a Pauli gadget, [84] a controlled version of which is shown in Fig. 2.
As there is no immediate expectation-value form similar to Eq. 12 for r u i and s i computing such quantities requires auxiliary qubits.Two alternative circuit constructions to obtain r u i and s i are given in Fig. 1.The first uses fewer ancilla qubits, but it also requires a controlled version of the unitary Û , which decomposes into a larger number of two-qubit gates than its uncontrolled couterpart, regardless of the nature of Û .As a significant fraction of NISQ device error is due to twoqubit gates, the second circuit may be preferable on hardware.However, it requires an ancilla register of the same size as the system register, which causes a 2 n qubits -fold increase of the size of the qubit Hilbert space, leading to worse performance on quantum simulators.Work presented heretherefore employs the circuit in Fig. 1 (a).
In the particular case of the Pauli gadget-based unitary coupled cluster Ansatz, controlled-Û only requires one additional controlled rotation gate per Pauli gate string, making it relatively easy to implement.Compared to a given UCC circuit employed in conventional VQE, this approach requires one additional qubit and n electrons + 2 additional gate layers.The method also requires a constant number of additional two-qubit gates for each parameter in the Ansatz.The asymptotic scaling of circuit depth and two-qubit gate count is therefore equivalent to a conventional VQE algorithm employing the same UCC Ansatz.
Labelling the overall state in the qubit register of the circuit in Fig. 1 (a) as |Φ⟩, the expectation value of Z on the ancilla is given by so ⟨Φ| Ẑanc |Φ⟩ = ℜ(s i ).The Y -expectation value similarly gives the imaginary part of the overlap.Additionally, applying the Hamiltonian to the system register and measuring the expectation value gives the corresponding parts of the residual, In order to do this, the second-quantised Hamiltonian in Eq. 31 must be mapped onto a corresponding qubit Hamiltonian, which can be done using the same techniques discussed for excitation operators.This leads to an operator of the form where Pk are strings of Pauli operators.The expectation value in Eq. 39 can then be rewritten as and each term measured individually.The cost of measuring each residual therefore scales with the number of terms in the Hamiltonian, like the total energy calculation.Residuals computed in this way can be employed in a projector method based on Eq.21, leading to an unlinked PQE algorithm.Given noiseless, exact values for r u i and setting the shift to track the instantaneous projected energy, such an algorithm would converge directly to the ground state, without need for further Monte Carlo estimation.However, expectation values obtained from a realistic quantum device are subject to at least finitesampling noise.Therefore, it is perhaps wiser to consider the measured values of r u i as stochastic estimates of the underlying quantities and therefore use them in a QMC-like algorithm.The resulting MC-PQE method is outlined below: 1. On a classical machine, represent the wavefunction Û (θ) |ϕ 0 ⟩ as a distribution of walkers with populations N 0 corresponding to |ϕ 0 ⟩ and N i = θ i N 0 corresponding to each parameter.
2. Encode the wavefunction Û (θ) |ϕ 0 ⟩ on a quantum device and measure the residuals r u i .This is done here using the circuit in Fig. 1 (a).Measuring each residual is equivalent to an energy measurement.As in VQE, this requires individual measurement of each (non-commuting) term in Eq. 40, scaling as n 4  qubit .The number of residuals is equal to the number of parameters in the employed Ansatz.

Update each
4. Store the projected energy E proj = r 0 + S * s 0 .

Once a threshold population
N i has been reached, allow the shift to vary to maintain this population constant.In the original FCIQMC algorithm this is done by [33] although more recent update procedures have been developed that lead to finer population control and more rapid convergence.[85] 6. Repeat steps 2-5 until sufficient samples of E 0 and S have been accumulated to obtain accurate Monte Carlo estimates.
On the one hand, as discussed in Sec.IV, this approach has a series of benefits relative to a "deterministic" PQE approach, as it is resilient to noise in the quantum measurement, allowing the use of low shot numbers to obtain accurate results.Additionally, the shift provides a second independent estimator for the energy which can be used to more clearly identify when convergence has been reached.On the other hand, it is more efficient than a fully classical Quantum Monte Carlo algorithm as it both removes the need for cluster sampling and merges the death and spawning steps into a single process.One of the main benefits of QMC methods over conventional algorithms is that they take advantage of sparsity in the Hamiltonian to store a compressed version of the wavefunction, thereby decreasing classical memory overheads.In the following section we explore these QMC-inspired ideas to further decrease the memory or quantum resource costs of the algorithm detailed above.

Sampling the Hamiltonian
When computing the residuals r u i one must in principle measure each of the terms of the qubit Hamiltonian separately, as they require different post-rotations to be applied to the circuit before measurement.However, they can be further grouped [39,86] as where [ Pki , Plj ] ∝ δ kl .Terms in the same group G k commute, so they may be measured simultaneously using the same post-rotations.Nevertheless, in the general case, this does not reduce the operator to a single group, so a significant number of terms still need to be measured independently.See the caption of table V for some examples.Additionally, optimal grouping of Pauli terms is in general a hard problem.[87] For the two-body quantum chemistry Hamiltonian there is a simple heuristic partitioning, based on the qubitwise commutativity of Pauli strings.This leads to one large group comprising the identity string and all strings containing only Z gates and many small groups of strings containing some X and Y gates.Significant work has gone into further optimising the number of measurements required to obtain Hamiltonian expectation values, including modifying the partitioning between groups [31,32,88] or the Pauli strings themselves [30] to lower variance, or using (randomised) measurement techniques based on the classical shadows approach.[29,[89][90][91] Regardless of the means by which groups are obtained, one can employ an approach similar to that used in classical QMC algorithms and only select a subset of the groups to measure at each time-step.This decreases the number of circuit measurements needed per time-step, but introduces further noise in the estimate of r u i .However, provided no systematic bias is caused, this noise will be averaged out over the course of the imaginarytime propagation.
There exist various possible Hamiltonian term selection schemes, the simplest of which would be to sample uniformly from the set of Pauli terms or groups.Many of these terms will have negligible weights however, so in order to better capture the action of the Hamiltonian in a small number of samples, some form of importance sampling is recommended.One straightforward approach is to select each Pauli group with probability proportional the group weight For most quantum chemistry Hamiltonians investigated in this paper, we find that, when using qubitwise commutativity to partition the terms, the weight of the group containing the empty Pauli string and all Z-only strings, which we will label G 0 , strongly dominates the resulting probability distribution.Therefore, under this selection scheme, it is likely that for small numbers of samples it will be over-rather than under sampled.We also note that this term is qualitatively different from the other groups, as it describes the diagonal action of the Hamiltonian, while the others all correspond to offdiagonal contributions.Therefore, to obtain a more balanced description, we propose a scheme in which the diagonal group G 0 is always evaluated, while the other groups are selected with probability

Sampling the wavefunction
UCC is the natural candidate Ansatz for this type of quantum-enhanced QMC algorithm.One of the main challenges with using this Ansatz on NISQ devices is that it leads to deep circuits, with execution times that easily exceed the coherence times for current machines.We have previously shown that by using insight from a fully classical UCCMC calculation, [35] one can significantly shorten circuits while preserving accuracy by removing low-weight excitors from the Ansatz.[92] This circuit simplification can be done on-the-fly during the propagation, by stochastically rounding the parameter values before encoding the wavefunction, where p is a uniform random sample from the interval [0, 1) and the parameters θ i are ordered by decreasing amplitude.Once again, this is expected to increase fluctuations in the estimators at each time-step.The issue of potential systematic biases is explored further in Sec.IV.

Spawning
In the methodology detailed above, all parameters are updated at every time-step.However, one can take the approach commonly employed in QMC algorithms and only update a stochastically selected subset of the populations.In particular, for spawning in conventional QMC algorithms one can use a variety of excitation generation algorithms.[93][94][95][96] In principle, one could spawn onto connected determinants uniformly.However the spawning probability is proportional to Eq. 22), so for appropriate importance sampling of this process it is desirable that p gen (i|j) ∝ H ij .Different algorithms have been designed to approximately achieve this, using heat-bath sampling [93,96] or mathematical bounds on the value of H ij .[94] With the residuals computed according to the quantum algorithm in Sec.III A, death and spawning are no longer separate processes, so the parameters to be updated should be generated with If we write out the fermionic Hamiltonian as Ĥ = k h F k Fk , where Fk is some fermionic excitation operator and (non-uniquely) label the excitation between determinants ϕ i and ϕ j as Fij , then Sampling the distribution p gen is non-trivial even with access to a quantum computer and a qubit representation of the wavefunction and Hamiltonian.In particular, the circuits described in Fig. 1, which can be used to estimate the overlap itself, cannot be used for this purpose.However, one can efficiently generate determinants with probability by preparing the state |Ψ⟩, applying one of the operators Pk corresponding to each Fk to it with probability and then measuring the qubit register.While this is not a perfect importance sampling of the Hamiltonian coupling, it will better match its minima and maxima and should therefore represent an improvement over uniform sampling.An example for the H 4 molecule is shown in Fig. 3.These circuits do not require ancilla or additional controlled operations, so they are less costly to implement than those needed to compute the residuals.The number of shots needed is equal to the number of determinants one wants to generate, so they would add little computational overhead.While stochastic parameter updates may be valuable for larger systems, for the problems considered in this and Eq.49 for the H4 molecule, using an approximate ground state wavefunction.pgen is a better approximant for pgen than the uniform distribution.As a metric, the Kullback-Leibler divergence [97] of pgen with respect to pgen is 0.118 , while that of the uniform distribution is 0.541.
paper it is easy to store and cycle through all the relevant parameters and therefore probabilistic spawning is not employed.

C. Folded spectrum considerations
All the methods described above can equally be applied to the folded spectrum Hamiltonian, by replacing the residual with that in Eq. 30.As the squared Hamiltonian operator contains many-body terms of higher order than the original Hamiltonian, this will lead to more terms in the Pauli string representation of the corresponding qubit operator as well.However, some of these will commute with the terms in the original operator, so the number of different groups will see a smaller increase.Nevertheless, for non-trivial systems, this leads to a potentially insurmountable increase in required computation, even with efficient Hamiltonian term grouping methods.The possibility of sampling the Hamiltonian is therefore particularly promising in this case as a means to mitigate the additional costs.
As molecules are pulled apart electronic states approach one another and often become degenerate in the fully dissociated limit.Therefore, if one is searching for two different states which are close in energy (which will remain so after the folding procedure), noise in the wavefunction representation becomes potentially problematic as it may be sufficient to induce a switch from one state to another.In the case of ground state calculations, it has been observed in conventional CCMC that, particularly in more strongly correlated regimes, large fluctuations will sometimes cause a calculation to collapse onto a different, often unphysical, state.[98,99] This is made possi- ble by the nonlinear structure of the coupled cluster equations, which allow for multiple solutions, not all of which have a configuration interaction correspondent.[100,101] Therefore, the range of stochastic techniques that can be used in this scenario may be more limited than in ground state calculations.

IV. RESULTS
All results in this paper are obtained from state-vector or shot-based quantum circuit simulation using the t|ket⟩ platform.[102] Necessary atomic orbital Hamiltonian integrals were obtained from PySCF [103] and FCI calculations were performed using HANDE-QMC.[104] All MC-PQE calculations used the UCCSD Ansatz.The main systems considered are the H 2 , H + 3 , H 3 and H 4 linear molecules in the STO-3G basis set, [105] which serve as a useful proof of concept for the methods described here.These molecules are small enough that the corresponding Fock space vectors can be easily manipulated on a classical computer, while covering a range of chemical behaviour.Shot-based simulations are obtained by taking N shots samples from the probability distribution corresponding to the wavefunction encoded in a given quantum circuit.In order to reduce computation time required for larger systems, shot-based simulation is at times replaced by state-vector simulation with added Gaussian noise of mean µ = 0 and standard deviation σ.The average behaviour of such simulations should be very similar to true shot-based results for large enough N shots , but it will fail to capture some of the catastrophic failures of the very low shot count regime.
Average values of the shift S and E proj are obtained by a reblocking analysis [106] of data obtained in the steadystate regime of calculations, using pyblock.[107] A. Ground State Calculations We begin our investigation into the effects of sampling noise on the MC-PQE algorithm by carrying out calculations using the circuit in Fig. 1 (a), sampled with 10 1 to 10 4 shots for each Pauli term in the Hamiltonian.
Representative examples of MC-PQE energies over the course of calculations with differing numbers of shots for H + 3 at a bond length of r = 2.0 Å are shown in Fig. 4. The instantaneous energy estimators display significant fluctuations, although, unsurprisingly, the noise decreases as the number of shots is increased.There is a qualitative change in the behaviour of the projected energy using N shots = 10, with some samples deviating significantly from the expected mean.This is effectively due to the underestimation of the denominator in the expression of E proj (see Eq. 25), which leads to divergences in this quantity.There is therefore a lower bound on the number of shots needed to obtain a stable calculation, which is dependent on the overlap ⟨ϕ 0 |Ψ(β)⟩.The final estimators of the energy, obtained by averaging over the instantaneous energies once the calculation has reached its steady-state regime are given in Table I and display the expected lower variances as the number of shots increases.
Average values for the estimators S and E proj along the binding curve of H + 3 are given in Fig. 5.At large bond-lengths, we observe that divergences in the instantaneous values of the estimators due to undersampling translate into divergences of the overall estimator, with the projected energy more sensitive to this than the shift.This behaviour is not unexpected.As the bond length increases, the system becomes increasingly strongly correlated and therefore the weight of the reference determinant ϕ 0 decreases, requiring more sampling to estimate appropriately.
It is therefore the case that more strongly correlated tion of imaginary time, obtained from a stochastically represented wavefunction for H + 3 (blue), H3 (orange) and H4 (green) at r = 1.5 Å.The noise in the shift is roughly systemindependent, whereas the variance of Eproj increases with its absolute value.
systems will require more sampling than weakly correlated ones.This follows well-known trends in the difficulty of solving quantum chemical problems in a classical setting and we can offer a series of ways to alleviate the issue.Firstly, in the absence of noise, the overlap ⟨ϕ 0 |Ψ(β)⟩ decreases monotonically with β.Therefore, it will be possible to use fewer shots at short imaginary times and increase the shot count as the calculation progresses, thereby accelerating convergence to the steady state.For most Monte Carlo calculations however, the majority of the time is spent sampling this state, which cannot benefit from the early reduction in shots.
Secondly, different numbers of shots can be allocated to different quantities of interest.In this case, one could run a calculation with a small number of shots for the residuals, but a large number of shots for the reference overlap estimation.As the system size increases, this will represent a smaller fraction of the computational effort needed, thereby allowing calculations that are effectively as shot-efficient as ones with the lowest number of shots.
Another standard computational chemistry approach would be to replace the Hartree-Fock reference wavefunction in the projected energy with a more complex, TABLE III: Comparison of VQE and MC-PQE results for H + 3 and H4.n2q gives the number of two-qubit gates in each state preparation circuit.Each method was run for niter optimisation cycles, using n shots measurements per Pauli string measured at each step.n total gives the total number of measurements required for the calculation.For VQE with SPSA optimisation, this is 2nitern hamil n shots , while for MC-PQE it is (nparam + 1)nitern hamil n shots .∆E is the average absolute error over the binding curve, given in milliHartree.For a fair comparison, a variational estimate of the energy is also computed for MC-PQE.

Method
On a classical machine, this would require a linear increase in computation, however in the current scheme, all terms in both the numerator and denominator are already computed as part of the propagation.We test this for H + 3 , which in a minimal basis has two electrons in three orbitals, using a trial wavefunction of the form which includes the dominant configurations at dissociation, where |ij⟩ is the Slater determinant with spinorbitals i and j occupied.The orbitals are labeled in order of increasing energy, with even numbers corresponding to m s = 1/2 and odd numbers to m s = −1/2 orbitals.This is insufficient to ensure convergence of the extremely undersampled 10-shot calculation towards dissociation, but it does reduce the standard error in 100-shot calculations by a factor of approximately 1.5 Finally we compare the performance of MC-PQE with conventional VQE, performed using the Simultaneous Perturbation Stochastic Approximation (SPSA) optimisation algorithm, [108] which has been show to perform well for noisy variational optimisation.[109] In Table III we compare the average error in the ground state energy across the symmetric stretch of the H + 3 and H 4 chains, obtained by VQE and MC-PQE.The performance of both methods could undoubtedly be improved by increasing the number of iterations and measurements, but as it stands for both of these systems MC-PQE can achieve near chemical accuracy with at least two orders of magnitude fewer overall shots than conventional VQE.
We move on to consider stochastic implementations of the projection algorithm itself.We begin by considering the effects of stochastic sampling of the wavefunction and Hamiltonian independently, while doing the projection itself deterministically.Additional example MC-PQE trajectories are given in the appendix.Fig. 6 shows trajectories computed using deterministic projection on a stochastically rounded wavefunction for H + 3 , H 3 and H 4 , with corresponding statistical properties given in Table II.H 2 is not considered here as it only has one parameter in the wavefunction, so there is no rounding to be done.While the noise introduced into the shift by this procedure is roughly constant and system-independent, in itself a desirable property, the noise in E proj scales linearly with the absolute magnitude of the correlation energy, as can be seen from the bottom panel of Fig. 6.This is unsurprising since if all amplitudes are rounded down, we expect the projected energy to be zero.Nevertheless, for all three systems we obtain stable estimators for the correlation energy.For the H 3 and H + 3 systems, for which the UCCSD Ansatz is exact, the results are within error bars of the FCI value.For H 4 , there is an error of 10 milliHartree in the correlation energy estimators.This can be significantly reduced by computing the variational rather than projected energy estimator.[35] We then consider propagating the full wavefunction with a stochastically sampled Hamiltonian in which only N hamil commuting Pauli groups are included at each time-step.Corresponding values of the estimators for such calculations are given in Table IV.Once again, as expected, variance decreases with the number of terms included and we note happily that propagation under the action of the stochastically sampled Hamiltonian does not introduce any additional bias to the estimators.Noise in the shift fully depends on the details of the stochastic trajectory, but in order to minimise the variance of E proj we use the full Hamiltonian at each step.As this is equivalent to a single residual computation, the relative cost of employing the full Hamiltonian at this stage decays with system size.However, the projected energy could also be estimated using the sampled Hamiltonian at each step, decreasing computational overhead at the cost of increased sampling noise.
Considering the data presented above, we are satisfied that shot-based simulation, stochastic wavefunction rounding and Hamiltonian sampling are all compatible with MC-PQE and do not introduce significant biases into the results of a simulation.They do however all increase the noise present, requiring, much like classical QMC methods, a relatively long period of steady-state propagation to converge the desired quantities to the intended accuracy.
There are further ways to reduce the noise in the simulation which do not require increased sampling.One such method is shift damping, [33] which entails decreasing the parameter ζ in Eq. 42.This makes the shift less responsive to changes in population, thereby directly decreasing its variance.As the value of the shift affects the continued propagation of the wavefunction, we expect that, once the steady-state has been reached, less variability of the shift will translate to less variability of the wavefunction and therefore one might expect less noise in the projected energy as well.
As an example, we compare the shift and projected energy in calculations with different numbers of shots and different degrees of shift damping for H + 3 .While a 10-fold decrease in ζ is very effective in reducing shift noise (σ damped /σ = 18% (1000 shots), 20% (100 shots)), we see no effect in the projected energy (σ damped /σ = 101% (1000 shots), 102% (100 shots)).See the appendix for example MC-PQE trajectories.
Finally, we combine shot-based simulation with stochastic wavefunction rounding and Hamiltonian sampling.H + 3 binding curves obtained with the fully stochastic algorithm are given in Fig. 7. Simulations remain well behaved even with multiple sources of noise.We also note that the systematic increase in error due to stochastic rounding of the wavefunction appears to be partially cancelled out by other noise.As noted in the case of pure finite sampling noise, more highly correlated regimes require better sampling to achieve the same level of accuracy.
We also apply this stochastic PQE algorithm to first row mono-and dihydrides over a range of geometries, with results summarised in Table V.The cases studied include the dissociation of LiH and HF, the symmetric stretch of H 2 O at its equilibrium bond-angle and the C 2v insertion of Be into the H 2 bond to form BeH 2 .[110] Geometries for HF and H 2 O match those from [111] and the STO-3G basis set was used for all molecules.When using the same calculation parameters for all systems, it is unsurprising that errors are larger for more complex molecules.However, we note that by stochastically rounding the wavefunction and only sampling 20 Hamiltonian groups for each system, similar errors to the pure finite sampling case are observed.In all cases, the results correspond to relatively short propagation times and errors could be reduced by running the calculations further.

B. Excited States
As described in Sec II D, the algorithms above can also be applied to excited states by employing the modified Hamiltonian in Eq. 28.The results of example foldedspectrum calculations performed using exact statevector simulations for H + 3 are given in Fig. 8.In order to easily converge to the desired state, the calculations are initialised at the first geometry (shortest r HH ) with different reference determinants and values of ω close to their energies.States are then tracked along the binding curve by using the parameters and energies of the previous point to initialise the wavefunction and ω value for the current point.This allows smooth tracking of the states over the whole range of geometries considered.
We investigate introducing noise into these simulations, first in the form of pure sampling noise due to shot-based simulation.Results obtained using Gaussian noise with σ = 0.01 for H + 3 are shown in the left panel of Fig. 9.We note that, even with relatively realistic finite sampling noise levels, convergence is still very good, with obtained states closely following their FCI counterparts.However, not all instances of noise are as benign in this scenario, as can be seen in the right panel of Fig. 9, which shows results for the totally symmetric states of H + 3 obtained with stochastically rounded wavefunctions.In this case, particularly for the highly excited states, the rounding of the wavefunction leads to unphysical behaviour.The third excited state is most strongly affected, at points tending towards converging onto the highest excited state instead.One must note that because previous values of the energies are used as ω values at subsequent points, once a sufficiently large deviation occurs it is likely to remain or be amplified.This could be avoided by running independent calculations at each geometry, but a new problem of estimating a priori a good initial wavefunction and ω is introduced.The main limitation to using the folded spectrum method is the increase in terms in the squared Hamil-tonian compared to the original operator.However, the Monte Carlo stochastic sampling of the Hamiltonian described in section III B 1 provides an effective means of sampling the folded spectrum operator.We find (see middle panel of Fig. 9) that the same number of sampled Hamiltonian terms can be used per step as in standard ground state calculations to generate good quality excited state energies.This significantly expands on the range of systems for which folded spectrum methods are tractable.

V. CONCLUSION
In this paper, we present a Quantum Monte Carloinformed approach to projective quantum eigensolver algorithms.We begin by noting that rather than attempting to obtain the true ground state of a system to within the desired accuracy directly, once propagation has converged, we can obtain highly accurate estimates by classically averaging over samples obtained at different times over the course of imaginary time propagation.This allows us to use low shot counts without compromising final accuracy, leading to a two orders of magnitude reduction in the number of samples relative to comparable conventional VQE calculations.Notably, this reduction is due to an effective increase in the target variance of the measured observable, and therefore orthogonal to potential improvements due to more effective utilisation of the measurement information itself.Therefore, this method should be usable in conjunction with efficient measurement techniques such as those based on classical shadows to great effect.
We go further to decrease the quantum computational overhead, through stochastic rounding of the wavefunction and probabilistic sampling of the Hamiltonian.These procedures lead to shallower circuits and fewer shots required per step, but also generate noisier projectors.However, we observe that the Monte Carlo energy estimators employed are generally resilient to this additional noise, generating comparable accuracy to the deterministic projector, provided the resulting truncation at each step is not too drastic.In practice, for second row hydrides we observe that as few as 10% of the Hamiltonian terms are required.
Finally, we expand this method to excited state calculations, by employing the folded-spectrum approach, in which the Hamiltonian is replaced by a variance operator.The squared operator has significantly more terms than the underlying Hamiltonian, making this method traditionally more expensive than conventional eigensolvers.However, by sampling the operator we find that we can reduce the cost to that of an equivalent ground state calculation.
The fundamental reason these stochastic approximations and sampling methods work is that, given a wavefunction that is a steady-state of some operator, they induce fluctuations about this wavefunction that may be effectively averaged over.Therefore, their applicability is not limited to the projective quantum eigensolver and we believe they may be helpful in any quantum algorithm which aims to converge to a some well-defined final wavefunction.istically, but the Hamiltonian is sampled, for H + 3 .We note that as the number of terms in the sampled Hamiltonian increases, the deviations of the instantaneous estimators from the true energy decay, as expected.This corresponds to decreasing variance of the overall energy estimators.
Fig. 11 shows the same data for a fully stochastic MC-PQE calculation, with 1000 shots per measurement.Sim-ilar trends are observed with increasing number of Hamiltonian groups.
Fig. 12 shows the effect of decreasing the damping factor ζ in Eq. 42 on the fluctuations of the shift and projected energy estimators.While a 10× decrease in ζ reduces the variance of the shift estimator by approximately a factor of 5, it has no effect on E proj , as discussed in more detail in the main text.

FIG. 4 :
FIG. 4: Shift (top) and projected energy (bottom) as a function of imaginary time, obtained from 10 1 − 10 4 samples of the residual circuits for each Pauli term in the Hamiltonian, for H + 3 at r = 2.0 Å. Noise decreases as the number of shots increases.

1 FIG. 5 :
FIG. 5: Shift and Eproj estimates of the correlation energyof linear H + 3 over the range rHH = 0.5 − 3.0a0.Deviation from the true correlation energy increases as the number of shots is decreased, with deviations more pronounced in the projected energy and at larger bond lengths.This correlates with an increase in strong correlation and lowering of the overlap of the ground state wavefunction and the Hartree-Fock reference, ⟨ϕ0|Ψ(β)⟩.

FIG. 7 :
FIG. 7: Shift (top) and projected energy (bottom) estimators for the H + 3 correlation energy, obtained using a sampled Hamiltonian with only 2-11 Pauli groups selected at each time-step, a stochastically rounded wavefunction and 1000 shots per residual.

FIG. 10 : 3 FIG. 11 :
FIG. 10: Shift (left) and projected energy (right) as a function of imaginary time obtained by imaginary-time propagation using a stochastically sampled Hamiltonian with only 2-6 Pauli groups selected at each time-step for H + 3 at r = 2.0 Å.The full H + 3 Hamiltonian has 62 Pauli terms, split here into 25 commuting groups.

FIG. 12 :
FIG. 12: Shift (4 left panels) and Eproj (4 right panels) in a simulation of H + 3 at r = 1.75, using 100 or 1000 shots per residual estimation and shift damping parameters ζ = 1 in the undamped case and ζ = 0.1 in the damped case.Shift damping clearly reduces shift fluctuations, while leaving those in the projected energy unchanged.

TABLE I :
Mean and standard deviation of the shift and projected energy as the number of shots is increased, for H + 3 at r = 2.0 Å.All energies are given in Hartree.
H4n qubits n2q n shots niter n total ∆E n qubits n2q n shots niter n total ∆E

TABLE IV :
Mean and standard deviation of the shift and projected energy obtained from linearised imaginary timepropagation of the H + 3 wavefunction at r = 2.0 Å, using stochastically sampled Hamiltonians with increasing numbers of terms selected per step.All energies are given in Hartree.
multi-determinental quantity |Ψ T ⟩ which is hoped to have larger overlap with the true ground state, giving a new projected energy of the form

TABLE V :
(9)(10)(11)(12)e error (MAE) and non-parallelity error (NPE) in milliHartree for Monte Carlo PQE in some frozen-core second row hydrides, compared to fully deterministic PQE.These systems have 125 (LiH and HF) and 313 (H2O and BeH2) Hamiltonian groups respectively.Results in all columns have Gaussian noise with σ = 0.001 added to simulate sampling noise and results are obtained from runs of 2000 imaginary time-steps δβ = 0.2.The first set of results (columns 1-4) have no additional stochasticity.The second(5)(6)(7)(8)and third(9)(10)(11)(12)sets use stochastic wavefunctions and 10 or 20 sampled Hamiltonian terms per step respectively.The error converges towards the pure sampling error with few sampled Hamiltonian groups.For all methods, errors could be further decreased by using longer propagation times.