A Non-Orthogonal Variational Quantum Eigensolver

Variational algorithms for strongly correlated chemical and materials systems are one of the most promising applications of near-term quantum computers. We present an extension to the variational quantum eigensolver that approximates the ground state of a system by solving a generalized eigenvalue problem in a subspace spanned by a collection of parametrized quantum states. This allows for the systematic improvement of a logical wavefunction ansatz without a significant increase in circuit complexity. To minimize the circuit complexity of this approach, we propose a strategy for efficiently measuring the Hamiltonian and overlap matrix elements between states parametrized by circuits that commute with the total particle number operator. We also propose a classical Monte Carlo scheme to estimate the uncertainty in the ground state energy caused by a finite number of measurements of the matrix elements. We explain how this Monte Carlo procedure can be extended to adaptively schedule the required measurements, reducing the number of circuit executions necessary for a given accuracy. We apply these ideas to two model strongly correlated systems, a square configuration of H$_4$ and the the $\pi$-system of Hexatriene (C$_6$H$_8$). We show that it is possible to accurately capture the ground state of both systems using a linear combination of simpler ansatz circuits for which it is impossible individually.


I. INTRODUCTION
Large, error-corrected quantum computers are expected to provide powerful new tools for understanding quantum many-body physics. For example, such devices will be able to efficiently simulate long-time dynamics [1], and through phase estimation, measure the energy of a trial wavefunction while projecting it into the eigenbasis of the Hamiltonian [2]. Prior to the availability of such devices, it is natural to ask how today's noisy, intermediate-scale quantum (NISQ) platforms may be used for similar ends. One appealing strategy, the variational quantum eigensolver (VQE) [3,4], uses a potentially noisy quantum computer as a black box to prepare parametrized wavefunctions and measure their energy. By optimizing over the wavefunction parameters in a classical outer loop, one obtains a variational upper bound on the true ground state energy.
While it is believed that even a noisy, modestly-sized quantum computer can prepare and measure states that are out of reach for a classical computer [5], it will still likely be difficult to take advantage of this fact to surpass the capabilities of classical variational methods [6][7][8][9]. One serious challenge is the decoherence caused by noise, which is particularly damaging for quantum chemical calculations that demand a high degree of precision [8][9][10]. Recent works have presented a variety of approaches to overcoming this difficulty, including combining error detection schemes with postselection [10][11][12], extrapolating to the zero-noise limit [13][14][15], and using additional measurements and post-processing to construct better energy estimators [12,[16][17][18]. A complementary body of research has focused on developing new variational ansätze that use fewer gates and thus offer less opportunity for errors to occur [19][20][21]. In this work, we shall present a new approach in this latter direction which allows for a systematic increase in wavefunction complexity without a growing circuit depth.
The standard VQE approach uses a quantum computer to measure the expectation value of the Hamiltonian for some parametrized wavefunction, |ψ(θ) , in conjunction with a classical coprocessor that interprets the measurement outcomes, suggesting new values for the parameters to minimize θ the energy [4]. In our approach, we define instead a logical ansatz where each φ i (θ (i) ) is a parametrized wavefunction with a compact quantum circuit description. Rather than preparing the state |ψ directly on our device and measuring its energy, we use our quantum computer to prepare simpler superpositions of the states {|φ i } and measure the matrix elements of the Hamiltonian and overlap matrices, This allows us to classically solve a generalized eigenvalue problem, thereby finding the optimal c parameters and minimizing the energy in the subspace spanned by the set of states |φ i . The θ (i) values that parametrize each basis function φ i (θ (i) ) can then be optimized by a classical outer loop to lower the energy further, solving a new generalized eigenvalue problem at each step. Our approach shares similarities with recent work in NISQ-era variational algorithms that involve solving generalized eigenvalue problems [16,22,23], as well as a technique from classical quantum chemistry known as non-orthogonal configuration interaction [24][25][26] (NOCI). However, our approach also differs from this work in some key respects. Most importantly, we make no assumptions about the form of the component wavefunctions |φ i , other than that they have efficient quantum circuit implementations. In the context of quantum algorithms, prior work has assumed that these wavefunctions are generated by excitations from a fixed reference state [16], by imaginary time evolution [22], or by the simultaneous rotation of a set of orthogonal reference wavefunctions [23].
The classical non-orthogonal configuration interaction (NOCI) method, which is the closest relative to our approach, demands each of the |φ i wavefunctions to be a Slater determinant (not necessarily in the same single particle basis) [24][25][26]. It is worthwhile to note the utility of NOCI in the context of classical simulations. Unlike orthogonal CI expansions, NOCI covers a larger Hilbert space for a fixed number of determinants. Furthermore, it avoids the notorious orbital optimization problem of multiconfigurational self-consistent field theory (MCSCF) while obtaining comparable energies. Instead of directly optimizing the orbitals of the CI expansion, it separately optimizes individual determinants by finding a collection of different solutions to the Hartree-Fock equations. Once these non-orthogonal determinants are obtained, a single diagonalization of the Hamiltonian in this basis defines the NOCI energy and wavefunction. Just as NOCI can mitigate difficult orbital optimization problems to some extent, we hope to use this approach in the context of VQE to help reduce the optimization difficulty in general.
By allowing each of the basis functions |φ i to be any wavefunction which can be represented by a parametrized quantum circuit we obtain an extremely flexible form for our logical ansatz, |ψ = c i |φ i . We shall show that, for a wide variety of ansatz circuits, the required matrix element measurements between any |φ i and |φ j pair can be implemented using a number of quantum gates that is equal to the sum of the gates required to prepare |φ i and |φ j , plus a small factor that scales linearly with the system size. Notably, the per-run gate complexity is independent of the number of wavefunctions in the logical ansatz, making it possible to systematically add flexibil-ity to |ψ without increasing the required gate fidelity or coherence times of the quantum hardware.
This flexibility, however, comes at the cost of demanding more matrix element measurements. To ameliorate this cost we propose using a Monte Carlo technique to estimate the uncertainty in the ground state energy and to adaptively allocate our measurements of the matrix elements. Essentially, this scheme involves sampling from the distributions representing the uncertainty in the estimates of the Hamiltonian and overlap matrices and solving a small generalized eigenvalue problem for each sampled matrix pair. We then characterize the resulting distribution of ground state energy values by a sample variance. We suggest a heuristic that repeatedly determines which measurement to perform by calculating the sensitivity of this sample variance to additional measurements of each of the matrix elements.
We apply these ideas to two model chemical systems, a square configuration of H 4 and the π-system of hexatriene (C 6 H 8 ), which exhibit mixed strong correlation and dynamical correlation effects. In terms of strong correlation, we shall focus on a pair of strongly entangled electrons. Specifically, there can be two exactly degenerate determinants for certain geometries of these systems while the rest of the electrons contribute to dynamical correlation. We present two types of numerical experiments. In the first, we explore how well the ground state of these systems can be represented by an NOVQE logical ansatz, varying both the complexity of the constituent basis functions and the size of the subspace. In the second, we take a fixed set of basis wavefunctions and compare our adaptive protocol for scheduling measurements with a simpler alternative.

A. Matrix Element Measurement
The off-diagonal matrix elements of the Hamiltonian, H ij = φ i |Ĥ|φ j , do not correspond to physical observables and therefore cannot be measured directly in the usual manner. Nevertheless, it is possible to construct circuits that allow us to estimate them, for example, by using the Hadamard test [27]. In this section we present a simple strategy for measuring these matrix elements. We combine ideas from recent proposals for measuring off-diagonal matrix elements that appear in other contexts [28,29] with a trick inspired by the literature on the impossibility of black box protocols for adding controls to arbitrary unitaries [30]. Our strategy offers several benefits over a naive application of the Hadamard test. Namely, it doesn't require implementing controlled versions of the ansatz preparation circuits, and it enables the simultaneous measurement of matrix elements of multiple commuting observables while also yielding information about the overlap matrix elements, S ij = φ i |φ j .
For simplicity, we will describe below the case where H is a sum of commuting operators which can easily be simultaneously measured. In the more general case, the usual Hamiltonian averaging approach of grouping the terms into multiple sets that are each simultaneously measurable and measuring the sets separately can be applied without modification [4,8,31]. We begin by preparing the state where the second register is an ancilla qubit. This task can be accomplished by using controlled versions of the unitariesÛ i andÛ j that prepare |φ i and |φ j from a fixed reference state. Given some quantum circuit that implements the unitariesÛ i andÛ j , it is possible to construct circuits that implement the controlled version ofÛ i and U j , by replacing each gate in the original circuits with its controlled form. Even setting aside the difficulty of compiling such a circuit on a physical device with limited connectivity, the cost of implementing such a circuit on a near-term device (quantified by counting the number of two-qubit gates) will be substantially increased. For example, it is known that the decomposition of the Toffoli gate (the controlled-controlled-NOT gate) into a collection of single qubit and CNOT gates requires the use of six CNOT gates [32]. Given the limited coherence times and two-qubit gate fidelities of near-term hardware, we must ask if there are alternatives for implementing controlled versions ofÛ i andÛ j . An ideal protocol might allow us to implement a controlled version of an arbitraryÛ using a single execution of the original, unmodified circuit that implementsÛ . Unfortunately, a single use of oracle (blackbox) access to a generalÛ is insufficient for implementing a controlled version ofÛ in the quantum circuit model [30]. However, ifÛ i andÛ j preserve fermionic (or bosonic) excitation number and act trivially on the vacuum state, then we can circumvent this no-go result. We now show how this can be accomplished in the construction of a controlled unitary operator, We begin with a generic input state |ψ 0 |0 + |ψ 1 |1 , subject to the restriction that |ψ 0 and |ψ 1 are both states that are orthogonal to the state with zero particles, |vac .
3. Next, we execute the unmodified circuit forÛ i on the first system register, while doing the same witĥ U j on the second system register, yieldinĝ 4. We follow this with a second controlled-SWAP operation to produce the state, 5. Finally, we discard the now unentangled second system register to show completion of the action of the controlled unitary gate and obtain the desired result, For our purposes, we can take |ψ 0 and |ψ 1 to be the same fixed reference state, usually a Hartree-Fock state |ψ HF . Then |φ i =Û i |ψ HF and |φ j =Û j |ψ HF and we see that with the last step we have successfully prepared the desired state, We then apply a Hadamard gate on the ancilla qubit and perform aẐ measurement. It is easy to see that the expectation value ofẐ for the ancilla qubit will be Ẑ anc = Re φ i |φ j . Furthermore, the post-measurement state of the system register is either if the ancilla qubit was found to be in the +1 eigenstate, or if the measurement outcome was −1. These outcomes occur with probabilities 1+Re φi|φj 2 and 1−Re φi|φj 2 respectively.
In both cases, we proceed to measure the Hamiltonian H on the system register. Depending on the result of the ancilla qubit measurement, the resulting expectation values will be either or Now we consider the expectation value of the operator HẐ anc . By multiplying each of the conditional expectation values ofĤ by the corresponding eigenvalue of Z anc and taking the appropriate weighted average, we find that Furthermore, ifĤ is a sum of Pauli operators, then the usual Hamiltonian averaging approach and upper bounds on the variance of a VQE observable apply to Eq. 10 [31]. Therefore, by repeated measurement we can estimate Re φ i |Ĥ|φ j to a fixed precision using approximately the same number of measurements that we would need to measure a diagonal matrix element to the same accuracy. A similar approach allows us to estimate Im φ i |φ j and Im φ i |Ĥ|φ j by starting with the state

B. Diagonalization With Uncertainty
Given a collection of states {|φ 1 , |φ 2 , . . . , |φ n }, we are interested in determining the minimum energy state in the subspace that they span. To do this, we use our protocol described above to measure the matrix elements of the Hamiltonian and overlap matrices (Eq. 2), and solve the generalized eigenvalue problem (Eq. 3). Each of the matrix elements is only determined by a finite number of measurements, however, leading to some level of statistical uncertainty. In this section, we shall propose a simple Monte Carlo strategy to estimate the resulting uncertainty in the minimum eigenvalue of Eq. 3.
We model the experimentally determined values of each matrix element using a normal distribution. In practice, the experimental measurements of the matrix elements are individually described by draws from Bernoulli random variables, but the average of many such experimental outcomes is well-captured by a normal distribution [4]. When implementing the procedure described in this section on a near-term quantum computer, one could approximately determine the variance of these distributions from the experimental measurement record of the Hamiltonian and overlap matrix elements.
For the purposes of this work, we determine the variance of the Hamiltonian matrix element measurements using the upper bounds described in Refs. 4 and 31. Similarly, we observe that our scheme for measuring the overlap matrix elements will have a variance that is at most 1 m , where m is the number of measurements performed, and we use this upper bound as an approximation to the true variance. We use these approximations both in our simulation of the experimental measurement record and in our subsequent protocol to determine the uncertainty in the ground state energy. Throughout this section, we use a notation which separates the intrinsic component of the variance, which we denote by σ 2 , from the scaling with the number of measurements, m.
Experimentally, we only have access to estimates of φ i |Ĥ|φ j and φ i |φ j from our measurement record, which we denote byh ij ands ij . Taken together with our estimates of the variances,σ 2 Hij andσ 2 Sij , we can define the random variables These distributions represent our uncertainty about the true value of the matrix elements given the limited information provided by our experimental data.
To quantify the corresponding uncertainty in the ground state energy in the NOVQE subspace, we use a Monte Carlo sampling procedure. We accomplish this by repeatedly drawing from the distributionsH ij andS ij , and solving the resulting generalized eigenvalue problems. It is possible that the noise in our matrix element measurements and subsequent sampling destroys the positive semi-definite character of the overlap matrix. To deal with this, we follow the canonical orthogonalization procedure described in Ref. 33, discarding the eigenvalues of the sampled overlap matrices that are less than some numerical cutoff (and their associated eigenvectors). Each sampled pair of matrices yields a sample from the unknown distribution over possible NOVQE ground state energies. We then quantify our uncertainty in our estimate of this lowest eigenvalue by calculating the sample variance, σ 2 MC .

Experiment Design Heuristic
In the previous section, we proposed a Monte Carlo scheme for estimating the uncertainty in the NOVQE ground state energy caused by a finite number of measurements of the individual matrix elements. Here we build on this proposal to determine the relative impact of performing additional measurements. Ultimately, our goal is to create a reasonable heuristic for adaptively scheduling measurements to most efficiently use a limited amount of device time. By repeatedly sampling fromH ij andS ij and solving the resulting generalized eigenvalue problems, we obtain a distribution over NOVQE ground state energies with some mean µ MC and standard deviation σ MC .
We determine the impact of additional measurements of the matrix elements, H ij and S ij , on the uncertainty in the ground state energy by calculating the derivatives of the sample standard deviation, σ MC , with respect to the number of measurements performed, m Hij and m Sij . The resultant quantities, dσMC dm H ij and dσMC dm S ij , estimate how much we expect the sample deviation to shrink if we perform additional measurements of H ij or S ij . Note that we take these derivatives only with respect to m Hij and m Sij in the Monte Carlo sampling procedure of Eqs. 11 and 12, not the original measurements on the device. Therefore, no additional quantum resources are required. We use the TensorFlow software package to perform the Monte Carlo sampling ofH ij andS ij , calculate of the ground state energies, and estimate σ MC [34]. This enables us to evaluate the analytical expressions for each of dσMC dm H ij and dσMC dm S ij (for a fixed set of samples drawn from H ij andS ij ) without explicitly deriving the equations.
To optimally allocate our experimental measurements, we begin by performing a small number of measurements of each matrix element. We then estimate the derivatives dσMC dm H ij and dσMC dm S ij . Using these estimates, we simply choose to perform additional measurements on the matrix element whose corresponding derivative is the most negative. In practice, we perform these measurements in small batches so that the time taken by the classical processing of the measurement results is small compared to the time performing the measurements. By repeating this process for many steps, until we either achieve the desired accuracy or exhaust a pre-defined measurement budget, we will approximately optimize allocation of measurements between the different terms.

C. Implementation
The tools presented above are applicable for use with a variety of different ansätze, and subject only to the constraint that the circuits act on a common reference state and conserve fermionic excitation number in order to benefit from the efficient implementation of the effective controlled unitaries. For our numerical experiments, we shall focus on a particular class of wavefunctions known as k -fold products of unitary paired coupled cluster with generalized single and double excitations [20] (k-UpCCGSD). These wavefunctions have the appealing properties that 1) the required circuit depth scales only linearly in the size of the system, and 2) they can be systematically improved by increasing the refinement parameter k. We briefly review this ansatz below and then describe in more detail the implementation details of our numerical experiments.

The k-UpCCGSD Ansatz
The essential idea behind the k-UpCCGSD ansatz is to act on a reference state, Hartree-Fock in the case of this paper, with a product of k elementary blocks. Each block is an independently parametrized approximation to a unitary coupled cluster circuit generated by a sparse cluster operator containing only single and paired double excitations [35]. To this end, the wavefunction (before the Trotter approximation involved in compiling the circuits) is defined as follows.
where each cluster operator possesses an independent collection of variational parameters. (We omit the (x) superscript for simplicity and use Latin and Greek letters for spatial and spin indices respectively.) In contrast with the standard unitary coupled cluster single and doubles (UCCSD), k-UpCCGSD only includes doubles excitations which collectively move a pair of electrons from one spatial orbital to another. The resulting loss of flexibility is ameliorated by the use of generalized excitations that do not distinguish between occupied and unoccupied orbitals [36,37], and the k-fold repetition of the elementary circuit block. As a result, the number of free parameters in the ansatz scales as O(kN 2 ). We make use of the generalized swap networks of Ref. 38 to implement a single Trotter step approximation to the k-UpCCGSD ansatz with the open source Cirq and OpenFermion-Cirq libraries [39]. The resulting circuits have a gate depth of O(kN ) and use O(kN 2 ) two qubit gates.

Computational Details
The quantum chemical calculations of FCI ground state and HF reference were performed using the open source packages Psi4 and OpenFermion [40,41]. We optimized the ground state energy in the NOVQE subspace by varying the parameters of the most recently added ansatz wavefunction and diagonalizing the Hamiltonian and overlap matrices as described above. The variational parameters were randomly initialized from a normal distribution with mean 0 and variance σ 2 i . After some brief experimentation, for H 4 we set σ 2 i = 10 −4 , while for Hexatriene we set σ 2 i = 9.0. We found the larger variation in initial parameters for Hexatriene to be mildly helpful in avoiding repeated convergence to states with a high overlap when a large number of NOVQE states were added, an issue that we shall discuss below. For the numerical optimization, we used the Scipy implementation of the quasi-Newton limited-memory BFGS (L-BFGS-B) method [42,43]. We calculated the gradient at each step using a finite difference method with a step size of δ = 10 −6 . For H 4 we used at most C = 5000 calls to our objective function for each state added, while for Hexatriene we allowed for C = 10000.

III. RESULTS
H 4 is often used as a small testbed for single-reference coupled-cluster methods [44][45][46][47][48]. We shall focus on the square (D 4h ) geometry here. The system exhibits two exactly degenerate determinants at the D 4h geometry. Therefore, it will exhibit effects of both strong and weak correlation.
Another important class of chemical systems to investigate is hydrocarbons. In this work, we shall study a simple hydrocarbon, hexatriene (C 6 H 8 ). The interesting aspect of this molecule is that the torsional PES of a double bond leads to a strong correlation problem. At θ = 90 • , it exhibits two exactly degenerate determinants and therefore it is strongly correlated. To form the active space, we include the entire set of π electrons in the system along with both Π and Π * orbitals. The resulting active space is then (6e, 6o), and this also possesses a good mixture of strong and weak correlation.
In the following section, we present the results of two types of experiments related to our proposed NOVQE approach on these chemical systems and discuss the potential utility of our approach for more general chemical problems. With the first class of experiment, we focus on understanding how effectively the ground state can be represented by a linear combination of parametrized wavefunctions, optimized using the gradient-based approach we described above. We vary both the complexity of the individual ansatz wavefunctions by adjusting the number of circuit blocks (k) in the k-UpCCGSD ansatz and the number of states (M ) in the NOVQE subspace. For these calculations, we neglect the challenges posed by a finite number of measurements and the impact of circuit noise. In our second set of numerical experiments, we explore the extent to which our proposal for an adaptive measurement scheme is successful in reducing the number of circuit repetitions required to resolve the NOVQE ground state energy to a fixed precision. Figure 1 presents data on the application of NOVQE to the square geometry of H 4 with fixed bond distance R H-H = 1.23Å in a minimal STO-3G basis set, an N = 8 qubit problem. We consider the performance of the k-UpCCGSD ansatz for k = 1 to k = 5 with M = 1 up to M = 12 states in the NOVQE subspace, noting that M = 1 is equivalent to the regular VQE procedure. In all cases we consider the error in the median ground state energy found by the optimization procedure as a proxy for ansatz's ability to describe the ground state.

Hydrogen
Focusing first on understanding the behavior of the wavefunctions in the context of the standard VQE approach (M = 1), we note that while the ground state energy estimate dramatically improves with increasing k between k = 1 and k = 3, this trend fails and then reverses with higher values of k. Given that any state reachable by k = 3 is also trivially reachable by an ansatz with a larger value of k, we believe the difficulty in properly optimizing the richer ansatz is responsible for this behavior, particularly the challenges caused by the appearance of local minima or nearly vanishing gradients. Some preliminary experiments that used parameter values optimized with k = 1 to initialize calculations performed using richer, k > 1, circuits were not immediately successful. Exploration of more sophisticated initialization and optimization strategies constitutes a useful task for future work.
For k = 1 and k = 2, we observe that we can systematically improve the accuracy of the estimated ground Each independent calculation is plotted as a separate point, median values are connected by lines, and the 25th to 75th percentiles are shaded. The dotted horizontal line indicates 1 kcal/mol ≈ 1.59 millihartree, a commonly accepted value for "chemical accuracy". By increasing the number of states in the NOVQE subspace, we are able to drive the error below the threshold for chemical accuracy even though the system is challenging to treat using any of the ansätze we consider here in the regular VQE framework (M = 1 in the plot).
state energy by increasing the number of states included in the NOVQE subspace (M ). Given a sufficient value of M , all of the wavefunction ansätze we consider are able to achieve absolute errors well below chemical accuracy. This supports our thesis that a collection of ansatz which are individually insufficient to capture the desired state may be fruitfully combined to yield a logical ansatz of sufficient flexibility and optimizability. Interestingly, for this particular system, it appears to be more effective to add parameters to the individual ansatz by increasing k, than to add parameters to the logical ansatz by increasing M . This suggests that the most effective way to apply the NOVQE may be to work with the richest individual ansatz which can be afforded given the coherent quantum resources available, and to include additional states only when this limit is reached. The unsophisticated initialization and optimization strategy for subsequent states may also play a factor in the slow convergence with increasing M .

Hexatriene
Here we present our results for the ground state energy of two molecular configurations of Hexatriene (C 6 H 8 ) in an STO-3G basis with an active space of 6 electrons in 6 π orbitals (N = 12 qubits). The geometries are given in A. As with our calculations on H 4 , we run the NOVQE experiments over circuit depths k = 1 to 5, but here we consider subspace sizes as large as M = 40, due to the system's increased complexity. In Figure 2 we show the calculations for the equilibrium geometry (obtained by performing geometry optimization using density functional theory) and a configuration with a 90 • twist on the central Carbon-Carbon double bond respectively. We provide the geometries for these two configurations in the appendix in Table I and Table II.
Once again we notice that increasing the circuit complexity by taking larger values of k initially provides a substantial benefit but quickly saturates, with no benefit to taking k higher than 2 in this case. By contrast, adding additional states to the NOVQE subspace drives the error down below the threshold for chemical accuracy even with the most limited ansatz. We note that the marginal benefit of each additional state drops sharply with increasing values of M . This is reminiscent of the observation in conventional electronic structure calculations that a small number of determinants are sufficient to capture most of the wavefunction, but a long tail of dynamic correlation can result in a slow convergence to the true ground state as determinants are added to the variational space [49]. The classical intractability of calculating matrix elements between different coupled cluster wavefunctions means that relatively little work has been done on the representational power of wavefunctions like those used in NOVQE. In the future, it would be interesting to determine which of the general trends we have observed are specific to our choice of ansatz and optimization strategies and which would generically apply to any realization of NOVQE. Figure 3 shows the relative energies obtained by pairing specific calculations from each configuration together and taking the difference. In the low M regime we notice that the absolute errors are somewhat lower for the equilibrium configuration than for the twisted configuration, which is expected when using a linear combination of single-reference wavefunctions for a strongly correlated system. As we increase M , this difference largely disappears, a phenomenon that leads to some favorable cancellation of errors when calculating the relative energies between the two configurations.

H 4
In the previous subsection we presented data on the performance of NOVQE in the absence of noise during the circuit execution and measurement process. Now we consider the effects of statistical noise during measurement. Specifically, we determine how many circuit repetitions are necessary to evaluate the ground state energy within a target precision for a subspace defined by a fixed set of NOVQE states. For simplicity, we do not combine this analysis with an investigation of the optimization procedure. Instead, we take the optimized circuit parameters for a collection of M NOVQE states and compare the effectiveness of the adaptive protocol we described in Section II B 1 to a simpler alternative for determining the ground state energy in the subspace spanned by the optimized states, which we shall explain below.
The simpler protocol, which we shall refer to as nonadaptive, consists of measuring each matrix element of the Hamiltonian and overlap matrices the same number of times. For the adaptive protocol, we repeatedly use the procedure described in Section II B 1 to select a particular matrix element and perform measurements in batches of 10 5 . For the purpose of this comparison, we treat a 'measurement' of a particular Hamiltonian or overlap matrix element as a draw from a Gaussian random variable whose mean is the true value of the matrix element and whose variance is set by the upper bound described in Ref. 4, scaled by the number of measurements performed. Note that in a real experiment, or a finer-grained simulation, the Hamiltonian has to be decomposed into groups of terms that can be simultaneously measured, and one could apply an adaptive scheme like the one we propose to schedule measurements between these groups as well. For both kinds of numerical experiments we calculate a 2σ error bar using a bootstrapping sample size of 200 using the techniques of Section II B.
In Figure 4 we plot the actual trajectories of the estimates for the ground state energies, together with their error bars for both the adaptive and non-adaptive approaches to measurement, for two realizations of this numerical experiment. In both cases, we see that the adaptive protocol converges more quickly towards the NOVQE ground state energy than the non-adaptive one. Panel A corresponds to a typical realization of this exper-iment, one where the optimization procedure that generated the states was successful and the resulting generalized eigenvalue problem is well-conditioned. While the majority of the numerical experiments with H 4 were qualitatively similar to the results presented in panel A, we occasionally saw instances like those in panel B where the NOVQE procedure failed or the error bars were dramatically larger.
For the particular experiment shown in panel B, the large error bars and the 147.6 millihartree gap between the ground state energy in the NOVQE subspace and the FCI ground state derive from the same origin. In this instance, the optimization procedure which generated the fixed NOVQE states that were used to compare the measurement strategies became stuck and added several states that were nearly identical to each other. This resulted in an almost singular overlap matrix whose entries, therefore, had to be resolved to an extreme degree of precision before a good solution to the generalized eigenvalue problem could be found. This suggests that a practical implementation of NOVQE would benefit from including some regularization during the optimization that penalizes states whose overlap becomes too high.
Examining panel A, we find that the data qualitatively supports the assumption that the variance in the ground state energy estimate settles into an asymptotic regime where its behavior is well described by the relationship where N indicates the number of measurements performed and κ is some constant. For this particular realization, we find κ to be approximately 600 E 2 h for the non-adaptive scheme, and approximately 40 E 2 h for the adaptive one. Using the same upper bounds to calculate the variance for a regular VQE calculation performed on the same system would yield κ ≈ 28 E 2 h . Therefore, for these applications to H 4 , resolving the ground state energy to some target accuracy using our NOVQE approach requires only a modest increase in the number of measurements compared to the standard VQE.

Hexatriene
As in our analysis of H 4 , we compare the proposed adaptive approach to distributing measurements between the elements of the Hamiltonian and overlap matrices with a non-adaptive one, by choosing a collection of optimized NOVQE states and applying both methods to determine the ground state energy in the resulting subspace. In this case we choose to use M = 16 states, each of which is generated by a k = 2 k-UpCCGSD circuit, and focus on the equilibrium configuration of trans-Hexatriene. Examining Figure 5, we see immediately that the increased difficulty of this problem compared to H 4 is reflected in a much larger gap between the FCI Figure 3. Error in the relative energies between the ground states of the two Hexatriene configurations calculated using NOVQE for a variety of ansätze and sizes of the NOVQE subspace. Each independent calculation of the relative energy is plotted as a separate point, median values are connected by lines, and the 25th to 75th percentiles are shaded. The dotted horizontal line indicates 1 kcal/mol ≈ 1.59 millihartree, a commonly accepted value for "chemical accuracy". We note that the errors here are lower than those for the absolute energies presented in Figure 2, indicating that we benefit from a significant cancellation of errors when calculating the relative energies. Figure 4. We compare the ability of the adaptive and non-adaptive schemes for scheduling measurements to resolve the ground state energy of H4 in two different NOVQE subspaces of M = 4 optimized k = 2 k-UpCCGSD states. The trajectories of the estimated ground state energy are plotted in solid lines together with 2σ error bars indicated by the shaded regions. The actual energies of the ground states in the NOVQE subspaces are indicated with dashed green lines. In panel A we feature a typical NOVQE subspace with a well-conditioned overlap matrix. In panel B we present an NOVQE subspace where the optimization procedure failed to find the ground state and included multiple nearly identical states, resulting in a poorly conditioned overlap matrix. In both cases, the adaptive protocol converges significantly more quickly than the non-adaptive one.
ground state and the ground state in the NOVQE subspace, as well in a larger number of measurements required for convergence. Figure 5 shows the same substantial difference between the performances of the adaptive and non-adaptive ap-proaches that was observed for H 4 . Here we see that the true ground state of the subspace lies outside of the error bars for the non-adaptive scheme during a portion of the measurement procedure. Additional bootstrap samples might address this behavior, but we note that the adaptive scheme moves quickly to a regime where the uncertainty estimates are reliable even with a small number of samples. We once again observe that the variance qualitatively converges with the expected long-time 1 N behavior of Eq. 15 for most of the numerical experiment. Therefore, we can determine the 'intrinsic variance', κ, of each method and compare their statistical efficiencies.
For the non-adaptive scheme we observe κ ≈ 4 × 10 6 , while for the adaptive scheme we see κ ≈ 1.3 × 10 5 , with κ ≈ 1.6 × 10 2 being the reference value for a regular VQE calculation, determined using the same bounds assumed throughout this comparison. Comparing with the simpler H 4 example, we see that the adaptive scheme for measuring the NOVQE ground state energy of Hexatriene results in an even greater gain when compared to the non-adaptive scheme, but now falls short of the goal of reducing the cost to be comparable to that of a regular VQE approach. If the NOVQE approach is to be useful for such systems in the future, it will likely be necessary to develop further strategies to reduce the measurement overhead. One promising avenue is the adaptation of recently proposed strategies for measurement in the context of regular VQE to NOVQE [10]. These strategies have been shown to reduce the number of circuit executions by orders of magnitude when compared with the bounds used to derive the number of measurements in this work.

IV. DISCUSSION AND OUTLOOK
We have introduced an extension to the variational quantum eigensolver that calls for the ground state energy to be approximated by solving a generalized eigenvalue problem in the subspace spanned by a linear combination of M parametrized quantum wavefunctions. The resulting logical wavefunction ansatz is a linear combination of all M states in the subspace but can be determined by pairwise measurements of the Hamiltonian and overlap matrices. Therefore, it is possible to increase the flexibility of the ansatz without requiring additional coherent quantum resources. By analogy with the non-orthogonal configuration interaction method of classical quantum chemistry, we call our approach the nonorthogonal variational quantum eigensolver, NOVQE.
Our proposal necessitates off-diagonal measurements of the Hamiltonian and overlap matrices. We perform these using a modified Hadamard test. Naively, this would require us to implement controlled versions of the quantum circuits for state preparation. To avoid this cost, we demanded that the state preparation circuits all act on a common reference state and preserve fermionic excitation number. This allowed us to avoid the need to add controls to the ansatz circuits by instead performing controlled swap operations between two copies of the system register, a cost that scales linearly and modestly with the system size.
To determine the ground state energy in the subspace, our approach requires that we measure all M 2 elements of the Hamiltonian and overlap matrices in the NOVQE subspace. We presented a statistical strategy for estimating the uncertainty in the resultant ground state energy estimate for a given uncertainty in the matrix elements. We also pointed out how the machinery that generates these estimates can be leveraged in a Monte Carlo sampling process to determine which matrix element should be chosen for additional measurements to optimally reduce the uncertainty. We proposed an iterative approach, in which small batches of measurements are repeatedly performed according to this Monte Carlo prescription, to minimize the overall number of circuit repetitions required by our NOVQE method.
We demonstrated an implementation of our approach using a collection of k-UpCCGSD wavefunctions to approximate the ground state of two model stronglycorrelated systems, a square geometry of H 4 and the πspace of Hexatriene in two configurations. Growing the NOVQE subspace by adding and optimizing one state at a time, we showed how a collection of ansätze that are individually incapable of well approximating the ground state can be fruitfully combined to form a logical ansatz that does provide a good approximation. Interestingly, we observed that making the individual ansatz wavefunctions more flexible by increasing k from 1 to 2 yielded a large benefit, but that adding more parameters did not. In contrast, adding states to the NOVQE subspace yielded a consistent but slow improvement of the ground state estimate. While these observations are highly coupled to our particular implementation choices, they suggest that increasing the number of NOVQE states may be most useful when optimization challenges or coherence time limits the ability to improve the individual wavefunctions.
To characterize our proposal for adaptively scheduling measurements to minimize the number of circuit repetitions required by our approach, we focused on quantifying the number of measurements required to approximate the ground state energy in a fixed NOVQE subspace. For the purposes of this investigation we approximated the variance of the individual matrix element measurements using the bounds described in Refs. 4 and 31. For both our square H 4 and our equilibrium configuration of trans-Hexatriene, we optimized a set of NOVQE states and froze their parameters. We then applied our adaptive approach for scheduling measurements and compared it to a simpler non-adaptive scheme, in which each matrix element was measured the same number of times. We found that our adaptive approach used approximately 12 − 15 times fewer measurements for H 4 and approximately 30 times fewer measurements for Hexatriene than the nonadaptive alternative did. As a third point of comparison, we also considered the number of measurements required for a standard VQE approach and saw that our proposed method was within a factor of two of this baseline for most of our experiments with H 4 . In contrast with this optimistic result, our experiments with Hexatriene indi- Figure 5. We compare the ability of the adaptive and non-adaptive schemes for scheduling measurements to resolve the ground state energy of trans-Hexatriene in a particular NOVQE subspace of M = 16 optimized k = 2 k-UpCCGSD states. The trajectories of the estimated ground state energy are plotted in solid lines together with 2σ error bars indicated by the shaded regions. The actual energy of the ground state in the NOVQE subspace is indicated with a dashed green line. The adaptive protocol converges much more quickly towards the correct value but still requires a large number of circuit repetitions to do so. cated the need for almost three orders of magnitude more circuit repetitions than a normal VQE calculation, even with the benefit of our adaptive scheme for scheduling measurements.
We can imagine several routes towards ameliorating this challenge and developing NOVQE further. First of all, having states that are nearly linearly dependent in the NOVQE subspace can dramatically increase the cost of measurement. Developing an optimization strategy that regularizes this behavior away would be useful. Related to this is the possibility of extending the tools for measuring analytical gradients of parametrized quantum circuits to work with the NOVQE formalism. Another avenue for future work would be the development of good initialization strategies for NOVQE, potentially using reference states derived from a classical NOCI calculation. Finally, recent work has shown that a measurement strategy based on factorizations of the two-electron integral tensor can dramatically reduce the cost of the standard VQE approach [10], especially when compared to the type of bounds used throughout this paper [4,31]. Adapting this approach for use with NOVQE is likely to offer a significant improvement.
Beyond these modifications to the NOVQE approach outlined in this paper, it is also conceivable that the tools we have presented might be usefully employed in other ways. For example, we have focused on the optimization of a logical ansatz that is a superposition of individual parametrized wavefunctions. An alternative is to take inspiration from Ref. 22, and from the classical NOCI method [24][25][26]. Following these approaches, one could optimize the individual wavefunctions separately, and solve the generalized eigenvalue problem only once with the final collection of states. Another possible di-rection to pursue is the inclusion of one or more states in the NOVQE subspace that can be classically optimized. Along these lines, one could envisage using tensor network techniques to efficiently optimize an approximation to the ground state with a classical computer and only turn to the use of parametrized quantum circuits to prepare small corrections to the state, thereby reducing the number of measurements required.
In summary, this work has presented a promising new extension to the VQE formalism and highlighted both its advantages and its drawbacks. We have also presented a strategy for compiling off-diagonal matrix element measurements and promoted a general approach to Monte Carlo estimation of uncertainty, which may be of independent interest. Moreover, our circuit simulations of the k-UpCCGSD ansatz deepen the analysis of Ref. 20. We believe that the ability of our NOVQE to trade off coherent quantum resources for additional measurements may prove to be a useful tool in making use of NISQera quantum hardware for studying challenging strongly correlated systems.
In the final stages of preparing this manuscript a related work was posted which independently developed a similar approach in the context of variational quantum algorithms for solving linear systems of equations [50].