Benchmarking of quantum fidelity kernels for Gaussian process regression

Quantum computing algorithms have been shown to produce performant quantum kernels for machine-learning classification problems. Here, we examine the performance of quantum kernels for regression problems of practical interest. For an unbiased benchmarking of quantum kernels, it is necessary to construct the most optimal functional form of the classical kernels and the most optimal quantum kernels for each given data set. We develop an algorithm that uses an analog of the Bayesian information criterion to optimize the sequence of quantum gates used to estimate quantum kernels for Gaussian process models. The algorithm increases the complexity of the quantum circuits incrementally, while improving the performance of the resulting kernels, and is shown to yield much higher model accuracy with fewer quantum gates than a fixed quantum circuit ansatz. We demonstrate that quantum kernels thus obtained can be used to build accurate models of global potential energy surfaces (PES) for polyatomic molecules. The average interpolation error of the six-dimensional PES obtained with a random distribution of 2000 energy points is 16 cm$^{-1}$ for H$_3$O$^+$, 15 cm$^{-1}$ for H$_2$CO and 88 cm$^{-1}$ for HNO$_2$. We show that a compositional optimization of classical kernels for Gaussian process regression converges to the same errors. This indicates that quantum kernels can achieve the same, though not better, expressivity as classical kernels for regression problems.


I. INTRODUCTION
Quantum machine learning (QML), loosely defined as machine learning that involves a quantum computer for any part of data modeling, is currently being researched as one of the promising applications of quantum computing. A popular implementation of QML encodes data into states of a quantum computer which are, after some unitary evolution, projected onto specific qubit states to estimate a kernel function for a given data set. Kernel functions are used in kernel machine learning methods, including support vector machines (SVM), kernel ridge regression, and Gaussian process (GP) regression. Using a gate-based quantum computer, quantum kernels can be produced by encoding data into parameters of the quantum gates, operating with the resulting gates on qubits, and measuring the square of the amplitude of a particular state of qubits [15]. It has been shown that this can produce quantum kernels that, when used for SVM, offer a theoretical quantum advantage for classification problems [17,18]. Many articles have explored algorithms for constructing performant quantum kernels for classification problems [28-37]. However, whether quantum kernels can be used for accurate regression models of practical interest remains largely unexplored.
In order to determine whether quantum kernels can outperform classical kernels for practical applications, it is necessary to compare the performance of the most optimal quantum kernels with the most optimal classical kernels for a range of benchmark regression problems. To achieve this, it is necessary to align the functional form of the classical kernels and the gate sequence in the quantum circuits producing quantum kernels with the target function of each data set considered. As shown by Duvenaud and coworkers [76,77], the performance of classical kernels can be systematically enhanced by increasing the complexity of the kernel function using an iterative approach that combines simple kernel functions into products and linear combinations with the most optimal outcome of the previous iteration. Different kernel functions are discriminated by the value of the Bayesian information criterion (BIC) [78], which serves as an easy-to-compute surrogate of the marginal likelihood. BIC is an asymptotically rigorous model selection metric: if a set of models contains the target function, the probability that BIC selects the true model approaches one as the number of data points increases [79]. Our previous work has shown that BIC can be used as a model selection metric even in the limit of very restricted data [28,80,81].
In the present work, we extend the algorithm of Duvenaud and coworkers [76] to improve the performance of quantum gate sequences used to build quantum kernels for GP regression. We show that the accuracy of GP regression models with quantum kernels can be systematically enhanced by increasing the number of quantum gates in the underlying quantum circuits through an iterative algorithm guided by a metric closely related to BIC. This provides an algorithm to build quantum kernels for accurate GP models with the least number of quantum gates, which in turn reduces the number of gate parameters to be optimized for training ML models and reduces the overall effect of gate errors. We demonstrate this by comparing quantum circuits obtained in this work with the generic ansatz used previously [1].
The present algorithm aligns quantum kernels with a given data set using the same compositional optimization strategy as the algorithm of Duvenaud and coworkers [76,77]. This allows us to improve both the classical kernels towards the optimal functional form and the quantum kernels towards optimal gate sequences, so that the most accurate quantum models can be compared with the most accurate classical models for each given data set. The present algorithm can thus be used to expose the limitations of quantum kernels for regression problems in practical applications and to benchmark quantum kernels against classical models.
Previous studies of quantum kernels for GP regression are scarce [1-3]. Gaussian processes model data with probabilistic distributions, which offer both the prediction of $y$ and the Bayesian uncertainty of the prediction. As such, accurate GP models with quantum kernels can be used to bridge quantum computing with a variety of new applications, ranging from efficient interpolation of molecular properties in chemical compound spaces [82], to building global PES for polyatomic molecules on a quantum computer [1], to developing the quantum analogs of Bayesian optimization. In the present work, we use the optimized sequence of quantum gates to build quantum kernels for GP models of PES for the H$_3$O$^+$, HNO$_2$, and H$_2$CO molecules, which vary in the complexity of the potential energy landscape. The resulting quantum models of PES are then compared with the GP models based on classical kernels with the functional form optimized by the algorithm of Duvenaud and coworkers [76]. In addition, we compare the resulting models with neural network Gaussian process (NNGP) models, which provide an independent benchmark for both the quantum and classical model construction algorithms considered here. The results demonstrate that quantum kernels with optimized gate sequences yield GP regression models with accuracy comparable with, but not better than, the accuracy of the most accurate classical regression models.

II. QUANTUM KERNELS
Gaussian process regression (GPR) is a model of the dependence of a continuous output variable $y$ on multi-dimensional input vectors $x$. In this work, $y$ represents the potential energy of polyatomic molecules, and $x$ encodes the molecular geometry. The dimensionality of $x$ depends on the number of atoms in a given molecule and is equal to the number of vibrational normal modes. The GP model $y(x)$ thus yields the global PES for the corresponding molecule. We consider the following molecules: H$_3$O$^+$, HNO$_2$, and H$_2$CO.
GPR is a supervised learning model, trained by a set of $N$ input-output pairs $D = \{(x_i, y_i)\}_{i=1}^{N}$. The predicted potential energy at an arbitrary point $x_*$ of input space is [83]:

$$\hat{y}(x_*) = k^\top \left( K + \sigma_n^2 I \right)^{-1} y, \tag{1}$$

where $k$ is the vector of size $N$ with elements given by the kernel function values $k(x_i, x_*)$, $K$ is the $N \times N$ matrix with elements given by the kernel function values $k(x_i, x_j)$ with $x_i, x_j \in D$, $I$ is the $N \times N$ identity matrix, and $y$ is the vector of the $N$ training outputs. The prediction is parametrized by $\sigma_n^2$, the variance of data noise, which is set to zero in this work since potential energy calculations are assumed to be noise-free. The kernel function $k(x, x')$ must be positive-semidefinite and symmetric under interchange of $x$ and $x'$. In the present work, we compare models with the kernel function $k(x, x')$ obtained using classical algorithms and from quantum states of a gate-based quantum computer. We refer to the latter as the quantum kernels.
Once a kernel function is chosen, the parameters of the kernel function for a GP model are estimated by maximizing the following function:

$$\log L(\theta) = -\frac{1}{2}\, y^\top K^{-1} y - \frac{1}{2} \log |K| - \frac{N}{2} \log 2\pi, \tag{2}$$

which is known as the type-II maximum likelihood estimation [83]. In Eq. (2), $\theta$ represents collectively all kernel function parameters and $y$ is a vector of length $N$ collecting all outputs in the training set $D$. For classical kernels, $\theta$ includes all free parameters of the mathematical expression for a given kernel function. For quantum kernels, $\theta$ represents parameters of quantum gates, as described in detail below. In order to benchmark quantum kernels, it is necessary to consider classical kernels with varying function complexity in order to build the most optimal classical GP models for a given data set. Here, we employ two algorithms for aligning classical kernels of GP models with given data: the algorithm of Duvenaud and coworkers [76], yielding composite kernels, and neural network Gaussian processes [84].
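The noise-free GP prediction and the log marginal likelihood described above can be sketched numerically as follows. This is a minimal illustration, not the implementation used in this work: the function names, the single-parameter RBF stand-in kernel, the small `jitter` added for numerical stability, and the toy data are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, theta):
    # Stand-in kernel (assumed form): exp(-theta * squared Euclidean distance).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-theta * d2)

def gp_fit_predict(X, y, X_star, kernel, jitter=1e-10):
    """Return the GP mean at X_star and the log marginal likelihood (sigma_n = 0)."""
    N = len(y)
    K = kernel(X, X) + jitter * np.eye(N)    # jitter is a numerical safeguard only
    L = np.linalg.cholesky(K)                # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
    mean = kernel(X_star, X) @ alpha         # k^T K^{-1} y for every test point
    log_det = 2.0 * np.log(np.diag(L)).sum() # log|K| from the Cholesky factor
    log_lik = -0.5 * y @ alpha - 0.5 * log_det - 0.5 * N * np.log(2 * np.pi)
    return mean, log_lik

# Toy data on a 5 x 5 grid (not PES data).
g = np.linspace(0.0, 1.0, 5)
X = np.array([(a, b) for a in g for b in g])
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
mean, log_lik = gp_fit_predict(X, y, X, lambda A, B: rbf_kernel(A, B, theta=20.0))
```

Because the noise variance is zero, the model interpolates the training outputs, which is why `mean` reproduces `y` at the training points. Any positive-semidefinite kernel, classical or quantum, can be passed in place of the stand-in.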

A. Classical composite kernels
Most traditional applications of kernel methods use simple mathematical expressions for kernel functions, such as the radial basis function (RBF): where θ is the kernel function parameter that can be tuned to train the corresponding model.
The RBF kernel is known to be universal [85] and represents one of the most commonly used kernel functions in kernel ML. We use the RBF kernel as the benchmark example of simple classical kernels.
In addition, we build classical models with composite kernel functions optimized to maximize the Bayesian information criterion, as proposed by Duvenaud and coworkers [76,77]. This approach can be used to increase the kernel function complexity in order to improve the resulting GP models. The algorithm for building composite kernels was described in Refs. 76, 77, 80, and 81 and is illustrated in Figure 1. For a given data set, the kernel selection process begins with a set of simple (base) kernel functions, including Eq. (3) and the following functions: where $d(x, x')$ is the Euclidean distance between $x$ and $x'$, $\alpha$ is a positive scale mixture parameter, $l$ is a positive length-scale parameter, $p$ is a positive periodicity parameter, $\Gamma$ is the gamma function, $v$ is a non-negative half-integer, and $K_v$ is the modified Bessel function.
These kernel functions are represented in Figure 1 as $k_i$ with $i \in [1, 5]$. The kernel functions are discriminated by the value of the Bayesian information criterion (BIC), defined as

$$\mathrm{BIC} = \log L(\hat{\theta}) - \frac{1}{2} M \log N, \tag{8}$$

where $\log L(\hat{\theta})$ is the maximum value of the function in Eq. (2) and $M$ is the number of parameters in the kernel function. In the following iteration, the base kernel with the largest value of BIC ($k_{\rm opt}$) is combined with each base kernel as linear combinations $c_i k_{\rm opt} + c_j k_j$ and products $c_i k_{\rm opt} \times k_j$, resulting in 10 new kernel functions. These kernel functions are optimized by varying both $\theta$ and the coefficients $c_i$ to maximize Eq. (2). The kernel with the largest value of BIC is then selected as the optimal kernel and the process is iterated. This iterative process continues until the BIC value of the optimal kernel converges, as shown in the right panel of Figure 1 for several examples of GP models of PES for the molecules labeling the curves.
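One iteration of the composite kernel search can be sketched as follows. This is a schematic illustration only: kernels are represented by placeholder strings, the BIC convention (log-likelihood minus a complexity penalty, larger is better) is assumed, and `fit` is a hypothetical user-supplied routine that trains a GP with the candidate kernel and returns its maximized log-likelihood and parameter count.

```python
import numpy as np

BASE_KERNELS = ["k1", "k2", "k3", "k4", "k5"]   # placeholders for the five base kernels

def bic(max_log_lik, n_params, n_points):
    # Assumed convention: log L(theta_hat) - (M / 2) log N, maximized over kernels.
    return max_log_lik - 0.5 * n_params * np.log(n_points)

def expand(k_opt):
    """Combine the current best kernel with each base kernel as a sum and a product."""
    sums = [f"(c_a*{k_opt} + c_b*{k})" for k in BASE_KERNELS]
    products = [f"(c_a*{k_opt} * {k})" for k in BASE_KERNELS]
    return sums + products                       # 5 sums + 5 products = 10 candidates

def select(candidates, fit, n_points):
    """fit(candidate) -> (max_log_lik, n_params); return the candidate with largest BIC."""
    scored = [(bic(*fit(c), n_points), c) for c in candidates]
    return max(scored)[1]

candidates = expand("k2")
```

Repeating `expand` on the selected kernel and stopping when the best BIC no longer increases gives the iterative search described above.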

B. Neural network Gaussian processes
To provide an independent benchmark for quantum kernels, we also consider neural network Gaussian process (NNGP) models [84,86-91]. NNGP models exploit the connection between GPs and neural networks (NN). The output of a single-layer, fully connected feed-forward neural network is given by with where $x_i$ is the $i$th component of the $p$-dimensional vector $x$, $W_{ij}$ and $b_0$ are the weight and bias parameters for the layer, $\phi(y)$ is a non-linear activation function, and $N$ is the number of nodes.
If the weights and bias parameters are chosen to be random variables, the central limit theorem ensures that, in the limit of infinite width $N \to \infty$ with priors $\mathcal{N}(\mu_w, \sigma_w^2)$ and with mean $\mu$ and covariance $k$ functions. Given the freedom of the priors, one can choose It was recently shown that this can be extended to NN with multiple hidden layers [86-88,90,91].
The variables $y_j$ and $N$ must now be labeled by the layer index $l \in [1, L]$. Given that each of $y_j^{l-1}$, used as inputs into node $i$ of layer $l$, is an independent GP, as $N_l \to \infty$, $y_i^l$ is also a Gaussian process $\mathcal{GP}(0, k^l)$ with the covariance function Any two $y_i^l$ and $y_j^l$ in the same layer $l$ of this NN are independent. The recursive form of $k^l$ depends on the activation function $\phi$ and the parameters $\sigma_w$ and $\sigma_b$.
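For the error-function activation, the layer-to-layer covariance recursion has a known closed form (the arcsine kernel), which can be sketched as follows. The normalization of the input-layer covariance by the input dimension, the argument layout, and the parameter values are assumptions for illustration and may differ from the conventions used in this work.

```python
import numpy as np

def nngp_erf_kernel(X1, X2, sigma_w, sigma_b):
    """NNGP covariance for erf activations.

    sigma_w, sigma_b: sequences of length L + 1 (input layer followed by L hidden layers).
    """
    d = X1.shape[1]
    # Input-layer covariances (assumed normalization by the input dimension d).
    k12 = sigma_b[0] ** 2 + sigma_w[0] ** 2 * (X1 @ X2.T) / d
    k11 = sigma_b[0] ** 2 + sigma_w[0] ** 2 * (X1 * X1).sum(axis=1) / d
    k22 = sigma_b[0] ** 2 + sigma_w[0] ** 2 * (X2 * X2).sum(axis=1) / d
    for sw, sb in zip(sigma_w[1:], sigma_b[1:]):
        # E[erf(u) erf(v)] for zero-mean Gaussian (u, v) is (2/pi) arcsin(...).
        norm = np.sqrt((1 + 2 * k11)[:, None] * (1 + 2 * k22)[None, :])
        k12 = sb ** 2 + sw ** 2 * (2 / np.pi) * np.arcsin(2 * k12 / norm)
        k11 = sb ** 2 + sw ** 2 * (2 / np.pi) * np.arcsin(2 * k11 / (1 + 2 * k11))
        k22 = sb ** 2 + sw ** 2 * (2 / np.pi) * np.arcsin(2 * k22 / (1 + 2 * k22))
    return k12

rng = np.random.default_rng(1)
X = rng.uniform(size=(8, 6))                     # eight toy six-dimensional inputs
K = nngp_erf_kernel(X, X, sigma_w=[1.0, 1.5, 1.2], sigma_b=[0.1, 0.2, 0.1])
```

Each additional entry in `sigma_w` and `sigma_b` adds one hidden layer, so the depth of the NNGP is increased simply by extending these sequences.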

C. Quantum kernels
Quantum kernels considered here are given by

$$k(x, x') = \left| \langle \psi_0 | U^\dagger(x')\, U(x) | \psi_0 \rangle \right|^2, \tag{14}$$

where $U(x)$ is a unitary transformation given by a sequence of one- and two-qubit gates acting on a quantum state of $m$ qubits, and $|\psi_0\rangle = |0\rangle^{\otimes m}$. Note that $U(x)$ is parametrized by $x$, so the quantum states $U(x')|\psi_0\rangle$ and $U(x)|\psi_0\rangle$ depend on the input vectors $x'$ and $x$. We encode each dimension of the molecular geometry vector $x$ into a separate qubit.
The number of qubits required to build quantum states $U(x)|\psi_0\rangle$ is therefore equal to the dimensionality of the configuration space of the molecule. The quantum kernel is obtained by the projection of the output state $U^\dagger(x')\,U(x)|0\rangle^{\otimes m}$ onto the original state $|0\rangle^{\otimes m}$. The performance of the quantum kernels in ML models is determined by the specific sequence of gate operations in $U$. Previous work on GP regression with quantum kernels [1] used a fixed ansatz $U$ given by where $H$ is the single-qubit Hadamard operator, $\sigma_{Z,i}$ is the Pauli $Z$-matrix operating on qubit $i$, and $\phi_i$ and $\phi_{ij}$ are the quantum gate parameters. The quantum circuit for the quantum kernel based on this ansatz is shown in the upper panel of Figure 2 for the example of a six-dimensional problem (a molecule with four atoms). The quantum circuit in Figure 2 (top panel) is depicted as a sequence of Hadamard gates, $R_Z$ gates defined as and entangling $R_{ZZ}$ gates defined as where the parameters $\phi$ depend on the qubit index for $R_Z$ and two qubit indices for $R_{ZZ}$.
The ansatz shown in Eq. (15) was borrowed from previous work on QML for classification [15] and is considered to be general, as it provides entanglement between all pairs of qubits. Our previous work [1] showed that this ansatz yields quantum kernels for GP regression of a six-dimensional PES with accuracy comparable to that attainable by GP regression models with the RBF kernel.
The present work aims to develop an algorithm for optimizing the gate sequences in $U$ to adapt the quantum kernels to a given data set. This can be viewed as a search problem in the space of all possible gate permutations. The complexity of the search increases exponentially with the number of gates in the quantum circuits. To accelerate the search, we use a strategy inspired by the algorithm of Duvenaud and coworkers [76,77,80,81]. We previously observed that a similar algorithm can be used to enhance the performance of quantum kernels for classification problems with SVM [28].
The algorithm is based on the following three gates as the building blocks: the Hadamard gate $H$, the $R_{ZZ}$ gate, and the $R_Y$ gate defined as where $\sigma_Y$ is the Pauli $Y$-matrix and the parameter $\phi$ depends on the index of the qubit involved in the transformation. The unitary transformation $U$ is structured to begin with a layer of Hadamard gates $H^{\otimes m}$, followed by parameterized entanglement layers $U_e$ comprising a sequence of $R_{ZZ}$ gates and identity gates, and to conclude with a parameterized layer of $R_Y$ gates: The sequence of gates in $U_e$ is determined by the compositional search algorithm described below and illustrated in Fig. 3.
The input vectors $x$ are encoded into the quantum gates as follows: where $x_i$ and $x_j$ are the $i$th and $j$th components of vector $x$, and $\theta_i$ and $\theta_{ij}$ are the trainable parameters of a quantum kernel. The parameters $\theta_i$ for single-qubit rotation gates are independent of each other, while all parameters $\theta_{ij}$ of the two-qubit $R_{ZZ}$ gates are set

As was previously observed [1], the likelihood functions $L$ can be very small for a large number of quantum circuit parameters $\theta$. As a result, the function defined by Eq. (2) exhibits a large number of divergences, which complicate the numerical optimization. Therefore, as in Ref. 1, we instead use Bayesian optimization to maximize the function in Eq. (22) to obtain optimized parameters for quantum kernels. For more details on training quantum kernels, see Ref. 1.
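To make the structure of such a circuit concrete, the fidelity kernel for a layered circuit (Hadamard layer, $R_{ZZ}$ entangling layers, final $R_Y$ layer) can be simulated with a state vector as sketched below. Several choices here are assumptions for illustration and are not taken from this work: the gate conventions $R_{ZZ}(\phi) = e^{-i\phi\,\sigma_Z \otimes \sigma_Z/2}$ and $R_Y(\phi) = e^{-i\phi\,\sigma_Y/2}$, the encoding $\phi_i = \theta_i x_i$ and $\phi_{ij} = \theta_{ZZ}\, x_i x_j$ with a single shared $\theta_{ZZ}$, and all function names.

```python
import numpy as np

def apply_1q(state, gate, q, m):
    # Apply a 2x2 gate to qubit q of an m-qubit state vector.
    psi = state.reshape([2] * m)
    psi = np.tensordot(gate, psi, axes=([1], [q]))   # contracted axis moves to the front
    return np.moveaxis(psi, 0, q).reshape(-1)

def feature_state(x, layers, theta_y, theta_zz):
    """Return U(x)|0...0> for H layer -> R_ZZ layers -> R_Y layer (assumed encoding)."""
    m = len(x)
    idx = np.arange(2 ** m)
    # Z eigenvalues (+1/-1) of every qubit for every basis state, shape (2^m, m).
    z = 1 - 2 * ((idx[:, None] >> (m - 1 - np.arange(m))) & 1)
    state = np.full(2 ** m, 2.0 ** (-m / 2), dtype=complex)   # Hadamards on |0...0>
    for layer in layers:                   # each layer is a list of qubit pairs (i, j)
        for i, j in layer:
            phi = theta_zz * x[i] * x[j]   # assumed data encoding for R_ZZ
            state = state * np.exp(-0.5j * phi * z[:, i] * z[:, j])
    for q in range(m):
        phi = theta_y[q] * x[q]            # assumed data encoding for R_Y
        c, s = np.cos(phi / 2), np.sin(phi / 2)
        state = apply_1q(state, np.array([[c, -s], [s, c]]), q, m)
    return state

def fidelity_kernel(x1, x2, layers, theta_y, theta_zz):
    a = feature_state(x1, layers, theta_y, theta_zz)
    b = feature_state(x2, layers, theta_y, theta_zz)
    return abs(np.vdot(b, a)) ** 2         # |<0|U^dagger(x2) U(x1)|0>|^2

layers = [[(0, 1)], [(1, 2)]]              # two entangling layers on three qubits
theta_y = np.array([1.0, 2.0, 0.5])
xa, xb = np.array([0.3, 0.7, 0.5]), np.array([0.6, 0.1, 0.9])
k_ab = fidelity_kernel(xa, xb, layers, theta_y, theta_zz=1.5)
```

On hardware, the same quantity is estimated from the frequency of measuring all qubits in the state 0 after applying $U(x)$ followed by $U^\dagger(x')$; the simulation above evaluates the overlap directly.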
To evaluate the performance of Bayesian optimization, we repeated the optimization of GP models with the quantum kernel based on the fixed ansatz in Eq. (15) using the Adam gradient descent algorithm [93]. The comparison of Bayesian optimization and gradient-based optimization for training QGP models is illustrated in Fig. 4. To compute the gradients of $\beta$ in Eq. (23) with respect to the kernel parameters, we use back-propagation by automatic differentiation as implemented in the PennyLane package [94]. The results show that the gradient descent optimization rapidly converges to local extrema for H$_3$O$^+$ (green curves) and H$_2$CO (purple curves), while Bayesian optimization leads to larger values of $\beta$ and higher interpolation accuracy of the QGP models. For the more complex PES of HNO$_2$, the gradient descent optimization converges faster and to the same limit as Bayesian optimization. Both optimizers reach similar values of $\beta$ and interpolation accuracy within 400 iterations. We note that back-propagation by automatic differentiation is more costly than Bayesian optimization, as it requires additional computational resources to store tensors and calculate the Jacobian matrix for each step in the calculation of $\beta$ [94].

III. COMPOSITIONAL OPTIMIZATION OF QUANTUM GATE SEQUENCES
To build an optimal unitary transformation $U_e$ in Eq. (20), we propose the strategy illustrated in Fig. 3. This algorithm iteratively increases the complexity of $U_e$ and hence improves the expressivity of the corresponding quantum kernel. As shown in Fig. 3, the algorithm is controlled by the hyperparameter $M$, which sets the number of quantum circuits retained in each iteration. As $M \to \infty$, the algorithm approaches a full search in the space of gate permutations.
Duvenaud and coworkers [76,77,80,81] used the BIC defined in Eq. (8) as the model selection metric. As mentioned above, however, $\log L$ is unsuitable for optimizing quantum circuit parameters. For numerical stability, we use the following function as the quantum circuit selection metric: where $\hat{O}$ is the maximum value of the function given by Eq. (22). As $\log O$ and $\log L$ are maximized by the same set of parameters, $\beta$ is analogous to BIC.
As illustrated in Figure 3, the algorithm is initialized by a pool of single-layer circuits, which contain all possible combinations of $R_{ZZ}$ and identity gates for a given number of qubits, giving a total of $J$ circuits. To reduce the computational cost, we require that each qubit is acted on by only one gate in each layer. Each iteration of the algorithm includes three steps. First, the algorithm combines the $M$ circuits with the largest value of $\beta$ in Eq. (23) with all possible single-layer circuits to generate $M \times J$ new circuits. As shown by Eq. (20), each new circuit is sandwiched between a layer of Hadamard gates and a layer of $R_Y$ gates.
Second, each wrapped circuit is parameterized by the optimized parameters from the $M$ selected circuits. The quantum kernel selection metric $\beta$ is computed for each circuit and the $M$ circuits with the largest $\beta$ are selected for the next iteration. Third, using the previously determined parameters as initial guesses, the kernel parameters of each selected circuit are optimized by maximizing $\log \hat{O}$ using Bayesian optimization. The iterative process continues until the largest value of $\beta$ converges. After convergence, all kernel parameters of the circuit with the largest $\beta$ are re-optimized to obtain the optimal parameterized circuit.
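The search loop described above can be sketched as a beam search over layered circuits. This is a simplified illustration under stated assumptions: `score` is a hypothetical user-supplied function that returns the selection metric for a circuit (in the algorithm above this involves parameter inheritance, screening, and Bayesian optimization, which are collapsed into one call here), new layers are appended at the end of the circuit, and the pool is assumed to include the all-identity layer.

```python
def single_layer_circuits(m):
    """All layers in which every qubit is touched by at most one R_ZZ gate."""
    def matchings(qubits):
        if not qubits:
            return [[]]
        first, rest = qubits[0], qubits[1:]
        out = matchings(rest)                        # 'first' carries an identity gate
        for k, partner in enumerate(rest):
            remaining = rest[:k] + rest[k + 1:]
            out = out + [[(first, partner)] + mt for mt in matchings(remaining)]
        return out
    return [tuple(mt) for mt in matchings(list(range(m)))]

def compositional_search(m, M, score, max_layers=5, tol=1e-6):
    """Keep the M best circuits per iteration; stop when the best score converges."""
    pool = single_layer_circuits(m)                  # J single-layer circuits
    beam = sorted(((score((layer,)), (layer,)) for layer in pool), reverse=True)[:M]
    best = beam[0]
    for _ in range(max_layers - 1):
        # Step 1: grow each retained circuit by every single-layer circuit (M x J).
        grown = [circ + (layer,) for _, circ in beam for layer in pool]
        # Steps 2-3 (collapsed): score every grown circuit and retain the M best.
        beam = sorted(((score(c), c) for c in grown), reverse=True)[:M]
        if beam[0][0] - best[0] < tol:
            break
        best = beam[0]
    return best                                      # (score, circuit)

# Toy score for demonstration only: prefers circuits with exactly four R_ZZ gates.
toy_score = lambda circ: -abs(sum(len(layer) for layer in circ) - 4)
best_score, best_circuit = compositional_search(m=4, M=3, score=toy_score)
```

For six qubits this pool contains 76 single-layer circuits, so each iteration with $M$ retained circuits scores $76M$ candidates; the restriction to one gate per qubit per layer is what keeps the pool this small.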
A similar algorithm was previously used to improve the performance of quantum kernels for classification with SVM [28]. However, there are some important differences between classification with SVM and GP regression, which lead to the following differences in the compositional search of quantum circuits. For SVM models, the architecture of quantum circuits can be optimized separately from the parameters of the quantum circuits. This simplifies the compositional search of quantum circuits to a great extent. For regression problems, on the other hand, the parameters of the quantum kernels must be optimized at every step of the circuit selection algorithm (Step 2 in Figure 3). To reduce the complexity of the problem, we restrict the choice of gates for $U_e$ to $R_{ZZ}$ and identity operators and allow only one gate per qubit per layer. In addition, the outputs of SVM must be converted to probabilistic predictions to allow the computation of BIC on a validation set. In the context of GP regression, no validation set is necessary, as BIC, or equivalently the value of $\beta$ as defined by Eq. (23), is a byproduct of training the GP model by type-II maximum likelihood estimation.

IV. NUMERICAL RESULTS
To illustrate the algorithm for the compositional optimization of quantum kernels described in the previous section and benchmark the performance of the quantum kernels, we consider three six-dimensional regression problems yielding global PES for the following molecules: H$_3$O$^+$, HNO$_2$, and H$_2$CO. The molecular geometry is described by six-dimensional vectors $x$. The components of $x$ are defined as $x_i = \exp(-r_i/a)$, where $r_i$ is one of six atom-atom distances within a molecule and the parameter $a$ is fixed to 2.5 [95] for H$_3$O$^+$ and 1.0 for all other molecules.
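The input transformation can be sketched as follows for a four-atom molecule, which has six atom-atom distances. The function name, the Cartesian input format, and the example geometry are assumptions for illustration; the distances must be expressed in the same length unit as the parameter $a$.

```python
import numpy as np
from itertools import combinations

def exp_distance_descriptor(coords, a):
    """coords: (n_atoms, 3) Cartesian positions; returns exp(-r_ij / a) for all pairs."""
    coords = np.asarray(coords, dtype=float)
    r = np.array([np.linalg.norm(coords[i] - coords[j])
                  for i, j in combinations(range(len(coords)), 2)])
    return np.exp(-r / a)

# Hypothetical four-atom geometry (arbitrary units), giving a six-dimensional input.
geometry = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = exp_distance_descriptor(geometry, a=2.5)
```

The transformation maps every distance into the interval (0, 1] and compresses large separations, so all six inputs have comparable scale before they are encoded into gate parameters.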
The GP models are trained and tested with the potential energies of the molecules computed in Refs. 95 and 96. The potential energy data include 31124, 77272, and 77284 ab initio points spanning the energy range [0, 21000] cm$^{-1}$ for H$_3$O$^+$, [0, 44000] cm$^{-1}$ for H$_2$CO, and [0, 36000] cm$^{-1}$ for HNO$_2$, respectively. GP models are trained with $N$ ab initio points that are randomly sampled from a specific energy interval.
The compositional optimization of gate sequences for quantum kernels is performed with a small number of potential energy points (300, 400, or 500, as specified in the corresponding figure captions) randomly sampled from the configuration space of the corresponding molecule. The model accuracy is quantified by the root-mean-squared error,

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}, \tag{24}$$

where $\hat{y}_i$ denotes model predictions, $y_i$ are the corresponding potential energy points from Refs. 95 and 96, and $n$ is the number of test points. The sum in Eq. (24) extends over all ab initio points that are not used for training the models.

B. Comparison of quantum kernels with classical kernels
The compositional search of gate sequences aims to produce quantum kernels aligned with the target functions of GP regression. In the limit of $M \to \infty$ (cf. Figure 3), the compositional search is designed to yield an optimal quantum circuit for a given data set. In this section, we benchmark the performance of the resulting quantum kernels (obtained with $M = 75$) by comparing the resulting quantum GP models with GP models employing two kinds of optimized classical kernels: composite kernels with the functional form determined by the algorithm of Duvenaud and coworkers [76,77,80,81] and NNGP kernels. The results are presented in Figures 6-8 for three different regression problems. The figures depict the results for the models of the global PES for H$_3$O$^+$ (Fig. 6), H$_2$CO (Fig. 7), and HNO$_2$ (Fig. 8).
The upper left panel of each of these figures shows the convergence of the model error with the complexity of the quantum kernels and the classical composite kernels. For quantum kernels, the number of iterations corresponds to the number of layers in the underlying quantum circuit. Each iteration adds one single-layer circuit in the most optimal position as determined by the algorithm. For classical composite kernels, the number of iterations corresponds to the depth of the search tree depicted in Figure 1. Each iteration adds one simple kernel function to the composite mathematical form, as depicted in Figure 1. It can be seen that the errors of the classical and quantum models of PES converge to the same value as the number of iterations increases. This is illustrated with two training samples (300 and 500 energy data points) used both for the compositional search of the kernels and for training the models for each PES. This indicates that quantum kernels can achieve expressivity similar to that of classical kernels for regression problems.
The upper right panel of Figures 6-8 compares the errors of the most accurate models of PES for H$_3$O$^+$ (Fig. 6), H$_2$CO (Fig. 7) and HNO$_2$ (Fig. 8) based on a single RBF kernel, classical composite kernels, quantum kernels with the fixed ansatz from Refs. 1 and 15, quantum kernels with the adaptive, variable ansatz, and NNGP kernels. Where the structure of the kernels is adaptive, the kernels are built with 2000 energy points. The GP models are then built with these kernels and with 2000 energy points in the training set. The results show that the quantum models with the variable, adaptive ansatz yield accuracy similar to that of the models with the most performant classical kernels for all three cases considered.
This observation is further illustrated in the lower panels of Figures 6-8, which show the dependence of the RMSE on the number of training points for interpolation models of PES (lower left) and on the training energy threshold for extrapolation models of PES (lower right). The interpolation models are built with energy point samples from the entire energy range of the PES. To ensure an unbiased comparison, these samples are selected to be identical for each kernel, but random for each number of data points considered. The structure of the kernels (both classical and quantum) is determined with the number of energy points indicated on the x-axis, randomly sampled for each kernel. The same energy points are used for determining the kernel hyperparameters and for GP model predictions.
The results shown in the lower right panel of Figures 6-8 illustrate the error of GP models that extrapolate the PES in the energy domain. The 1500 training data samples for these models are randomly chosen from a fixed energy range below the energy threshold specified on the x-axis as a percentage of the energy range of the full PES. The RMSE is then calculated using all available potential energy points from the energy range above the energy threshold, i.e. the energy range outside of the training data distribution. The results show that GP models with classical composite kernels and quantum kernels based on the variable, adaptive ansatz yield similar extrapolation accuracy. At the same time, the extrapolation performance of these models is much better than the performance of GP models with the RBF kernel or with quantum kernels using the fixed ansatz.
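The extrapolation protocol described above can be sketched as follows: training points are drawn at random below an energy threshold, and the RMSE is evaluated on every point above it. The function names and the synthetic energies are assumptions for illustration only.

```python
import numpy as np

def extrapolation_split(energies, threshold_fraction, n_train, rng):
    """Return (train indices below the threshold, test indices above the threshold)."""
    e_min, e_max = energies.min(), energies.max()
    e_cut = e_min + threshold_fraction * (e_max - e_min)
    low = np.flatnonzero(energies <= e_cut)
    high = np.flatnonzero(energies > e_cut)
    train = rng.choice(low, size=n_train, replace=False)   # random sample below e_cut
    return train, high

def rmse(y_true, y_pred):
    # Root-mean-squared error over the supplied test points.
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

rng = np.random.default_rng(0)
energies = np.linspace(0.0, 1000.0, 1001)                  # synthetic stand-in energies
train_idx, test_idx = extrapolation_split(energies, 0.5, n_train=100, rng=rng)
```

A trained model would then be evaluated with `rmse(energies[test_idx], predictions)`, where `predictions` are its outputs at the test geometries; none of the test energies fall inside the energy range seen during training.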

V. SUMMARY
The main goal of the present work is to present an unbiased comparison of the performance of classical kernels and quantum kernels for regression problems of practical interest. To perform such a comparison, it is necessary to construct the most optimal classical kernels and the most optimal quantum kernels for a given data set. We seek to determine if quantum kernels can achieve the same expressivity as classical kernels for practical regression problems. This is an open question because the previous studies considered either model low-dimensional problems or comparisons between quantum kernels and specific, fixed classical kernels.
For classical kernels, we adopt two strategies. The first strategy increases the complexity of the kernel function incrementally by an iterative approach designed to maximize the Bayesian information criterion. This leads to composite kernel functions designed to maximize the generalization accuracy of a given regression model. The second, independent, strategy employs neural network Gaussian processes, which allow for enhancing the kernel expressivity by increasing the number of NN layers. We note that, while the two approaches yield models with similar accuracy, the composite kernel functions are consistently more accurate for the problems considered here. This indicates that the composite kernel functions determined by BIC maximization provide robust benchmarks for quantum kernels.
For quantum kernels, we develop an algorithm that uses an analog of BIC to optimize the sequence of quantum gates used to estimate quantum fidelities. We show that this algorithm yields quantum circuits producing much better accuracy with fewer quantum gates than a fixed ansatz proposed and used previously. The present algorithm increases the complexity of the quantum circuits incrementally while improving the performance of the resulting kernels. The algorithm is controlled by hyperparameters that balance the efficiency of the circuit architecture optimization and the size of the explored space of gate permutations. A similar strategy can be used for other applications that require optimization of gate sequences. For example, a closely related algorithm was used in our previous work for state preparation in order to identify efficient quantum circuit representations of ro-vibrational states of polyatomic molecules for computations with variational quantum eigensolvers [97]. More generally, various algorithms based on incremental, iterative growth of quantum circuits have been employed in variational computations for state preparation.
Our results show that the errors of the regression models with classical composite kernels and quantum variable circuits converge to the same value as the complexities of the kernels increase. This indicates that quantum kernels can achieve expressivity similar to, though not better than, that of classical kernels for regression problems. We have considered interpolation models yielding global six-dimensional PES for three polyatomic molecules with training data from the entire energy range and extrapolation models that use ab initio data from a low-energy part of the PES to predict the PES at higher energies. For extrapolation models, classical composite kernels and quantum variable kernels yield the best models with similar accuracy, significantly exceeding the accuracy of models with RBF kernels or the accuracy of quantum models with a fixed ansatz.
Finally, our work demonstrates that quantum kernels can be used to build accurate models of global PES for polyatomic molecules. The interpolation RMSE of the 6D PES obtained with a random distribution of 2000 energy points is 16 cm$^{-1}$ for H$_3$O$^+$, 15 cm$^{-1}$ for H$_2$CO and 88 cm$^{-1}$ for HNO$_2$. These errors can be further reduced by increasing the number of energy points used for model training. This indicates that fitting of PES for molecular systems can be performed on a quantum computer. When combined with previously proposed quantum algorithms for quantum chemistry [102-113] and nuclear dynamics [114-119], this implies that the full stack of ab initio calculations of molecular properties, including electronic structure, fitting of PES and molecular dynamics, can be implemented on a quantum computer.

FIG. 1. Left: Schematic illustration of the iterative process of composite classical kernel construction. At each iteration, the kernel with the largest value of the Bayesian information criterion (BIC) is chosen as the optimal kernel. The functions $k_i$ with $i \in [1, 5]$ correspond to the kernel functions given in Eqs. (3)-(7). Right: Dependence of the BIC value for GP models of six-dimensional PES for the molecules indicated in the curve labels on the kernel complexity level. Each model is trained with 500 values of the potential energy randomly sampled from the configuration space of the corresponding molecule.
b. Here, we use $\phi(y) = \mathrm{erf}(y) = \frac{2}{\sqrt{\pi}} \int_0^y e^{-t^2}\, dt$ and treat $\sigma_w^{(l)}$ and $\sigma_b^{(l)}$ for each layer $l$ as independent trainable parameters. The parameters $\sigma_w^{(l)}$ and $\sigma_b^{(l)}$ for each layer are optimized by type-II maximum likelihood estimation 83 using Bayesian optimization. The number of layers $L$ of NNGP is increased until the magnitude of the function in Eq. (2) converges.
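For the erf activation, the layer-to-layer kernel recursion has a known closed form (the arcsine kernel). The sketch below is an illustration, not the authors' implementation. It assumes an input-layer kernel $\sigma_b^2 + \sigma_w^2\, x \cdot x' / d$ and a convention in which the first entries of `sigma_w` and `sigma_b` belong to the input layer; the function name and argument layout are hypothetical.

```python
import numpy as np

def nngp_erf_kernel(X1, X2, sigma_w, sigma_b):
    """NNGP kernel for erf activations with per-layer sigma_w, sigma_b.

    sigma_w[0], sigma_b[0] define the input-layer kernel (assumed convention);
    each further pair adds one hidden layer via the arcsine recursion.
    """
    d = X1.shape[1]
    sw, sb = sigma_w[0], sigma_b[0]
    K12 = sb**2 + sw**2 * (X1 @ X2.T) / d           # cross-covariances
    K11 = sb**2 + sw**2 * (X1 * X1).sum(1) / d      # variances of X1 points
    K22 = sb**2 + sw**2 * (X2 * X2).sum(1) / d      # variances of X2 points
    for sw, sb in zip(sigma_w[1:], sigma_b[1:]):
        norm = np.sqrt((1 + 2 * K11)[:, None] * (1 + 2 * K22)[None, :])
        # E[erf(u) erf(v)] for jointly Gaussian (u, v) is (2/pi) arcsin(...).
        K12 = sb**2 + sw**2 * (2 / np.pi) * np.arcsin(2 * K12 / norm)
        K11 = sb**2 + sw**2 * (2 / np.pi) * np.arcsin(2 * K11 / (1 + 2 * K11))
        K22 = sb**2 + sw**2 * (2 / np.pi) * np.arcsin(2 * K22 / (1 + 2 * K22))
    return K12

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
K = nngp_erf_kernel(X, X, sigma_w=[1.0, 1.5, 1.2], sigma_b=[0.1, 0.2, 0.3])
print(K.shape)
```

Increasing the depth corresponds to appending entries to `sigma_w` and `sigma_b`; in the setting described above, these values would be chosen by maximizing the marginal likelihood rather than set by hand.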

FIG. 2. Upper panel: Quantum circuit U for the fixed ansatz used for GP regression in Ref. 1 that yields orange triangles in the lower right panel. Lower left: Quantum circuit U for the optimal quantum kernel obtained by the compositional optimization algorithm described in the text. The ansatz is identified with 500 training points randomly sampled from the entire energy range and yields blue circles in the lower right panel. Lower right: RMSE of the GP model of PES for H$_3$O$^+$ with the variable ansatz (blue circles) and fixed ansatz (orange triangles) for the quantum kernels as a function of the number of training points. H denotes the Hadamard gate, $R_Z$ the $R_Z$ gate, and $R_Y$ the $R_Y$ gate. All entangling gates are $R_{ZZ}$.

FIG. 3. Schematic diagram of the compositional search of quantum circuits used in the present work. The algorithm increases the number of layers in quantum kernels iteratively and includes three steps: (1) the M best circuits from the previous iteration are each extended with every circuit from the pool of one-layer circuits to generate a pool of circuits with a greater depth; (2) the M quantum circuits with the largest values of β, as defined by Eq. (23), are retained for the next iteration; (3) the trainable parameters of each circuit are optimized by Bayesian optimization to maximize Eq. (22) on a training set with randomly sampled ab initio points from the entire energy range of the target PES.
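The extend-and-retain loop in this caption has the structure of a beam search. The sketch below shows only that structure and is not the authors' code. Circuits are represented as tuples of hypothetical layer labels, and the selection criterion is a user-supplied `score` callable standing in for β of Eq. (23); a toy score replaces any quantum simulation, and the parameter optimization of step (3) is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Circuit = Tuple[str, ...]  # a circuit is an ordered tuple of layer labels

def beam_search(layer_pool: Sequence[str],
                score: Callable[[Circuit], float],
                n_iterations: int,
                beam_width: int) -> Tuple[Circuit, List[Circuit]]:
    beam: List[Circuit] = [()]   # start from the empty circuit
    best: Circuit = ()
    best_score = float("-inf")
    for _ in range(n_iterations):
        # Step (1): extend each retained circuit with every one-layer circuit.
        candidates = {c + (layer,) for c in beam for layer in layer_pool}
        # Step (2): keep the beam_width candidates with the largest score.
        ranked = sorted(candidates, key=lambda c: (score(c), c), reverse=True)
        beam = ranked[:beam_width]
        if score(beam[0]) > best_score:
            best, best_score = beam[0], score(beam[0])
    return best, beam

# Toy stand-in for the selection criterion: reward agreement with a
# hypothetical target sequence and penalize extra depth.
TARGET: Circuit = ("H", "RZ", "RZZ", "RY")

def toy_score(c: Circuit) -> float:
    matches = sum(a == b for a, b in zip(c, TARGET))
    return matches - 0.5 * max(0, len(c) - len(TARGET))

best, beam = beam_search(["H", "RZ", "RY", "RZZ"], toy_score,
                         n_iterations=5, beam_width=2)
print(best)
```

Retaining several circuits per iteration, rather than only the single best one, lets the search recover from an extension that looks poor at shallow depth but becomes useful once further layers are added.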

Figure 2 compares the quantum circuit of the fixed ansatz in Eq. (15) with the best outcome of the compositional optimization for the model of PES for the molecule H$_3$O$^+$. The lower left panel of Figure 2 shows the architecture of the quantum kernel circuit U produced by the compositional optimization algorithm, and the lower right panel compares the RMSE of the GP model predictions with the quantum kernels from the two circuits U shown. Figure 2 demonstrates that the compositional optimization produces quantum kernels that both require fewer gates and yield more accurate regression models.

Figure 5 illustrates how the regression model of the global PES for H$_3$O$^+$, based on quantum kernels with optimized architecture, improves with increasing complexity of the quantum circuits. The PES model with U including only the Hadamard and $R_Y$ gates is completely unphysical, while the model based on U with 6 layers in $U_e$ produces a smooth PES with a small RMSE. The circuit optimization rapidly reaches convergence, indicating that physical PES models can be obtained with shallow quantum circuits. This rapid convergence to an optimized circuit architecture illustrates the efficiency of the kernel optimization algorithm, highlighting its ability to identify important entanglement patterns and circuit structures that significantly enhance the interpolation accuracy of the QGP models. The lowest value of the RMSE in Figure 5 is 82 cm$^{-1}$. Note that the models of the global 6D PES illustrated in Figure 5 are obtained with 500 potential energy points. The error of ML models scales with the number of training points $N$ as $\propto 1/\sqrt{N}$.
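As a worked illustration of this scaling (a sketch that assumes the $1/\sqrt{N}$ form holds with a prefactor $c$ independent of $N$; the symbols $\varepsilon$ and $c$ are introduced here for illustration only), the ratio of errors for two training set sizes $N_1$ and $N_2$ is

$$
\varepsilon(N) \approx \frac{c}{\sqrt{N}}
\quad\Longrightarrow\quad
\frac{\varepsilon(N_2)}{\varepsilon(N_1)} = \sqrt{\frac{N_1}{N_2}},
\qquad
N_2 = 4 N_1 \;\Rightarrow\; \varepsilon(N_2) = \tfrac{1}{2}\,\varepsilon(N_1).
$$

Under this assumption alone, quadrupling the number of training points halves the error; any further reduction must come from changes to the kernel itself.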

FIG. 5. Improvement of the regression model of PES for H$_3$O$^+$ based on quantum kernels with optimized architecture with the increasing complexity of the quantum circuits. The six-dimensional models of PES are trained with 500 potential energy points. The insets show the potential energy of the molecule as a function of the shortest H-O distance and the distance between two oxygen atoms.
31,98-100 and can potentially be exploited for quantum control 101. The present algorithm can be used for applications in Refs. 31, 98-101, and the algorithms for circuit growth in Refs. 31, 98-101 can be adapted for quantum regression problems.