Predicting toxicity by quantum machine learning

In recent years, parameterized quantum circuits have been regarded as machine learning models within the framework of the hybrid quantum–classical approach. Quantum machine learning (QML) has been applied to binary classification problems and unsupervised learning. However, practical quantum applications to nonlinear regression tasks have received considerably less attention. Here, we develop QML models for predicting the toxicity of 221 phenols on the basis of quantitative structure-activity relationships. The results suggest that our data encoding enhanced by quantum entanglement provided more expressive power than previous encodings, implying that quantum correlation could be beneficial for the feature map representation of classical data. Our QML models performed significantly better than multiple linear regression. Furthermore, our simulations indicate that the QML models were comparable to radial basis function networks while showing improved generalization performance. The present study implies that QML could be an alternative approach for nonlinear regression tasks such as those in cheminformatics.


INTRODUCTION
Quantitative structure-activity relationship (QSAR) modeling is one of the major computational molecular modeling methods. The QSAR approach attempts to correlate molecular descriptors of compounds with their physicochemical properties; over the past decades, it has been used for predicting toxicity and bioactivities as well as for finding new drug leads in the chemical and pharmaceutical fields [1][2][3][4]. Owing to the rapid development of information and communication technologies, huge amounts of physicochemical data from a variety of sources have been accumulated. Databases containing millions of chemical compounds and their activities in biological assays are now available on various platforms. As a consequence, there is a growing need for innovation in computer technology that can efficiently and accurately analyze ever-increasing amounts of physicochemical and biological data [5].
In recent years, quantum computing [6][7][8] has attracted much attention as one of the most promising quantum technologies, one that could radically transform science and many areas of industry. Although large-scale, fault-tolerant quantum computers have not yet been built, noisy intermediate-scale quantum (NISQ) computers [6] have been applied to various areas of science and technology, including chemistry [9][10][11][12][13], optimization [14][15][16], and finance [17,18]. A promising scheme for practical applications on NISQ devices is the hybrid quantum-classical algorithm [6,8], in which computational tasks are deliberately divided between quantum and classical resources using a parameterized approach.
In particular, parameterized quantum circuits (PQCs) have been regarded as machine learning models with high expressive power within the hybrid quantum-classical framework [32,33]. PQCs are typically composed of fixed quantum gates (e.g., qubit rotations and entangling gates) arranged in a shallow circuit layout, with variable parameters optimized in a classical feedback loop. So far, quantum machine learning (QML) has been successfully applied to both discriminative [34][35][36] and generative [37,38] models. Examples include binary classification for image recognition [35], kernel methods for support vector machines [39,40], and unsupervised machine learning in finance [41]. To our knowledge, however, the application of QML to regression tasks has not been fully investigated in the literature. It also remains unclear which quantum states should be used to generate highly expressive feature maps suited to real-world data sets.
To explore the possibility of near-term quantum applications to regression tasks, we apply the QML method to quantitative structure-toxicity relationship (QSTR) models for predicting the toxicity of 221 phenols. While there is a variety of QSAR/QSTR models (e.g., 3D-QSAR [1,4]), as a first step we employ QSAR/QSTR models based on molecular descriptors such as hydrophobicity, the acidity constant, and frontier orbital energies.
Quantum computing has been applied in biochemical and pharmaceutical areas, for example to protein folding [42][43][44], molecular similarity [45], and biological data [46]; yet there has been no study of quantum computing applied to QSAR modeling, even though QSAR is an important part of ligand-based computer-aided drug design.
The remainder of the paper is organized as follows. In Methods, we briefly review PQC-based machine learning and then describe our QML models in detail, together with the data set used for QSTR modeling. In Results and Discussion, we present the results of our QSTR models and discuss how different encodings, variational circuit architectures, and the number of qubits affect the performance of the QML models. We also compare the performance of our best QML models with that of conventional chemometrics methods and comment on several perspectives on QML. Finally, we summarize our conclusions.

Parameterized quantum circuits
In recent years, PQCs have been regarded as machine learning models with high expressive power within the framework of the hybrid quantum-classical approach. PQCs are usually composed of one-qubit rotations and two-qubit entangling operations in a shallow circuit layout, with parameters optimized in a feedback loop. A recent review of PQCs is given in [33]. QML that combines near-term quantum algorithms and machine learning within the PQC framework is sometimes referred to as quantum circuit learning (QCL) [32]. So far, QML has been applied to both discriminative and generative tasks [34][35][36][37][38][39][40][41]; the application of QML to regression tasks, on the other hand, has not been thoroughly investigated.
From the viewpoint of machine learning architecture, PQCs consist of three components: the encoder circuit, the variational circuit, and the measurement used to estimate the loss function. First, an encoder circuit loads classical $d$-dimensional data $\mathbf{x} = (x_1, x_2, \ldots, x_d)^{T} \in \mathbb{R}^{d}$ into a higher-dimensional feature map $U_{\mathrm{in}}(\mathbf{x})$ in the Hilbert space, which produces the quantum state $U_{\mathrm{in}}(\mathbf{x})|0\rangle^{\otimes n}$, with $n$ being the number of qubits. The number of qubits $n$ can be set to the dimension $d$ of the input data (other settings are also considered in the present work). Such an approach may be less efficient in terms of the number of qubits but is efficient in terms of circuit depth. Second, a variational circuit $U(\boldsymbol{\theta})$ acts on the quantum state prepared by the encoder circuit to explore the quantum-enhanced feature space using trainable parameters $\boldsymbol{\theta}$, leading to the parameterized quantum state $U(\boldsymbol{\theta})U_{\mathrm{in}}(\mathbf{x})|0\rangle^{\otimes n}$. Third, the loss function is estimated from expectation values obtained by measurements. In the following subsections, we look at each step of our QML models in detail.
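The three stages can be illustrated with a small NumPy state-vector simulation. This is a minimal sketch, not the authors' Qulacs implementation: the helper names (`apply_1q`, `expect_z`, `qml_predict`, `ry_layer`) are hypothetical, qubit 0 is taken as the leftmost tensor factor, and the single-rotation layer used for both stages is a placeholder chosen only to show how $U_{\mathrm{in}}(\mathbf{x})$, $U(\boldsymbol{\theta})$, and the measurement compose.

```python
import numpy as np

def ry(t):
    """Single-qubit rotation R_Y(t) = exp(-i t Y / 2)."""
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector (qubit 0 = leftmost factor)."""
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def expect_z(state, q, n):
    """<Z_q> = P(qubit q = 0) - P(qubit q = 1)."""
    probs = np.abs(state.reshape([2] * n)) ** 2
    p = np.moveaxis(probs, q, 0).reshape(2, -1).sum(axis=1)
    return float(p[0] - p[1])

def qml_predict(x, theta, encoder, variational):
    n = len(x)                                # one qubit per feature (n = d)
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0                            # |0...0>
    state = encoder(state, x, n)              # stage 1: U_in(x)
    state = variational(state, theta, n)      # stage 2: U(theta)
    return expect_z(state, 0, n)              # stage 3: measure <Z> on qubit 0

def ry_layer(state, angles, n):
    """Placeholder stage: R_Y(angles[j]) on every qubit j (no entanglement)."""
    for j in range(n):
        state = apply_1q(state, ry(angles[j]), j, n)
    return state

x = np.array([0.3, 0.5, 0.1])
theta = np.array([0.2, 0.0, 0.0])
print(qml_predict(x, theta, ry_layer, ry_layer))  # cos(0.5) up to rounding for this placeholder
```

Because both placeholder stages only rotate qubit 0 about the Y axis, the measured value reduces to $\cos(x_1+\theta_1)$; real encoders and variational circuits replace `ry_layer` without changing the surrounding pipeline.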

Encoder circuit
Data representation is essential for the success of machine learning models. In QML, loading classical data as a quantum state is an important and challenging task; in fact, the choice of encoding in PQCs is analogous to selecting a feature map in kernel-based machine learning techniques [33,40]. Several methods for encoding input data into qubits have been proposed, including angle encoding, amplitude encoding, and random linear maps [33-35, 40, 47]. However, it is not obvious a priori which encoding is suitable for our particular application. With this in mind, we employ three methods of loading classical data into a quantum state (note that input data can be pre-processed by normalization).
A first encoding, denoted $U_{\mathrm{M}}(\mathbf{x})$, is the one proposed by Mitarai et al. [32]; this approach was originally motivated by expanding the density operator of a quantum state in terms of a set of Pauli operators [32]. A second encoding we consider is angle encoding [34,47], with the corresponding unitary operator denoted $U_{\mathrm{A1}}(\mathbf{x})$. This scheme is sometimes referred to as qubit encoding [48]. The encoding can be viewed as a product of local kernels, in which each component of the input vector is encoded into a local feature map; it therefore produces an unentangled product state [33].
This kind of encoding, though seemingly simple, has been applied to tree tensor network classifiers in QML [34]. A third encoding, $U_{\mathrm{A2}}(\mathbf{x})$, is related to the second one and uses two single-qubit rotations. It loads each component of the input vector into two angles on the Bloch sphere, which introduces a certain redundancy in the encoding and can thereby modify the feature map.
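Because all three encoders (IDs M, A1, and A2 in Table 1) act qubit by qubit, each produces a product state that can be assembled with Kronecker products. The sketch below is hedged: the M map uses the form we recall from Mitarai et al. [32], $R_Z(\cos^{-1}x_j^2)\,R_Y(\sin^{-1}x_j)$, which requires $|x_j|\le 1$ (one reason to normalize the inputs), while `qubit_a1` ($R_Y(x_j)$) and `qubit_a2` ($R_Z(x_j)R_Y(x_j)$) are hypothetical stand-ins for the two angle encodings and are not necessarily the definitions used in this work.

```python
import numpy as np
from functools import reduce

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])

ZERO = np.array([1.0, 0.0], dtype=complex)

def qubit_m(xj):
    """Mitarai-style map (as recalled): R_Z(arccos x^2) R_Y(arcsin x)|0>, needs |x| <= 1."""
    return rz(np.arccos(xj ** 2)) @ ry(np.arcsin(xj)) @ ZERO

def qubit_a1(xj):
    """Hypothetical angle encoding: R_Y(x)|0>."""
    return ry(xj) @ ZERO

def qubit_a2(xj):
    """Hypothetical two-angle encoding: R_Z(x) R_Y(x)|0>."""
    return rz(xj) @ ry(xj) @ ZERO

def product_state(x, local_map):
    """Each feature goes to its own qubit, so the n-qubit state is a Kronecker product."""
    return reduce(np.kron, [local_map(xj) for xj in x])

psi = product_state([0.2, -0.5, 0.9], qubit_m)
print(psi.shape, np.linalg.norm(psi))   # 8 amplitudes for 3 qubits, unit norm up to rounding
```

Since each local map is a single-qubit unitary acting on $|0\rangle$, the resulting state is always normalized and never entangled, which is exactly the limitation the entangler-enhanced encoding addresses.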
In addition to investigating different ways of encoding, we explore the possibility that entanglement might extend the flexibility of data representation. Previous studies suggest that entangling gates play essential roles in quantum generative models [38,49] and in the expressibility of PQCs [50]; in particular, repeated circuit layers with entangling controlled-NOT (CNOT) gates provide high expressive power [49,50]. In this work, we propose an encoder circuit that contains entangler blocks. Such an encoding circuit can be expressed as $K$ layers of single-qubit rotations, each followed by two-qubit entangling gates:

$$U_{\mathrm{ent}}^{K}\, U_{\mathrm{in}}^{K}(\mathbf{x}) \cdots U_{\mathrm{ent}}^{2}\, U_{\mathrm{in}}^{2}(\mathbf{x})\, U_{\mathrm{ent}}^{1}\, U_{\mathrm{in}}^{1}(\mathbf{x}). \qquad (4)$$

Here, the $k$th layer comprises a product of two operations: (i) a unitary operator $U_{\mathrm{in}}^{k}(\mathbf{x})$, which can be any reasonable encoder circuit loading the classical input data $\mathbf{x}$, and (ii) a two-qubit entangling operation $U_{\mathrm{ent}}^{k}$, typically composed of CNOT or controlled-Z (CZ) gates (hereafter denoted $U_{\mathrm{CNOT}}$ and $U_{\mathrm{CZ}}$, respectively). In the following, the encoding described in Eq. (4) is referred to as entangler-enhanced encoding. We expect that such an encoding might expand the representational ability of the feature map owing to quantum entanglement. From the viewpoint of quantum physics, this encoding can be interpreted as a concatenated tensor network, and this family of quantum circuits can describe high-dimensional tensor networks efficiently [51]. In the present study, we consider the following encoding composed of two layers:

$$U_{\mathrm{ent}}\, U_{2}(\mathbf{x})\, U_{\mathrm{ent}}\, U_{1}(\mathbf{x}), \qquad (5)$$

where the unitary operators $U_1$ and $U_2$ can be any of the three encodings mentioned earlier.
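The two-layer encoder $U_{\mathrm{ent}}U_2(\mathbf{x})U_{\mathrm{ent}}U_1(\mathbf{x})$ can be sketched in NumPy to check that the entangler blocks actually produce entanglement. Several choices here are assumptions: `a2_layer` uses the hypothetical A2 stand-in $R_Z(x_j)R_Y(x_j)$, the entangler is a ring of CNOT (or CZ) gates $j \to j+1 \bmod n$ (a periodic layout, consistent with the periodic-boundary interpretation given for the final model), and the purity of qubit 0 is used as a simple entanglement diagnostic.

```python
import numpy as np

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q (qubit 0 = leftmost tensor factor)."""
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cnot(state, c, t, n):
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[c] = 1                                   # block where the control qubit is |1>
    ta = t if t < c else t - 1                   # target axis once the control axis is indexed away
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=ta).copy()  # X on the target
    return psi.reshape(-1)

def apply_cz(state, a, b, n):
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[a], idx[b] = 1, 1
    psi[tuple(idx)] *= -1                        # phase -1 on |1>_a |1>_b
    return psi.reshape(-1)

def cnot_ring(state, n):                         # U_CNOT on a periodic chain (assumed layout)
    for j in range(n):
        state = apply_cnot(state, j, (j + 1) % n, n)
    return state

def cz_ring(state, n):                           # U_CZ on the same layout
    for j in range(n):
        state = apply_cz(state, j, (j + 1) % n, n)
    return state

def no_entangler(state, n):                      # identity: recovers a plain product-state encoder
    return state

def a2_layer(state, x, n):                       # hypothetical A2 stand-in: R_Z(x_j) R_Y(x_j) on qubit j
    for j in range(n):
        state = apply_1q(state, rz(x[j]) @ ry(x[j]), j, n)
    return state

def entangler_enhanced(x, entangler=cnot_ring):
    """U_ent U_2(x) U_ent U_1(x)|0...0> with U_1 = U_2 = a2_layer (intended for n >= 3)."""
    n = len(x)
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for _ in range(2):
        state = entangler(a2_layer(state, x, n), n)
    return state

def purity_qubit0(state, n):
    """Tr(rho_0^2) of qubit 0's reduced state; equals 1 only if qubit 0 is unentangled."""
    m = state.reshape(2, -1)
    rho = m @ m.conj().T
    return float(np.real(np.trace(rho @ rho)))

x = [0.4, 1.1, -0.7, 0.9, 0.3]                   # five descriptors -> five qubits
for ent in (no_entangler, cnot_ring, cz_ring):
    print(ent.__name__, purity_qubit0(entangler_enhanced(x, ent), len(x)))
```

With `no_entangler` the purity stays at 1 (a product state), whereas the CNOT ring drives it below 1, which is the sense in which the entangler blocks enlarge the family of reachable feature-map states.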
Our approach can be viewed as an extension of the previous QCL [32], in which the feature map is represented by a product state. To investigate the performance of our entangler-enhanced encoders, we considered 10 combinations of $U_1$, $U_2$, and $U_{\mathrm{ent}}$, which are summarized in Table 1. Another way to increase the flexibility of the feature space is to use several copies of the quantum state (i.e., a tensor product of copies) at the outset, so that each component of the input data is encoded into multiple qubits [32,40]. Although this scheme requires additional quantum resources, it generates higher-order terms in the feature map, which is likely to give rise to more expressive power and a richer class of functions. A recent study indicates that such input redundancy is necessary for data fitting and that the required redundancy grows at least logarithmically with the complexity of the functions [52]. For each encoding in Table 1, we therefore also consider feature maps using two and three copies of the quantum state.
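A minimal way to see why copies add higher-order terms, assuming for illustration the angle encoding $R_Y(x)$ on every qubit and a block-wise tiling of the features onto qubits (both assumptions about the layout): one copy gives $\langle Z\rangle=\cos x$, while the parity observable over two or three copies gives $\cos^2 x=(1+\cos 2x)/2$ and $\cos^3 x=(3\cos x+\cos 3x)/4$. These higher harmonics cannot arise from any single-qubit observable after $R_Y(x)$, whose expectation values contain only first-order terms in $\cos x$ and $\sin x$.

```python
import numpy as np
from functools import reduce

def ry_state(t):
    """R_Y(t)|0> = cos(t/2)|0> + sin(t/2)|1> (angle encoding assumed for illustration)."""
    return np.array([np.cos(t / 2), np.sin(t / 2)])

def encode_copies(x, copies):
    """Load each of the d features onto `copies` qubits (n = copies * d).

    The block-wise tiling [x1..xd, x1..xd, ...] is a hypothetical qubit layout.
    """
    x_rep = np.tile(np.asarray(x, dtype=float), copies)
    return reduce(np.kron, [ry_state(v) for v in x_rep]), x_rep.size

def expect_parity(state, n):
    """<Z x Z x ... x Z> over all n qubits: sign (-1)^(number of 1 bits) per basis state."""
    signs = np.array([(-1) ** bin(k).count("1") for k in range(2 ** n)])
    return float(np.sum(signs * np.abs(state) ** 2))

x = 0.8
for c in (1, 2, 3):
    psi, n = encode_copies([x], c)
    print(c, n, expect_parity(psi, n))   # cos(x)**c for c = 1, 2, 3

print(encode_copies(np.zeros(5), 2)[1])  # five descriptors with two copies -> 10 qubits
```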

Variational circuit
The essential role of the variational circuit $U(\boldsymbol{\theta})$ is to efficiently explore the quantum-enhanced feature space generated by PQCs. The variational circuit originally reported in the literature is based on the time evolution of an Ising Hamiltonian [32]; it uses the Trotter decomposition, which incurs additional computational cost. Another disadvantage of this method is that it is rather memory-intensive when the quantum circuit is simulated on classical processors.
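To make the memory argument concrete, the sketch below builds an Ising time-evolution gate as a dense $2^n \times 2^n$ matrix with NumPy and SciPy. It assumes the transverse-field form of the original QCL proposal as we recall it, $H=\sum_j a_j X_j+\sum_{j<k}J_{jk}Z_jZ_k$, with coefficients drawn uniformly from $[-1,1]$ and an evolution time $T=10$ (assumed values), and it exponentiates $H$ exactly with `scipy.linalg.expm` rather than by Trotter decomposition. The helper names are hypothetical.

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def op_on(single, sites, n):
    """Tensor product with `single` on the listed sites and the identity elsewhere."""
    return reduce(np.kron, [single if j in sites else I2 for j in range(n)])

def ising_hamiltonian(n, rng):
    """Assumed form: H = sum_j a_j X_j + sum_{j<k} J_jk Z_j Z_k with uniform(-1, 1) coefficients."""
    a = rng.uniform(-1, 1, n)
    J = rng.uniform(-1, 1, (n, n))
    H = sum(a[j] * op_on(X, {j}, n) for j in range(n))
    return H + sum(J[j, k] * op_on(Z, {j, k}, n)
                   for j in range(n) for k in range(j + 1, n))

def ising_evolution(n, T=10.0, seed=0):
    """Dense unitary exp(-i T H); both H and the result need 4**n matrix entries."""
    H = ising_hamiltonian(n, np.random.default_rng(seed))
    return expm(-1j * T * H)

def dense_unitary_bytes(n):
    """Memory for one dense complex128 matrix of size 2**n x 2**n."""
    return 16 * 4 ** n

U = ising_evolution(3)
print(U.shape, np.allclose(U.conj().T @ U, np.eye(8)))
for n in (5, 10, 15):
    print(n, dense_unitary_bytes(n) / 1e9, "GB")
```

A single dense complex128 gate needs $16\cdot 4^n$ bytes: about 16.8 MB for 10 qubits but about 17.2 GB for 15 qubits, before counting the Hamiltonian and intermediate arrays, which is why this circuit becomes memory-intensive on classical hardware.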
To circumvent these limitations, we employed quantum circuits inspired by the hardware-heuristic ansatz [13], which was originally motivated by the limited fidelity and connectivity of existing NISQ devices. On the basis of the architecture of PQCs [13,50], we construct the variational circuit from $L$ layers of a unit circuit consisting of single-qubit rotations $U_{\ell}(\boldsymbol{\theta}_{\ell})$ and a two-qubit entangler block $U_{\mathrm{ent}}^{\ell}$ comprising CNOT or CZ gates:

$$U(\boldsymbol{\theta}) = \prod_{\ell=1}^{L} U_{\ell}(\boldsymbol{\theta}_{\ell})\, U_{\mathrm{ent}}^{\ell}. \qquad (6)$$

From a physical standpoint, such a quantum circuit can be interpreted as a concatenated tensor network, which can be used for an efficient description of time-evolved quantum states [51].
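The layered hardware-heuristic circuit can be sketched as follows. The details are assumptions made for illustration: each layer applies a CNOT ring ($j \to j+1 \bmod n$) and then three rotation angles per qubit ($R_Y R_Z R_Y$, a hypothetical choice), reading the operator product $U_{\ell}(\boldsymbol{\theta}_{\ell})U_{\mathrm{ent}}^{\ell}$ as "entangler first, rotations second", so the circuit has $3nL$ trainable angles.

```python
import numpy as np

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])

def apply_1q(state, gate, q, n):
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cnot(state, ctrl, tgt, n):
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[ctrl] = 1                                  # block where the control qubit is |1>
    ta = tgt if tgt < ctrl else tgt - 1            # target axis inside that block
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=ta).copy()
    return psi.reshape(-1)

def cnot_ring(state, n):
    for j in range(n):                             # CNOT(j, j+1 mod n): periodic chain (assumed)
        state = apply_cnot(state, j, (j + 1) % n, n)
    return state

def variational(state, theta, n):
    """theta has shape (L, n, 3): three angles per qubit per layer (hypothetical R_Y R_Z R_Y)."""
    for layer in theta:
        state = cnot_ring(state, n)                # entangler block U_ent^l
        for q in range(n):                         # single-qubit rotations U_l(theta_l)
            t0, t1, t2 = layer[q]
            state = apply_1q(state, ry(t2) @ rz(t1) @ ry(t0), q, n)
    return state

n, L = 4, 3
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, size=(L, n, 3))
print(theta.size)                                  # 3 * n * L trainable angles
```

Every operation is either a single-qubit rotation or a permutation of amplitudes, so the cost per layer scales with the state-vector size rather than requiring a dense $2^n \times 2^n$ matrix.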
In this work, we investigated the performance of three variational circuits (see Figure 1): one was the variational circuit based on the time evolution of an Ising Hamiltonian, and the other two were modified variational circuits based on the hardware-heuristic approach, in which the total number of two-qubit gates is $nL$ (Figures 1b and 1c). In both approaches, the total number of trainable parameters is $3nL$.

In QML, measuring quantum states extracts the information used for supervised learning. For instance, a QML architecture can measure the expectation value of the Pauli Z operator on a single qubit (Figure 2a), which is then used to evaluate the loss function. Since the information is reduced to a single qubit and then extracted by the measurement, this approach may be regarded as pure QML (unless otherwise mentioned, this scheme was employed in this work). For the target values $\mathbf{y} = (y^{(1)}, y^{(2)}, \ldots, y^{(N)})^{T} \in \mathbb{R}^{N}$ (where $N$ is the number of data samples) and the expectation values $\{\langle Z\rangle^{(i)}\}$, the loss function $\mathcal{L}$ is given by

$$\mathcal{L}\left(\{y^{(i)}\}, \{\langle Z\rangle^{(i)}\}\right) = \frac{1}{N}\sum_{i=1}^{N}\left(y^{(i)} - \langle Z\rangle^{(i)}\right)^{2}. \qquad (7)$$

Another approach is to use a set of expectation values from multiple qubits [33] to evaluate the loss function. This scheme can be viewed as hybrid quantum-classical machine learning, and such quantum circuits are also considered in this work (Figure 2b). As a first step, we simply use the expectation values as input to a multiple linear model: for the set of expectation values $\{\langle Z_j\rangle^{(i)}\}_{j=1}^{m}$ from $m$ qubits for the $i$th data sample, the predicted value $\hat{y}^{(i)}$ is expressed as a linear combination of these expectation values.
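The two read-out schemes can be sketched on precomputed expectation values (for example, from a simulator). For the pure scheme, we assume that the scaling-factor hyperparameter multiplies $\langle Z\rangle$ before it enters the loss in Eq. (7); for the hybrid scheme, we assume a multiple linear model with an intercept whose weights are fitted by ordinary least squares for fixed circuit parameters. Both choices and the function names are illustrative, not the authors' exact procedure.

```python
import numpy as np

def mse_loss(y, expvals, scale=1.0):
    """Pure QML loss: mean of (y_i - scale * <Z>_i)^2 (multiplicative scaling assumed)."""
    y = np.asarray(y, dtype=float)
    f = scale * np.asarray(expvals, dtype=float)
    return float(np.mean((y - f) ** 2))

def fit_hybrid_readout(expvals, y):
    """Hybrid QML read-out: least-squares weights (w0, w1..wm) for y ~ w0 + sum_j w_j <Z_j>."""
    E = np.asarray(expvals, dtype=float)               # shape (N, m)
    A = np.hstack([np.ones((E.shape[0], 1)), E])       # prepend an intercept column
    w, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return w

def predict_hybrid(expvals, w):
    return w[0] + np.asarray(expvals, dtype=float) @ w[1:]

# Synthetic check: N = 20 samples, m = 3 measured qubits (made-up values).
rng = np.random.default_rng(0)
E = rng.uniform(-1, 1, size=(20, 3))
y = 0.5 + E @ np.array([2.0, -1.0, 0.3])
w = fit_hybrid_readout(E, y)
print(np.round(w, 6))                                  # should be close to [0.5, 2.0, -1.0, 0.3]
print(mse_loss(y, predict_hybrid(E, w)))               # essentially zero for noiseless data
```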
In our regression tasks, we used the standard approach of minimizing the loss function with respect to the trainable parameters $\boldsymbol{\theta}$. No regularization term was included, since overfitting is expected to be effectively suppressed by the inherent constraints arising from unitarity [32]. To minimize the loss function, we used the Nelder-Mead method [53], a gradient-free algorithm. In our QML models, the scaling factor applied to the expectation value $\langle Z\rangle$ obtained from the measurements is a hyperparameter, which we varied systematically in our simulations; depending on the model, values between 1.0 and 10.0 were chosen. We used the mean squared error (MSE) to evaluate prediction error and calculated the coefficients of determination ($R^2$) to evaluate the performance of our QSTR models.
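The training workflow can be illustrated with SciPy's gradient-free optimizer, `scipy.optimize.minimize(..., method="Nelder-Mead")`. To keep the sketch short, the model is a hypothetical one-qubit circuit and the data are synthetic; only the workflow (MSE loss on the scaled $\langle Z\rangle$, gradient-free minimization, a small grid over the scaling factor, and $R^2$ evaluation) mirrors the procedure described above.

```python
import numpy as np
from scipy.optimize import minimize

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])

Z = np.diag([1.0, -1.0])
KET0 = np.array([1.0, 0.0], dtype=complex)

def expval(x, theta):
    """Hypothetical 1-qubit PQC: R_Y(t2) R_Z(t1) R_Y(t0) R_Y(x)|0>, read out as <Z>."""
    psi = ry(theta[2]) @ rz(theta[1]) @ ry(theta[0]) @ ry(x) @ KET0
    return float(np.real(psi.conj() @ Z @ psi))

def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def train(x, y, scale, seed=0):
    def loss(theta):                                   # MSE of the scaled expectation values
        pred = scale * np.array([expval(xi, theta) for xi in x])
        return float(np.mean((y - pred) ** 2))
    theta0 = np.random.default_rng(seed).uniform(0, 2 * np.pi, size=3)
    res = minimize(loss, theta0, method="Nelder-Mead",
                   options={"maxiter": 2000, "xatol": 1e-8, "fatol": 1e-10})
    return res, loss(theta0)

x = np.linspace(-1.0, 1.0, 15)                         # synthetic 1-D inputs
y = 1.5 * np.cos(x + 0.4)                              # synthetic targets
results = {}
for scale in (1.0, 2.0, 5.0):                          # scaling factor treated as a hyperparameter
    res, loss0 = train(x, y, scale)
    y_hat = scale * np.array([expval(xi, res.x) for xi in x])
    results[scale] = (res.fun, loss0, r2_score(y, y_hat))
    print(f"scale={scale}: MSE={res.fun:.2e}, R2={results[scale][2]:.3f}")
```

The scaling factor matters because $\langle Z\rangle$ is confined to $[-1,1]$: with `scale=1.0` this toy model cannot reach targets of amplitude 1.5, so the factor has to be tuned like any other hyperparameter.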

Implementation
We implemented our QML models using Qulacs [54].

Data set
In our QSTR models, we used a data set of 221 phenols for which toxicity data for the ciliate Tetrahymena pyriformis, expressed as $\log(1/\mathrm{IC}_{50})$, are available [59]. We used the following molecular descriptors: hydrophobicity ($\log K_{\mathrm{ow}}$), the acidity constant ($\mathrm{p}K_{\mathrm{a}}$), frontier orbital energies ($E_{\mathrm{HOMO}}$ and $E_{\mathrm{LUMO}}$), and hydrogen bond donor/acceptor counts. This data set has been used to evaluate the performance and predictive ability of standard chemometrics methods [59][60][61]: multiple linear regression (MLR), support vector machines, and radial basis function neural networks (RBF-NNs). To compare our QML models with conventional chemometrics methods, we trained MLR and RBF-NN models on the same data set. Following the previous QSTR study [61], we used hold-out validation for our QSTR models; more specifically, we used 180 compounds as the training set and 41 as the validation set. The data split used for hold-out validation was exactly the same as in the previous work [61], in which the Kennard-Stone algorithm [62] was employed to generate the split so that all the validation data fall inside the region covered by the training data. Such a split is useful because the data in this QSTR study are somewhat widely distributed and contain certain outliers. We also performed 5-fold cross-validation on the entire data set after random shuffling.
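The Kennard-Stone selection can be sketched in a few lines of NumPy. This is a generic version of the algorithm [62] using Euclidean distances on autoscaled descriptors; the autoscaling step, the function name, and the random descriptor matrix are assumptions for illustration, and the split actually used in this study was taken from the previous work [61] rather than regenerated.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Indices of n_train samples chosen by the Kennard-Stone algorithm (n_train >= 2)."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)            # autoscaling (assumption)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)       # start from the two most distant samples
    selected = [int(i), int(j)]
    min_d = np.minimum(d[i], d[j])                       # distance of every sample to the selected set
    while len(selected) < n_train:
        min_d[selected] = -np.inf                        # never re-select a sample
        k = int(np.argmax(min_d))                        # sample farthest from its nearest selected one
        selected.append(k)
        min_d = np.minimum(min_d, d[k])
    return np.array(selected)

# Hypothetical descriptor matrix with the same shape as the data set (221 compounds x 5 descriptors).
X = np.random.default_rng(0).normal(size=(221, 5))
train_idx = kennard_stone(X, 180)
val_idx = np.setdiff1d(np.arange(len(X)), train_idx)
print(len(train_idx), len(val_idx))                       # 180 41
```

Because the algorithm always adds the sample farthest from the current selection, the extreme points of the descriptor space end up in the training set, and the remaining validation compounds lie in regions already covered by training data.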

Simulation details
All simulations for the QML, MLR, and RBF-NN models were performed on a classical computing platform powered by Intel Xeon Gold 6154 processors with 192 GB of memory.
All simulations except for the QML models with 15 qubits were performed on a single CPU core; the QML simulations with 15 qubits were run as OpenMP parallel jobs using 9 CPU cores.

RESULTS AND DISCUSSION
Encoder circuit
To begin with, we compared the performance of the three conventional encodings with 5 qubits (Table 2). In terms of $R^2_{\mathrm{train}}$, the A1 and A2 encoders (0.777 and 0.735) performed about 15% better on average than the M encoder (0.658). The results indicate that angle encodings provide more flexibility in data encoding owing to their simplicity and high nonlinearity. To improve the performance of our QML models, we then explored the possibility that entanglement might extend the expressive power of data representation. Entangling gates have been shown to play an essential role in quantum generative models [38,49] and in the expressibility of PQCs [50]; nonetheless, previous encoder circuits did not contain quantum entanglement.
We then employed encoder circuits containing CNOT or CZ gates (Table 2). In terms of $R^2_{\mathrm{train}}$, our entangler-enhanced encodings containing $U_{\mathrm{M}}$ performed 15% better than the original $U_{\mathrm{M}}$ encoder. For the angle encodings, the encoders containing entangling gates outperformed those without entanglement by 7%. In particular, the A2-A2-CNOT encoder provided the best performance (0.842), followed by the A1-A1-CZ encoder (0.822). This result is consistent with previous studies on PQCs showing that repeated circuit layers with entangling gates provide high expressive power [50]. Our results indicate that the feature map based on a product state was inadequate for our application in terms of expressibility and that the entangler-enhanced encodings provided more expressive power in data representation with the aid of quantum entanglement. This implies that quantum correlation could be advantageous for the feature map representation of classical input data.
To understand the role of redundancy in encoding, which is associated with higher-dimensional local feature maps, we then increased the number of qubits in our QML models.
In this scheme, each component of the input data is encoded into multiple qubits. Recently, Vidal and Theis investigated whether redundancy in PQCs is useful for data fitting [52]; their study indicates that lower bounds on the redundancy are logarithmic in the complexity of the functions. Since our QSTR models contain five molecular descriptors, we used 10 and 15 qubits, which correspond to two and three qubits per input feature, respectively.
The use of 10 qubits (two copies of the quantum state) improved $R^2_{\mathrm{train}}$ for all the encodings examined (Figure 3), by 11% on average. The results indicate that, in our QML models with 10 qubits, higher dimensionality was effectively exploited owing to the redundancy of multiple-qubit encoding. By encoding each component of the input data into a higher-dimensional local feature map, the encoder provides a more complete basis of functions and can respond to smaller changes in the input data [48]. In line with the results for 5 qubits, the A2-A2-CNOT encoder provided the best $R^2_{\mathrm{train}}$ (0.906; Table 3). This confirms that our entangler-enhanced encodings provided more flexibility in data representation. Encoders containing CNOT gates tended to perform better than those containing CZ gates; this expressive power might be related to the fact that increasing the number of CNOT gates in multilayer PQCs increases the bond dimensions of the corresponding tensor networks [49]. On the other hand, using 15 qubits (three copies) did not further improve the coefficients of determination (see Figure 3 and Table A1). From a computational standpoint, increasing the number of qubits increases the number of trainable parameters in $U(\boldsymbol{\theta})$, which could slow convergence when minimizing the loss function. In this particular application, our QML models using entangler-enhanced encoding with 10 qubits performed better than those with 5 or 15 qubits. Note that the encoders with 15 qubits using only product states underperformed several encoders with 10 qubits using quantum entanglement (Figure 3).

Variational circuit
To understand how the architecture of the variational circuit affects performance and computational cost, we compared the three variational circuits while using the same encoding circuit (the M encoder). The first variational circuit was the one based on the Ising Hamiltonian, which was previously proposed [32]. The second and third circuits were CNOT-based and CZ-based variational circuits, respectively. The latter two circuits are motivated by the hardware-heuristic ansatz strategy, which aims to circumvent the limitations of quantum hardware; they also avoid the additional computational cost of the Trotter decomposition.
In numerical tests on simple regression tasks, we found that the CNOT-based variational circuit performed similarly to the variational circuit based on the Ising Hamiltonian, whereas the CZ-based variational circuit performed worse (not shown). The results indicate that repeated circuit layers with entangling CNOT gates provide high expressive power, in line with previous studies in which CNOT gates play important roles in the expressibility of PQCs [49,50]. Therefore, we employed the variational circuit containing entangling CNOT gates unless otherwise mentioned. In addition, we observed a substantial computational speedup with the variational circuit containing entangling gates compared with the original variational circuit based on the Ising Hamiltonian. A major disadvantage of the latter is that, when the circuit is simulated on classical processors, the computational cost and memory requirement for computing the Trotter operator matrix grow exponentially with the number of qubits (Table B1). For this reason, we recommend the hardware-heuristic variational circuits.
Furthermore, we examined the effect of the number of unit layers $L$ on the performance of our QML models. In our simulations ($3 \le L \le 15$), adding unit layers generally improved the results; a typical example is shown in Figure 4a, in which $R^2_{\mathrm{train}}$ obtained with the A2-A2-CNOT encoder gradually improves as a function of $L$ (for 5 qubits, the performance appears to saturate for $L \ge 12$). This is consistent with the decrease in the training-set MSE with increasing $L$ (Figure 4b). The results imply that adding circuit unit layers improves the efficiency of exploring the solution space, in agreement with previous studies on PQCs [49,50]. We also found that the optimal number of layers in our QML models depended strongly on the choice of PQC architecture and encoding (see also Tables 2 and 3). A similar tendency has been reported in previous work on the expressibility and entangling capability of PQCs, in which the rate of change in expressibility with respect to the number of layers varies from circuit to circuit [50].

Final QML model
Considering the results presented in the previous subsections, we obtained a final QML model suited to our particular application (depicted in Figure 5). The quantum circuit for data representation is the entangler-enhanced encoder $U_{\mathrm{CNOT}}U_{2}(\mathbf{x})U_{\mathrm{CNOT}}U_{1}(\mathbf{x})$; in our best QML model, $U_1 = U_2 = U_{\mathrm{A2}}$ (i.e., the A2-A2-CNOT encoder) (Figure 5a). Hence, the feature map is given by $|\Psi\rangle = U_{\mathrm{CNOT}}U_{\mathrm{A2}}(\mathbf{x})U_{\mathrm{CNOT}}U_{\mathrm{A2}}(\mathbf{x})|0\rangle^{\otimes n}$. This kind of encoder can be viewed as a 2D tensor network in which the entangler block can be interpreted as imposing periodic boundary conditions (Figure 5b).
Each component of the input data is encoded into two qubits, so that a feature map with higher dimensionality can be taken into account; consequently, 10 qubits are used for encoding because our QSTR model contains five molecular descriptors. The variational circuit $U(\boldsymbol{\theta})$ is given by a multilayer PQC: $U(\boldsymbol{\theta}) = \prod_{\ell=1}^{L} U_{\ell}(\boldsymbol{\theta}_{\ell})\,U_{\mathrm{CNOT}}^{\ell}$.

Measurements and the hybrid approach
We compared the performance of pure and hybrid QML models for the A2-A2-CNOT encoder with 10 qubits (Figure C1). Overall, $R^2_{\mathrm{train}}$ improved by about 2% with the hybrid QML approach, in which the expectation values from multiple qubits were fed into the evaluation of the loss function. However, increasing the number of measured qubits did not necessarily lead to incremental improvements in $R^2_{\mathrm{train}}$; rather, we found that the number of unit layers $L$ in $U(\boldsymbol{\theta})$ had an overall impact on $R^2_{\mathrm{train}}$. In addition, there were quite a few cases in which performance on the validation set was not improved compared with the pure QML models (this topic will be discussed in the next subsection). On the other hand, we found that the QML model with … = 4 and … = 10 provided the best $R^2_{\mathrm{val}}$ (0.886).
Further improvement of the classical post-processing (e.g., using classical neural networks) may be necessary.
Having developed our QML models for QSTR application, we now compare their performance with that of conventional chemometrics methods, namely MLR and RBF-NNs (Table 4).

Perspectives on QML
Let us comment on several perspectives on QML models. While a definitive quantum advantage for machine learning remains controversial, we anticipate several merits of employing QML. First, the feature map can be manipulated directly in terms of quantum many-body states. If complex, computationally intractable quantum states could be used as feature maps while avoiding overfitting, that could be an advantage. Second, once the PQC architecture is designed, QML models can be trained efficiently without further tuning. In particular, the unitarity inherent to quantum circuits can act as built-in regularization, which may help avoid overfitted models and improve generalization performance. By contrast, for RBF-NNs, the RBF centers, the number of hidden-layer units, the widths, and the weights must be determined carefully. Third, QML models using PQCs require far fewer trainable parameters and fewer hyperparameters, suggesting the possibility of efficient and unbiased machine learning with near-term quantum computing. Fourth, on numerical simulators, QML models could be interpreted by analyzing the unitary operations and wavefunctions. Fifth, the close relationship between quantum circuits and tensor networks may be advantageous for developing QML within the tensor network framework [64]. Considering all this, it is desirable to investigate the performance of QML on a variety of practical applications with real-world data sets, such as cheminformatics, materials informatics, and other machine learning tasks.

CONCLUSIONS
In the present work, we have developed QML models for predicting the toxicity of 221 phenols (QSAR/QSTR modeling) within the framework of the hybrid quantum-classical algorithm. To our knowledge, this is the first practical application of QML to a nonlinear regression task using a real-world data set.
In our particular application, angle encoding was found to be useful in terms of flexibility of data representation owing to its simplicity and high nonlinearity. Furthermore, the results suggest that our entangler-enhanced encodings provided more expressive power in data representation than previous encodings, implying that quantum correlation could be useful for the feature map representation of classical data. Doubling the number of qubits improved $R^2_{\mathrm{train}}$ (an 11% increase on average) with the aid of the higher dimensionality of the feature map. Repeated circuit layers with CNOT blocks in the variational circuit provided a computational speedup compared with the original variational circuit based on the time evolution of an Ising Hamiltonian.

Figure 2: Measurement of (a) an expectation value from a single qubit or (b) a set of expectation values from multiple qubits.
Qulacs [54] is a Python/C++ library for quantum circuit simulation. The time evolution gate of the Ising Hamiltonian needed for the original QCL model was implemented using the NumPy [55] and SciPy [56] libraries. The Nelder-Mead optimization of the Pauli rotation angles was implemented using the scipy.optimize module of SciPy. The k-fold cross-validation was implemented using the KFold class of scikit-learn [57]. Pre- and post-processing of the data set was implemented using pandas [58] in combination with NumPy and SciPy.

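Figure 3: Coefficients of determination for the training set ($R^2_{\mathrm{train}}$) using 13 different encoder circuits with 5, 10, and 15 qubits. Note that the cases of 10 and 15 qubits correspond to two- and three-times products of the quantum state, respectively. For the definitions of the encoder circuits, see Table 1.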

Figure 5: Quantum circuit (a) and graphical tensor network representation (b) of the QML model adopted in our QSTR study. The feature map is given by $|\Psi\rangle = U_{\mathrm{CNOT}}U_{2}(\mathbf{x})U_{\mathrm{CNOT}}U_{1}(\mathbf{x})|0\rangle^{\otimes n}$ (in our best QML model, $U_1 = U_2 = U_{\mathrm{A2}}$). From a physical standpoint, such a quantum circuit can be interpreted as a 2D tensor network in which the entangler block imposes periodic boundary conditions. Each component of the input data is encoded into two qubits (i.e., $n = 10$ qubits) to increase the dimensionality of the feature map. The variational circuit is given by $U(\boldsymbol{\theta}) = \prod_{\ell=1}^{L} U_{\ell}(\boldsymbol{\theta}_{\ell})\,U_{\mathrm{CNOT}}^{\ell}$, which is a multilayer PQC.

TABLE 1
Encoder circuits investigated in the present work ($U_{\mathrm{ent}}U_{2}(\mathbf{x})U_{\mathrm{ent}}U_{1}(\mathbf{x})$) and the corresponding IDs.
Note that the first three encodings are conventional encoders (i.e., $U_{\mathrm{ent}}$ and $U_{2}$ are replaced by the identity operator), whereas the remaining 10 encoders contain entangler blocks $U_{\mathrm{ent}}$. For the definitions of the unitary operations, see the text.

TABLE 2
… and the scaling factor for the expectation value $\langle Z\rangle$.

TABLE 3
Coefficients of determination for the training and validation sets ($R^2_{\mathrm{train}}$ and $R^2_{\mathrm{val}}$) using 13 different encoder circuits with 10 qubits (two copies of the quantum state), the optimized number of layers $L$ in the variational circuit ($3 \le L \le 12$), and the scaling factor for the expectation value $\langle Z\rangle$.

TABLE 4
Performance comparison of our QML models with those obtained from conventional chemometrics methods (the coefficients of determination, MSE, and root mean square (RMS) for the training and the validation sets).