Universal neural network potentials as descriptors: Towards scalable chemical property prediction using quantum and classical computers

Accurate prediction of diverse chemical properties is crucial for advancing molecular design and materials discovery. Here we present a versatile approach that uses the intermediate information of a universal neural network potential as a general-purpose descriptor for chemical property prediction. Our method is based on the insight that a sophisticated neural network architecture, once trained as a universal force field, learns transferable representations of atomic environments. We show that transfer learning with graph neural network potentials such as M3GNet and MACE achieves accuracy comparable to state-of-the-art methods for predicting NMR chemical shifts, using both quantum machine learning and a standard classical regression model, despite the compactness of its descriptors. In particular, the MACE descriptor demonstrates the highest accuracy to date on $^{13}$C NMR chemical shift benchmarks for drug molecules. This work provides an efficient way to accurately predict properties, potentially accelerating the discovery of new molecules and materials.


Introduction
As evidenced by the enumeration of 166.4 billion possible organic molecules containing up to 17 heavy atoms, such as C, N, O, S, and halogens (excluding hydrogen), the chemical space expands astronomically with the number and variety of elements. 1,2 Notably, interatomic potentials (IAPs) built using descriptors and Gaussian process regression (GPR) 14 have been termed Gaussian approximation potentials (GAP) and have found success in the exploration of the chemical space of molecules and materials. 14,21,37 Both kernel ridge regression (KRR) and GPR have been employed to improve the accuracy of NMR chemical shift prediction. 29,30,41-44 However, the dimensionality of the descriptors becomes a barrier to generalization and high accuracy as the molecular or material composition becomes more diverse owing to the addition of different types of elements. 19,25,35,39,45-64

In most GNN-based IAPs, atoms within a molecular or material environment are represented as nodes, and their local connectivity as edges in a graph. The graph is then convolved to embed atom-specific information within each node, and further processed using multilayer perceptrons (MLPs) to predict target observables. In molecular and materials simulation and modeling, the consideration of symmetry is extremely important. For a model to make physically meaningful predictions, its GNN should be invariant or equivariant to symmetry operations such as translation, rotation, and reflection; GNNs with these properties are referred to as invariant or equivariant GNNs, respectively. The universal GNN-based IAPs proposed thus far have been designed to satisfy these symmetries. Recently, E(3)- or SE(3)-equivariant GNN-based IAPs (e.g., Allegro 61, GNoME 65, MACE 62-64) have demonstrated superior performance compared to E(3)-invariant GNN-based IAPs (e.g., MEGNet 52, M3GNet 10). 66,67 Similarly, GNN-based models have been developed to predict NMR chemical shifts. 46,47,49,50,59,68 DFT-level calculations of NMR chemical shifts for 1H and 13C have demonstrated the ability to predict within a target accuracy range of 1-2% relative to the possible ranges of approximately 10 ppm and 200 ppm, respectively. 69,70 Therefore, the uncertainty of machine learning models trained on DFT-level datasets is bounded by this level of precision, with target accuracies of 0.2 ppm for 1H and 2 ppm for 13C. 30 For example, Guan et al. achieved the target accuracy, reaching 0.16 ppm for 1H and 1.26 ppm for 13C, by training the SchNet architecture 51 on molecular NMR chemical shifts (CASCADE). 46
However, scalability remains an issue because the cost of optimizing GNN and MLP parameters grows with dataset size. Han et al. addressed this issue by constraining the nodes in a GNN to heavy elements only, thereby rendering the construction of scalable GNN-based NMR chemical shift models feasible while achieving a state-of-the-art prediction accuracy comparable to that of CASCADE. 68 Consequently, efforts are being made to develop machine learning models for NMR chemical shifts of nuclei such as 15N, 17O, and 19F. 73-76 These nuclei exhibit wide chemical shift ranges of about 600, 2500, and 500 ppm for 15N, 17O, and 19F, respectively. Accordingly, as for 1H and 13C, target accuracies have been set for these nuclei, for example 25 ppm for 15N and 5 ppm for 19F. 71-77

Notably, both descriptor-based and GNN-based methods face challenges: the former incurs increased learning costs as the composition becomes more complex, and the latter incurs increasing parameter optimization costs with larger training datasets. To address these issues simultaneously, we focused on the potential utility of the outputs of pre-trained GNN-based IAPs as descriptors. We regard these outputs as GNN transfer learning (GNN-TL) descriptors and build machine-learning models on them for predicting chemical properties. 80,81

The remainder of this paper is organized as follows. Section 2 details the GNN-TL descriptor and the kernel method, implemented on both classical and quantum computers, for predicting NMR chemical shifts of 1H, 13C, 15N, 17O, and 19F. Section 3 presents the performance of our machine learning models. Section 4 discusses the benefits and applications of the GNN-TL descriptor. Finally, Section 5 concludes the paper.

Method: Transfer Learning Using Pre-trained Graph Neural Network
In this section, we discuss transfer learning from a pre-trained GNN-based IAP. This approach reuses the outputs of the GNN layer of the IAP, as shown in Fig. 1. The architecture of a GNN-based IAP can be broadly segmented into a GNN layer and an MLP layer (gray area of Fig. 1). For the E(3)-invariant GNN-based IAPs, we opted for two backbones: a MEGNet model pre-trained on the QM9 dataset 82 and an M3GNet model trained on the MPF.2021.2.8 dataset, which encompasses compounds covering all 89 elements from the Materials Project. 10 The parameters of the GNN layer in the M3GNet IAP were optimized to predict system energies, forces, and stress tensors. Additionally, we incorporated E(3)-equivariant GNN-based IAPs, namely MACE, 62-64 into our study. We employed two types of pre-trained MACE IAPs: one trained on a larger dataset named MPtrj 83 from the Materials Project, referred to as the MACE-MP0 model, 64 and another trained on organic molecule datasets covering 10 types of elements, including SPICE 84 and QMugs, 85 termed the MACE-OFF23 model. 63 Each model comes in several parameter sizes; for this study, we utilized the "small" and "large" versions. 63,64

When fed the atomic coordinates of a molecule with N atoms, denoted by {Z_i, R_i}, where Z_i is the atomic number indicating the type of each atom and R_i is the three-dimensional position vector of the ith atom, the GNN layer generates a set of vectors, {G_i}, each of which mirrors the environment of the ith atom in the molecule. We refer to G_i as the GNN-TL descriptor. The GNN layers of MEGNet and M3GNet output GNN-TL descriptors with dimensions of 32 and 64 per atom, respectively. MACE, by contrast, is a GNN architecture that predicts energy in the form of an atomic cluster expansion. As in Ref. 86, only the output of the first layer of the GNN block, corresponding to the one-body term of the many-body expansion, is used as the GNN-TL descriptor. The dimensions of this descriptor are 128, 256, 96, and 224 per atom for MACE-MP0-small, MACE-MP0-large, MACE-OFF23-small, and MACE-OFF23-large, respectively.
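The mechanics of harvesting an intermediate per-atom representation from a trained potential can be sketched with a generic PyTorch forward hook. The model below is a toy stand-in, not any of the actual backbones; the module names (`gnn_layer`, `mlp_head`) and feature sizes are hypothetical, chosen only to mirror the GNN-layer/MLP-layer split of Fig. 1, with a 64-dimensional per-atom output as in M3GNet.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained GNN-based IAP: a "GNN layer" whose per-atom
# output we treat as the GNN-TL descriptor, followed by an MLP head mapping
# descriptors to atomic energies. All names and sizes here are hypothetical;
# a real backbone (M3GNet, MACE, ...) exposes its own modules.
class ToyIAP(nn.Module):
    def __init__(self, n_feat=8, n_desc=64):
        super().__init__()
        self.gnn_layer = nn.Linear(n_feat, n_desc)  # stands in for message passing
        self.mlp_head = nn.Sequential(nn.SiLU(), nn.Linear(n_desc, 1))

    def forward(self, atom_feats):
        h = self.gnn_layer(atom_feats)              # per-atom hidden states {G_i}
        return self.mlp_head(h).sum()               # total energy

model = ToyIAP()
captured = {}

def hook(module, inputs, output):
    # Store the per-atom intermediate representation instead of discarding it.
    captured["G"] = output.detach()

model.gnn_layer.register_forward_hook(hook)

atom_feats = torch.randn(9, 8)                      # 9 atoms, 8 input features
energy = model(atom_feats)
descriptors = captured["G"]                         # shape: (n_atoms, 64)
print(descriptors.shape)
```

The point of the hook is that the descriptor falls out of a normal energy evaluation; no retraining of the backbone is needed.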
Using GNN-TL descriptors as input, a regression model was constructed to predict NMR chemical shielding constants. For the regressor, one can choose among methodologies such as GPR, KRR, or feed-forward neural networks (NNs), contingent on the specific task. In this study, to ensure a maximally fair comparison with other descriptor-based techniques, we adopted KRR. KRR combines the merits of ridge regression, which offers regularization to mitigate overfitting, with the kernel method, which enables nonlinear regression. In kernel methods, the data (in our case, the GNN-TL descriptors) are mapped into a high-dimensional feature space through a nonlinear kernel function. The Laplacian and Gaussian kernels were applied:

$k(\mathbf{G}_i, \mathbf{G}_j) = \exp\!\left(-\gamma \,\|\mathbf{G}_i - \mathbf{G}_j\|_p^p\right),$

where $\gamma$ is the hyperparameter of the kernel and p is the norm parameter that differentiates the type of kernel: p = 1 for the Laplacian kernel and p = 2 for the Gaussian kernel. In KRR, the predicted value $\sigma_t$ of the target chemical property for the target atom is derived from its GNN-TL descriptor $\mathbf{G}_t$ as follows:

$\sigma_t = \sum_{i=1}^{N} \alpha_i \, k(\mathbf{G}_i, \mathbf{G}_t).$

Here, $\alpha_i$ represents the ith element of the regression coefficient vector $\boldsymbol{\alpha}$ of size N. The regression coefficients are determined by solving a ridge-regularized least-squares problem, which reduces to

$\boldsymbol{\alpha} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \boldsymbol{\sigma},$

where $\mathbf{I}$ denotes the identity matrix, $\boldsymbol{\sigma}$ denotes the vector of chemical properties of the N training samples, and $\lambda$ denotes the regularization parameter. The matrix $\mathbf{K}$ is the kernel matrix, with elements given by $k(\mathbf{G}_i, \mathbf{G}_j)$.
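The KRR procedure above maps directly onto scikit-learn's `KernelRidge`, which supports both kernels (`"laplacian"` for p = 1, `"rbf"` for the Gaussian). The sketch below uses synthetic 64-dimensional vectors as stand-ins for GNN-TL descriptors and a synthetic smooth target in place of shielding constants; only the fitting machinery reflects the paper's setup.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 "atoms" with 64-dimensional GNN-TL descriptors G_i
# and a smooth scalar target playing the role of the shielding constant sigma.
G = rng.normal(size=(200, 64))
sigma = np.sin(G[:, 0]) + 0.1 * G[:, 1]

G_train, G_test = G[:150], G[150:]
y_train, y_test = sigma[:150], sigma[150:]

# kernel="laplacian" gives k = exp(-gamma * ||G_i - G_j||_1); kernel="rbf"
# gives the Gaussian kernel. alpha is the ridge regularization parameter
# (lambda in the equations above).
model = KernelRidge(kernel="laplacian", gamma=0.05, alpha=1e-3)
model.fit(G_train, y_train)
pred = model.predict(G_test)
mae = np.mean(np.abs(pred - y_test))
print(f"test MAE: {mae:.3f}")
```

Internally `fit` solves the same (K + lambda*I) alpha = sigma system; hyperparameters gamma and alpha are what the Optuna search in the next paragraph tunes.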
All computations related to KRR were executed using scikit-learn v.1.2.2, 87 and the hyperparameters of each model were tuned using Optuna v.2.10. 88 For dataset sizes of up to 50K items, we conducted hyperparameter optimization for 100 iterations with ten-fold cross-validation, whereas for 100K items we limited the optimization to 10 iterations.
The quantum-kernel method leverages quantum computers to compute kernels, 16,89,90 which is achieved by embedding feature vectors generated by classical computers into quantum states. The method then calculates the inner product of these quantum states to derive the desired kernels. Embedding a feature vector into a quantum state corresponds to mapping it onto a Hilbert space whose dimension grows as 2^n with the number of quantum bits (qubits) n. Using the kernel matrix constructed on a quantum computer, we performed KRR, denoted quantum KRR (QKRR).
In this study, we adopted the natural parameterized quantum circuit (NPQC) kernel, which has been shown, both theoretically and in hardware experiments, to possess performance characteristics similar to those of the Gaussian kernel. 91-93 All computations were conducted using scikit-qulacs. 87,94,95 The quantum kernel was constructed in a 10-qubit space. Hyperparameters for the quantum kernel were determined through grid search; the resulting NPQC kernel parameters were c = 1.5 and an embedding repetition count of 40. The regularization hyperparameter of QKRR was determined using 10 iterations of randomized search.
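A fidelity-type quantum kernel can be illustrated classically with a much simpler embedding than the NPQC circuit of the paper: a product-state RY encoding, one rotation per qubit, simulated directly as a statevector in NumPy. This is a minimal stand-in for the actual scikit-qulacs circuit, intended only to show how an inner product of embedded states yields a kernel value.

```python
import numpy as np

def embed(x):
    """Product-state embedding: apply RY(x_q) on each qubit of |0...0>.
    A minimal stand-in for the NPQC embedding circuit, not the real thing."""
    state = np.array([1.0])
    for angle in x:
        qubit = np.array([np.cos(angle / 2), np.sin(angle / 2)])
        state = np.kron(state, qubit)       # statevector grows as 2**n_qubits
    return state

def quantum_kernel(x, y):
    """Fidelity kernel k(x, y) = |<psi(x)|psi(y)>|^2."""
    return abs(np.dot(embed(x), embed(y))) ** 2

rng = np.random.default_rng(1)
x, y = rng.normal(size=10), rng.normal(size=10)  # 10 features -> 10 qubits

K_xx = quantum_kernel(x, x)
K_xy = quantum_kernel(x, y)
print(K_xx, K_xy)
```

By construction k(x, x) = 1 and 0 <= k(x, y) <= 1; the resulting Gram matrix can be passed to `KernelRidge(kernel="precomputed")` to realize a QKRR-style regression on a classical simulator.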

Results
In Section 3.1, because we deal with many elements, we compare the dimensional efficiency of our proposed GNN-TL descriptor with that of well-established physics-inspired descriptors. Note that the GNN-TL descriptor can better handle complex chemical systems by exploiting the GNN-based IAP architecture.
In Section 3.2, we focus on the accuracy of the GNN-TL descriptor in predicting NMR chemical shifts, which are key to understanding molecular details (e.g., interatomic distances and bond angles). This scenario provides an ideal test of how well the GNN-TL descriptor works.
Our analysis began by comparing quantum kernel learning, in which kernels are evaluated using a quantum computer emulator, with traditional kernel learning methods. We then checked the accuracy of the GNN-TL descriptors across different pre-trained GNN models.
Finally, we compared our GNN-TL descriptor with well-established physics-inspired descriptors. This comparison demonstrates the superiority of the proposed descriptor in terms of efficiency and accuracy. Furthermore, it highlights its potential for accurately predicting chemical properties, which is crucial for advancing research in the molecular and material sciences.

Dimensional Efficiency
At the atomic level, descriptors are tools designed to encode information about atoms within molecules or crystalline materials into vectors. Popular descriptors, such as SOAP and FCHL18, excel at intricately capturing the environment within an atom's cutoff radius. Although these descriptors have achieved significant success in various accuracy benchmarks, they also present challenges due to their large dimensions. Various strategies have been developed to address these challenges, 34,97-99 including refining the descriptor itself, using principal component analysis for dimensionality reduction, and discretizing FCHL18 33 to derive the compact and accurate FCHL19. 34

In Table 1, we present the scaling of the SOAP, FCHL19, and various GNN-TL descriptors with the number of elemental species considered. Additionally, for the QM9, QMugs, 85 and MPF.2021.2.8 or MPtrj datasets, 10 the descriptor dimensions corresponding to the 5, 10, and 89 elemental species comprising each dataset are summarized, respectively. Remarkably, as the number of element types increases, both SOAP and FCHL19 exhibit quadratic scaling. For example, when representing the five elements of the QM9 dataset, the SOAP and FCHL19 descriptors have dimensions of 5,740 and 740, respectively. This dimensional disparity grows with the number of elemental types: to represent 89 elements, the dimensions increase to 1,737,120 and 162,336, respectively. These dimensions are hundreds to tens of thousands of times larger than those of the compact GNN-TL descriptors, which range from 64 to 256. Owing to their consistent dimensionality, irrespective of the number of elements, the GNN-TL descriptors are overwhelmingly compact.

† SOAP and FCHL were generated by DScribe 0.4.0 28 and QML 0.4.0.12, 96 respectively.
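The quadratic growth of SOAP with element count can be made concrete with a back-of-the-envelope sizing function. The formula below follows the DScribe power-spectrum convention (one block per unordered species pair, each of size n_max^2 * (l_max + 1)); the n_max and l_max defaults are illustrative, so the absolute numbers will not reproduce Table 1, only the scaling behavior.

```python
def soap_dim(n_species, n_max=8, l_max=6):
    """Approximate per-atom size of a SOAP power-spectrum vector (DScribe
    convention). Exact sizes depend on settings and implementation; the
    point is the quadratic growth in n_species."""
    n_pairs = n_species * (n_species + 1) // 2   # unordered species pairs
    return n_pairs * n_max**2 * (l_max + 1)

GNN_TL_DIM = 64  # e.g. M3GNet: fixed, independent of composition

for n in (5, 10, 89):  # QM9, QMugs, MPF.2021.2.8/MPtrj element counts
    print(f"{n:3d} species: SOAP ~{soap_dim(n):>9,d} dims, GNN-TL {GNN_TL_DIM} dims")
```

Going from 5 to 89 species multiplies the SOAP block count by 4005/15 = 267, while the GNN-TL dimension is unchanged.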

Prediction Accuracy: NMR Chemical Shifts
The NMR chemical shift, $\delta$, was predicted using the chemical shielding constant of a reference substance, $\sigma_{\mathrm{ref}}$, as the baseline:

$\delta = \sigma_{\mathrm{ref}} - \sigma,$

where $\sigma$ is the chemical shielding constant of the target nucleus. 102-104 Specifically, tetramethylsilane was selected as the reference for both 1H and 13C, and nitromethane (MeNO2) for 15N.

In our study, we utilized the QM9NMR dataset, which contains approximately 134K small organic molecules containing C, N, O, and F, with each molecule having no more than nine heavy atoms (excluding H). 29,82 This dataset provides detailed NMR chemical shielding constants for these molecules. To analyze how the model accuracy changes with training data size, we adopted an approach similar to that used in the original publication of the QM9NMR dataset. 29 Specifically, for 13C, of a total of 831K data points, we randomly withheld 50K to build our test set. Subsequently, from the remaining 13C NMR chemical shifts, we randomly selected subsets containing 100, 200, 500, 1K, 2K, 5K, 10K, 50K, 100K, and 200K data points to create various training sets. For the other nuclei (i.e., 1H, 15N, 17O, and 19F), the test sets were similarly established by withholding 50K, 30K, 50K, and 1K data points, respectively. The training size for 19F was set to 2K, whereas the other nuclei were trained on datasets of 100K data points. In addition to the QM9NMR dataset, we sought to validate the performance of our model on external datasets. Hence, we employed the two sets of molecules provided in another study: 29 one consisting of 40 drug molecules from the GDB17 universe and another containing 12 drugs with 17 or more heavy atoms.

Fig. 2 shows the relationship between the mean absolute error (MAE) of the 13C NMR shielding constant predictions and the training data size. Both QKRR and KRR demonstrated consistent improvements in predictive accuracy with increasing training size. Notably, the quantum kernel exhibited performance comparable to that of the Laplacian kernel. For a training size of 100K, the MAE for the 13C predictions was 2.28 ppm. In a comparative study by Gupta et al., KRR models using the Coulomb matrix (CM), 109 SOAP, and FCHL descriptors reported MAEs of approximately 4, 2.1, and 1.88 ppm, respectively, for the same training size. 29 Compared with the CM descriptor, our GNN-TL descriptor showed significantly better predictive capability, achieving an MAE nearly half that of the CM descriptor. Although our method did not exceed the accuracy of SOAP and FCHL, the performance of the GNN-TL descriptor was competitive, highlighting its potential as a robust descriptor.
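The shielding-to-shift conversion and the withhold-then-subsample protocol described above can be sketched as follows. All numeric values here are illustrative stand-ins, not the paper's data: in particular the shielding distribution is synthetic and the reference value `sigma_ref` is a hypothetical placeholder, not the actual TMS shielding used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 13C shielding constants; sigma_ref plays the role of
# the reference-compound shielding (TMS for 13C). Values are illustrative only.
sigma = rng.normal(loc=120.0, scale=40.0, size=831_000)
sigma_ref = 186.4  # hypothetical placeholder, not taken from the paper

delta = sigma_ref - sigma          # chemical shift from shielding constant

# Withhold a fixed 50K test set, then draw nested training subsets from the
# remaining pool, as in the text.
perm = rng.permutation(delta.size)
test_idx, pool = perm[:50_000], perm[50_000:]
train_sets = {n: pool[:n] for n in (100, 1_000, 10_000, 100_000)}
print(len(test_idx), [len(v) for v in train_sets.values()])
```

Drawing the training subsets as nested prefixes of one shuffled pool keeps the learning-curve points comparable while guaranteeing no overlap with the held-out test set.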
Next, we compared the performance of GNN-TL descriptors derived from different IAP architectures. Recently, and independently of our work, a predictive model for 13C NMR chemical shielding was proposed using a pre-trained IAP, SchNet, a pioneering GNN, as a descriptor. 110 That model was trained on 400 data points of 13C NMR chemical shielding constants of molecules in the QM9 dataset, 82 with the SchNet GNN-TL descriptor fed to a feed-forward NN for regression, and achieved a root-mean-squared error (RMSE) of 12.8 ppm. For a fair comparison with their model, we applied KRR using the pre-trained MEGNet, M3GNet, and MACE GNN-TL descriptors, likewise setting our training set to 400 data points of 13C NMR chemical shielding constants. To account for the influence of random sampling, we created 10 different training sets, each comprising 400 data points, and quantified the effect of potential data bias by calculating the mean RMSE and standard deviation (STD) for each model. Detailed verification, including kernel-function dependencies, can be found in the Appendix. The results of this comparative study are summarized in Table 2, which reports KRR with the Gaussian kernel, the more accurate choice compared with the Laplacian kernel.
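The repeated-sampling evaluation (10 independent 400-point draws, mean RMSE plus STD) can be sketched with scikit-learn. The data here are synthetic stand-ins for descriptor/shielding pairs; only the protocol mirrors the text.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Synthetic pool standing in for (GNN-TL descriptor, 13C shielding) pairs.
X = rng.normal(size=(5000, 16))
y = X[:, 0] ** 2 + np.sin(X[:, 1])

X_test, y_test = X[:1000], y[:1000]        # fixed holdout set
pool_X, pool_y = X[1000:], y[1000:]

rmses = []
for seed in range(10):                     # 10 independent 400-point draws
    idx = np.random.default_rng(seed).choice(len(pool_X), size=400, replace=False)
    model = KernelRidge(kernel="rbf", gamma=0.05, alpha=1e-3)  # Gaussian kernel
    model.fit(pool_X[idx], pool_y[idx])
    err = model.predict(X_test) - y_test
    rmses.append(np.sqrt(np.mean(err ** 2)))

print(f"RMSE = {np.mean(rmses):.2f} +/- {np.std(rmses):.2f}")
```

Reporting the mean and STD over resampled training sets separates genuine architecture differences from luck of the draw, which matters at a training size as small as 400.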
In contrast to the SchNet/NN model's RMSE of 12.8 ppm, the MEGNet/KRR model shows significantly lower predictive accuracy, with an RMSE of 20.08 ± 0.55 ppm, suggesting that the MEGNet descriptor is less effective for 13C NMR chemical shielding prediction. The results highlight the superior performance of the MACE descriptors, particularly MACE-OFF23-small, in enhancing the accuracy of KRR models for predicting 13C NMR chemical shielding. A more detailed discussion of these architectural differences is presented in Section 4.1.

Table 2: The architecture dependence of the predictive performance. For KRR, the Gaussian kernel was applied.

The accuracy of KRR models incorporating the M3GNet GNN-TL descriptor with a Laplacian kernel was evaluated on the test set of each of the five nuclei. Table 3 lists the statistical performance metrics for predicting NMR chemical shifts. Across all nuclei, the MAE on the test set remained below 5 ppm. The MAEs for 1H and 19F were notably low at 0.18 ppm and 2.65 ppm, respectively, indicating a high degree of prediction accuracy for these nuclei in unseen molecular environments. The MAE for 17O, although higher at 4.95 ppm, still reflects reasonable predictive capability given the complexity of oxygen chemical shifts. The STD and interquartile range (IQR) values in Table 3 characterize the distribution of chemical shifts within the training data rather than the accuracy of the model itself. Thus, the higher STD and IQR values for 17O do not indicate a lack of model precision but rather the natural variability inherent in 17O chemical shifts. The MAE/STD ratio can nonetheless offer insight into model performance relative to data variability. For example, the relatively low ratio for 17O (2.21%) suggests that the model's predictions are consistent with the diversity of the training data. On the other hand, the higher ratios for 1H (9.09%) and 19F (7.78%) indicate that, relative to the range of chemical shifts represented in the training data, the accuracy of these models is not as high as desired. The maximum absolute error (MaxAE) for all nuclei is comparable to the STD of the training data. This is attributed to random sampling and is expected to improve with more sophisticated data-point sampling techniques, such as active learning.
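The metrics discussed above (MAE and MaxAE of the predictions; STD and IQR of the reference values; the MAE/STD ratio) are straightforward to compute; a small helper mirroring the columns of Table 3 might look like this, with illustrative synthetic data in place of the real shifts.

```python
import numpy as np

def report(y_true, y_pred):
    """Prediction errors (MAE, MaxAE) plus the spread of the reference values
    themselves (STD, IQR), mirroring the columns of Table 3."""
    err = np.abs(y_pred - y_true)
    q75, q25 = np.percentile(y_true, [75, 25])
    return {
        "MAE": err.mean(),
        "MaxAE": err.max(),
        "STD": y_true.std(),        # spread of the data, not model accuracy
        "IQR": q75 - q25,
        "MAE/STD %": 100 * err.mean() / y_true.std(),
    }

rng = np.random.default_rng(0)
y_true = rng.normal(200.0, 50.0, size=1000)   # illustrative 17O-like spread
y_pred = y_true + rng.normal(0.0, 5.0, size=1000)
print(report(y_true, y_pred))
```

Separating error statistics from data-spread statistics in one table is what allows the low MAE/STD ratio of 17O to be read as "consistent with a broad target range" rather than "imprecise model".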
Subsequently, these models were employed to predict the NMR chemical shifts of a single molecule, C5H5N2OF, containing five elements, that was not included in the training data. The results are shown in Fig. 3. The MAEs for each nucleus were 0.08 ppm for 1H, 1.03 ppm for 13C, 6.45 ppm for 15N, 2.86 ppm for 17O, and 6.73 ppm for 19F. The remarkably low MAEs for 1H and 13C underscore the high accuracy of our model for these nuclei, with predictions that closely mirror the calculated values. The model also performed reasonably for the more challenging 15N, 17O, and 19F nuclei.

We then expanded our assessment to evaluate the predictive ability of our model for molecules larger than those in the QM9NMR dataset. To this end, we incorporated the test sets provided in Ref. 29, which comprise 40 drug molecules from the GDB17 universe and another set containing 12 drugs with 17 or more heavy atoms. See Ref. 29 for the structures of these molecules.
Table 4 presents the benchmark results for each test set using our M3GNet and MACE-OFF23-small GNN-TL descriptors. For comparison, we used the FCHL results from Gupta's study. 29 To ensure a fair comparison, we employed GNN-TL descriptor models trained on 100K 13C chemical shielding constants. For both models, increased molecular size in the test set correlated with deterioration of the MAE. Notably, although our M3GNet GNN-TL descriptor did not match the 1.88 ppm achieved by the FCHL descriptor on the QM9 50K test set, it exhibited an MAE approximately 0.3 ppm lower on the 40-molecule GDB17 test set. The MACE-OFF23-small GNN-TL descriptor performed even better, with an MAE of 1.87 ppm on the QM9 50K test set, closely matching the FCHL descriptor, and significantly outperforming it on the 40-molecule GDB17 set with an MAE of 2.83 ppm. For the set of 12 drugs with 17 or more heavy atoms, the M3GNet descriptor showed an MAE of 4.21 ppm, comparable to the FCHL descriptor and indicating that the M3GNet GNN-TL descriptor was less affected by increasing molecular size, whereas the MACE-OFF23-small descriptor significantly outperformed FCHL with an MAE of 3.85 ppm, highlighting its superior predictive performance.
For a detailed comparison, Fig. 6 illustrates the molecule-specific MAE values for both drug test sets. The molecular structures are provided in Ref. 29. Our M3GNet and MACE-OFF23-small GNN-TL descriptor-based prediction models kept the highest MAE for any individual molecule across both test sets below 10 ppm. Intriguingly, the desflurane molecule, which posed the greatest challenge, showed MAE values of 53.3 ppm, 9.35 ppm, and 8.31 ppm for the FCHL, M3GNet, and MACE-OFF23-small GNN-TL descriptor models, respectively. This corresponds to an approximately 80% reduction in MAE with our descriptors, which is likely attributable to differences in the spatial domain encompassed by each descriptor.
The cutoff radius for the FCHL descriptor was determined through a grid search, 29 which settled at 4.0 Å. In this scenario, the two fluorine atoms in the terminal trifluoromethyl group (CF3) of the desflurane molecule, which lie beyond 4 Å from the CF2H carbon, were neglected. In contrast, our M3GNet descriptor used a 6 Å cutoff radius during the initial graph construction and a 5 Å cutoff for three-body interactions during graph convolution, capturing the entire CF3 group. This suggests that the descriptor adequately accounts for the influence of the terminal trifluoromethyl group. Additionally, the intrinsic ability of GNN-TL descriptors to account for environments beyond their cutoff radius, owing to graph convolution, may have contributed to the substantial improvement in MAE. Notably, the MACE-OFF23-small model, with a cutoff of 4.5 Å, achieves the highest accuracy even though it does not capture the fluorine atom at a distance of 4.65 Å in the CF3 group. In summary, the proposed M3GNet and MACE GNN-TL descriptors can predict 13C NMR chemical shifts for molecules outside the training dataset with an accuracy comparable to that of the state-of-the-art FCHL descriptor.
Lastly, to explore further practical applications of the constructed models, we validated the NMR chemical shielding constants obtained using semi-empirical PM7-level geometries as inputs against the NMR chemical shift values obtained using DFT/GIAO-level structures from the training data. This validation was performed on the QM9 50K holdout set and the two drug-molecule test sets provided by Ref. 29. The 13C prediction model employed was the M3GNet/KRR model. The MAE values for each molecule in the drug datasets can be found in Figs. 6(b) and 6(d). For the QM9 50K holdout set, the MAE was 3.61 ppm, a significant deterioration of 1.33 ppm compared to using DFT-level geometries as inputs. Conversely, predictions for the 40-drug and 12-drug test sets showed only minor deteriorations of 0.23 ppm and 0.04 ppm, respectively. These results suggest that even when the more readily available PM7-level geometries are used as inputs, the model remains robust for extrapolative predictions on molecules larger than those in the training data.

§ FCHL results are taken from Ref. 29.

Influence of Architectural Choices on GNN-TL Descriptor Performance
In our exploration of different architectures for generating GNN-TL descriptors, we observed several patterns. First, as shown in Tables 1 and 2, the accuracy of GNN-TL descriptors does not necessarily improve with increasing descriptor dimensionality. With this in mind, we discuss the architecture of each GNN-based IAP. SchNet, which operates on GNN-based local descriptors to evaluate systems as summations of atomic energies, accounts only for pairwise interactions. This limitation could constrain its expressive power, leading to inadequate representations. The subpar performance of MEGNet during transfer learning may be attributed to its architectural design, as it integrates atomic (local) descriptors into molecular (global) descriptors through concatenation. This means that the information passed to the MLP is not a per-atom representation extracted at the end of the model, which might not be optimal for atom-wise property prediction, although it is expected to suit molecule-wise property prediction. Moreover, the M3GNet architecture, which considers three-body interactions, has the potential to capture the three-dimensional structure of molecules with high resolution. Additionally, the MACE model, an E(3)-equivariant GNN, has demonstrated high performance as an IAP, suggesting that the outputs of its GNN layers represent molecular structures with high fidelity. Furthermore, future improvements in accuracy may be achieved by leveraging the outputs of higher-order GNN layers in the MACE model, corresponding to the two-body and three-body terms of the atomic cluster expansion.

Significance of Dataset Size and Diversity
The M3GNet training regimen incorporates data from 187,687 ionic steps spanning 62,783 compounds, including 187,687 energies, 16,875,138 force components, and 1,689,183 stress components. This diverse dataset covers 89 elements of the periodic table. The model is not limited to learning only the energies associated with these elements but extends to atomic-level forces. Moreover, M3GNet training includes not only stable structures but also structure-optimization trajectories. The ingestion of vast amounts of data from crystalline systems may have endowed M3GNet with enhanced expressiveness, potentially making it adept at interpolating molecular systems. The pre-trained MACE-MP0 model was trained on roughly ten times more energy data from crystalline systems, potentially contributing to the improved accuracy of the 13C NMR chemical shift predictions shown in Table 2. On the other hand, the MACE-OFF23 model, which is specialized for molecules containing 10 elemental species, was trained on a dataset comprising about 1M energy data points, with structures containing up to 150 atoms. This extensive molecular training data may make it more suitable for predicting molecular NMR chemical shifts. Thus, the training data of IAPs, much like their architectures, can be a crucial factor in determining descriptor performance.

Potential for Transfer Learning on Quantum Computer
There is potential for leveraging quantum computation approaches. 111 Specifically, our 10-qubit QKRR, run on a simulator, demonstrated performance comparable to that of state-of-the-art KRR. This is underpinned by the theoretical equivalence of the NPQC kernel with the Gaussian kernel. The quantum kernel method stands out because of its capability to compute with fewer measurement iterations than other quantum computation methodologies, such as quantum neural networks. 112 In particular, our proposed M3GNet GNN-TL descriptor can feasibly be embedded with a minimum of six qubits (2^6 = 64 dimensions), enabling evaluations with a smaller qubit count than traditional descriptors such as SOAP require. In contrast, embedding the higher-dimensional SOAP descriptor appears to be a challenge, possibly due to noise. From a longer-term perspective, there are exciting possibilities in developing kernels that classical computers cannot express and in accelerating the inversion of kernel matrices using quantum algorithms. The constant scaling of our proposed descriptor with respect to the number of elements may significantly contribute to real-time material exploration powered by quantum-classical hybrid algorithms in the near future.

Conclusion
The dynamics of machine learning and its extensive applications across various domains are driving cutting-edge research. Our integration of transfer learning with pre-trained IAP GNNs for NMR chemical shift prediction offers a paradigm shift in efficiency and scalability. The GNN-TL descriptor presents an unparalleled advantage in scalability owing to its consistent dimensionality, irrespective of the number of elements.
Comparative evaluations with other renowned descriptors, such as SOAP, suggest that the GNN-TL descriptor can match, if not surpass, the performance of its contemporaries while maintaining a more compact representation. This is especially important for large and diverse datasets, where descriptor dimensionality can otherwise grow quadratically with the number of element types.
Architectural choice plays a pivotal role in the performance of GNN-TL descriptors. Moreover, the diversity and vastness of the training dataset, encompassing myriad elemental types and structural configurations, augment the robustness and versatility of the GNN.
Our proposed approach has immense potential for creating a unified framework capable of predicting various atomic and molecular properties simultaneously, with profound implications for accelerated material and molecular research. Such unified prediction could usher in an era of comprehensive understanding and quicker innovation, possibly revolutionizing fields such as catalysis, drug discovery, and material design.
The union of transfer learning with pre-trained GNNs not only augments prediction accuracy but also drastically reduces learning costs, presenting a cost-effective and efficient alternative to more computationally intensive methods. As we move toward an era in which data-driven insights and models govern the pace of innovation, our research offers a promising pathway for future endeavors in chemical property prediction with both classical and quantum computers.
Note added - As we were finalizing this manuscript, we became aware of recent articles 86,110,113 that also utilize intermediate information from graph neural network potentials. In Section 3.2, we added a direct comparison between our results and theirs. Elijošius et al. applied the pre-trained MACE descriptor to generative modeling of molecules 86 .

Distribution of Datasets for Each NMR Chemical Shift Prediction Model
The distributions of the training and test sets sampled from the QM9NMR dataset are shown in Fig. 4. Fig. 4(a) shows that above 5K samples, the sampled distribution is in good agreement with the overall distribution of the ¹³C NMR shielding constants. For the other elemental species, the distributions of the training and test sets were likewise in good agreement with the overall distribution.
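This kind of distribution check can be quantified rather than only inspected visually; a minimal sketch, using synthetic numbers standing in for the QM9NMR shielding constants, compares a sampled training set against the full distribution with a two-sample Kolmogorov-Smirnov statistic:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# stand-in for the full set of 13C shielding constants (ppm)
full = rng.normal(loc=100.0, scale=40.0, size=50_000)
# a 5K training sample drawn without replacement
train = rng.choice(full, size=5_000, replace=False)

ks_stat, p_value = ks_2samp(train, full)
# a small KS statistic indicates the sample tracks the full distribution
```

A KS statistic near zero confirms in one number what the overlaid histograms in Fig. 4 show qualitatively.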

Kernel Function Dependency for Various GNN-TL Descriptors
The accuracy of KRR models using Gaussian and Laplacian kernels was evaluated. Table 5 presents the mean RMSE and its standard deviation for predictions on the 50K holdout set by models trained on 400 ¹³C data points using Gaussian and Laplacian kernels. For all models using GNN-TL descriptors, the mean RMSE with the Gaussian kernel was lower than that with the Laplacian kernel. However, for models with MEGNet and M3GNet GNN-TL descriptors, the variation in accuracy due to dataset sampling (the standard deviation) had a greater impact than the kernel choice. In contrast, for models with MACE GNN-TL descriptors, the impact of kernel choice was more significant than the variation due to dataset sampling.
Next, Table 6 shows the accuracy of KRR models using M3GNet and MACE-OFF23-small GNN-TL descriptors trained on a 100K ¹³C training set. Unlike the models trained on the 400 ¹³C training set, the KRR models with M3GNet GNN-TL descriptors consistently showed higher accuracy with the Laplacian kernel than with the Gaussian kernel. Conversely, the results for MACE-OFF23-small GNN-TL descriptors were similar to those for models trained on the 400 ¹³C training set, with the Gaussian-kernel models demonstrating higher accuracy. This suggests that the appropriate kernel function may vary depending on the size of the training data.
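The Gaussian-versus-Laplacian comparison described above can be reproduced in miniature with scikit-learn, whose `KernelRidge` accepts both kernels directly; the data below are synthetic placeholders for the 400-point ¹³C descriptor sets, and the hyperparameters are illustrative:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))                  # stand-in for 400 GNN-TL descriptors
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)

mean_rmse = {}
for kern in ("rbf", "laplacian"):              # "rbf" is the Gaussian kernel
    model = KernelRidge(kernel=kern, alpha=1e-3, gamma=0.1)
    neg = cross_val_score(model, X, y, cv=5,
                          scoring="neg_root_mean_squared_error")
    mean_rmse[kern] = -neg.mean()              # mean cross-validated RMSE
```

Repeating the cross-validation over resampled training sets gives the standard deviations reported alongside the mean RMSE in Table 5.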
Finally, these results informed the choice of kernel functions for the KRR models presented in the Results section of this paper. For models trained on 400 ¹³C data points, all KRR models with GNN-TL descriptors employed the Gaussian kernel; for models trained on 100K ¹³C data points, the Laplacian kernel was used with the M3GNet GNN-TL descriptor, whereas the Gaussian kernel was used with the MACE-OFF23-small GNN-TL descriptor.
The accuracy of the GNN-TL descriptors was also validated using the molecular structures of the two drug-molecule datasets reported in Ref. 29. The predicted ¹³C NMR shielding constants for each drug molecule using the M3GNet and MACE-OFF23 GNN-TL/KRR models are shown in Fig. 6(a) and (c). These predictions are accompanied by the values predicted by the FCHL/KRR model 29 . The prediction results of the M3GNet/KRR model using PM7-level optimized geometries, along with those using DFT-level geometries, are shown in Fig. 6(b) and (d).

Fig. 1
Fig. 1 Schematic diagram of our proposed graph neural network transfer learning for predicting chemical properties. The black arrows depict the flow of our transfer learning process. The gray area is a pre-trained IAP (NNP) designed to predict the energy of the system, composed of a GNN and an MLP. The initial step in our learning procedure is to obtain from the pre-trained GNN block a set of output vectors, {G_i}, using the atomic numbers and coordinates of a molecule with N atoms, {Z_i, R_i}, as input. Subsequently, we construct a regression model to predict the chemical properties, e.g. NMR shielding constants, using this GNN output {G_i} as a descriptor.
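In code, the pipeline in Fig. 1 reduces to two steps: run the frozen GNN block to obtain {G_i}, then fit a kernel regressor on those vectors. The sketch below uses a dummy feature extractor in place of the real M3GNet/MACE call, whose exact API varies by implementation; the molecule, feature dimension, and shielding values are all illustrative:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def gnn_node_features(Z, R, dim=64):
    """Placeholder for the frozen pre-trained GNN block: in practice this
    runs message passing on atomic numbers Z and coordinates R and returns
    the per-atom vectors {G_i}; here it emits deterministic dummies."""
    rng = np.random.default_rng(int(sum(Z)))
    return rng.normal(size=(len(Z), dim))

# one toy "molecule": methane-like atomic numbers and coordinates
Z = [6, 1, 1, 1, 1]
R = np.zeros((len(Z), 3))

G = gnn_node_features(Z, R)                        # descriptor: one row per atom
sigma = np.array([180.0, 31.0, 31.0, 31.0, 31.0])  # mock shieldings (ppm)

model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.1)
model.fit(G, sigma)                                # regression on {G_i}
pred = model.predict(G)
```

Because the GNN weights stay frozen, only the lightweight kernel model is trained, which is what keeps the learning cost of the GNN-TL approach low.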
N, water-¹⁷O (H₂¹⁷O) for ¹⁷O, and trichlorofluoromethane (CFCl₃) for ¹⁹F. We determined the chemical shielding constants for these well-established reference substances as follows: 31.7608 ppm for ¹H, 187.0521 ppm for ¹³C, −147.8164 ppm for ¹⁵N, 325.8642 ppm for ¹⁷O, and 171.2621 ppm for ¹⁹F. These constants were evaluated by calculations at the mPW1PW91 105 /6-311+G(2d,p) level using density functional theory (DFT) and the gauge-including atomic orbital (GIAO) 106 method. Structure optimization was conducted at the B3LYP 107 /6-31G(2df,p) level, in alignment with the methodology employed for the QM9NMR dataset. All calculations were performed using the Gaussian 16 software suite. 108
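Given those reference shieldings, converting a computed shielding constant σ into a chemical shift follows the usual referencing relation δ = σ_ref − σ, which is the convention assumed in this small sketch (the example site is hypothetical):

```python
# Reference shielding constants (ppm) stated in the text,
# computed at the mPW1PW91/6-311+G(2d,p) GIAO level
SIGMA_REF = {"1H": 31.7608, "13C": 187.0521, "15N": -147.8164,
             "17O": 325.8642, "19F": 171.2621}

def chemical_shift(sigma, nucleus):
    """delta = sigma_ref - sigma, both in ppm."""
    return SIGMA_REF[nucleus] - sigma

# e.g. a 13C site with a computed shielding of 60.0 ppm
delta = chemical_shift(60.0, "13C")
```

Applying the same conversion to model-predicted shielding constants yields the chemical shifts plotted in Fig. 3.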

Fig. 2
Fig. 2 Log-log plot of the training size (N) versus MAE for the ¹³C NMR chemical shielding constant prediction model. Red and blue represent the results of KRR with the Laplacian kernel and QKRR with the NPQC kernel, respectively, both using GNN-TL descriptors from the pre-trained M3GNet model.
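On such a log-log plot, a straight line corresponds to a power law MAE ≈ a·N^(−b), and the learning-curve exponent can be read off with a linear fit in log space. The MAE values below are synthetic, chosen only to demonstrate the fit:

```python
import numpy as np

N = np.array([400, 1_000, 5_000, 20_000, 100_000])  # training-set sizes
mae = 50.0 * N ** -0.4                              # synthetic power-law MAE (ppm)

# linear fit of log(MAE) vs log(N): slope = -b, the learning-curve exponent
slope, intercept = np.polyfit(np.log(N), np.log(mae), 1)
```

Comparing fitted exponents is a compact way to contrast how quickly the KRR and QKRR curves in Fig. 2 improve with data.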
N and ¹⁷O nuclei, where the chemical shifts can be significantly affected by subtle changes in molecular structure and environment, as indicated by the MAE values. The ¹⁹F nucleus, while having a higher MAE, showed excellent agreement with the DFT/GIAO calculations, suggesting that the model predictions are robust even for nuclei with typically wider chemical shift ranges.

Fig. 3
Fig. 3 Predicted NMR chemical shifts for (a) a single molecule, randomly selected from the QM9NMR dataset and not included in the training data, for (b) ¹H, (c) ¹³C, (d) ¹⁵N, (e) ¹⁷O, and (f) ¹⁹F. These predictions (red lines) are compared with the values calculated at the DFT/GIAO level, which are taken as the reference values (blue lines).

Table 3
Predictive performance and data variability of NMR shielding constants for 5 elements

Table 4
The MAE values for the prediction of the 50K QM9NMR holdout set and two drug-molecule sets: one containing 40 drug molecules from the GDB17 universe and the other containing 12 drugs with 17 or more heavy atoms. The values in parentheses indicate the MaxAE. All units are in ppm.
These results demonstrate the strong predictive power and potential of the model as a reliable tool for accurately predicting NMR chemical shifts across a variety of nuclei, even in molecules beyond the scope of the training data.

Table 5
Accuracy (measured by RMSE) of GNN-TL/KRR models trained on 400 ¹³C NMR chemical shift values, for different kernel functions. All units are in ppm.

Table 6
The kernel-function dependency of accuracy (MAE) for the prediction of the 50K QM9NMR holdout set, 40 drug molecules from the GDB17 universe, and 12 drugs with 17 or more heavy atoms. All units are in ppm.