Machine Learning Full NMR Chemical Shift Tensors of Silicon Oxides with Equivariant Graph Neural Networks

The nuclear magnetic resonance (NMR) chemical shift tensor is a highly sensitive probe of the electronic structure of an atom and, by extension, of its local structure. Recently, machine learning has been applied to NMR to predict isotropic chemical shifts from structure. Current machine learning models, however, often neglect the full chemical shift tensor in favor of the easier-to-predict isotropic chemical shift, discarding a wealth of structural information available in the NMR chemical shift tensor. Here we use an equivariant graph neural network (GNN) to predict full 29Si chemical shift tensors in silicate materials. The equivariant GNN model predicts full tensors with a mean absolute error of 1.05 ppm and accurately determines the magnitude, anisotropy, and tensor orientation across a diverse set of silicon oxide local structures. The equivariant GNN model outperforms the state-of-the-art machine learning models by 53%, and it outperforms historical analytical models by 57% for the isotropic chemical shift and by 91% for the anisotropy. The software is available as a simple-to-use open-source repository, allowing similar models to be created and trained with ease.


DimeNet++ Implementation Theory
The implementation of the DimeNet++ model follows the original model proposed by Klicpera, Groß, and Günnemann S1 with a slight modification to batching to allow for nodal prediction.
The main insight from the DimeNet model is the use of angular representations between atom pairs kj and ji, as well as the interatomic distance between atoms k and j. Message embeddings in the DimeNet architecture take inspiration from the particle-in-a-spherical-well problem, in which the wave function used to model the system takes the form

$$\Psi(d, \alpha) = \sum_{l,m} \left[ a_{lm}\, j_l(kd) + b_{lm}\, y_l(kd) \right] Y_l^m(\alpha), \quad (1)$$

where $j_l$ and $y_l$ are spherical Bessel functions of the first and second kind, respectively, and $Y_l^m$ are the spherical harmonics. To constrain Eq. (1) to regular solutions, $b_{lm}$ is set to 0, and furthermore, to construct a 2-dimensional basis, $m$ is set to 0. Boundary conditions under these constraints are obtained by setting $k = z_{ln}/r_\text{cut}$, where $z_{ln}$ is the $n$-th root of the order-$l$ spherical Bessel function. Solving yields the 2-dimensional Fourier-Bessel basis,

$$a_{\text{SBF},ln}(d, \alpha) = \sqrt{\frac{2}{r_\text{cut}^3\, j_{l+1}^2(z_{ln})}}\; j_l\!\left(\frac{z_{ln}}{r_\text{cut}}\, d\right) Y_l^0(\alpha). \quad (2)$$

Messages are then generated from the concatenation of the node embeddings of the atom pair and their radial embedding,

$$m_{ji} = \sigma\!\left( \left[ h_j \,\|\, h_i \,\|\, e_{\text{RBF}}^{(ji)} \right] W + b \right), \quad (3)$$

where $\|$ denotes concatenation, $\sigma$ denotes the sigmoid activation function, and the weight matrix $W$ and bias vector $b$ are learnable. From the messages, the node embedding is obtained as the sum of all messages from atom pairs within the cutoff radius,

$$h_i = \sum_{j \in \mathcal{N}_i} m_{ji}. \quad (4)$$
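As a concrete illustration, the basis of Eq. (2) can be evaluated numerically. The sketch below is our own illustration (not the reference DimeNet++ code), using SciPy; the bracketed root search for the spherical Bessel roots $z_{ln}$ is an implementation choice made here for simplicity.

```python
# Minimal numerical sketch of the 2-D spherical Fourier-Bessel basis of Eq. (2).
import numpy as np
from scipy.optimize import brentq
from scipy.special import spherical_jn, sph_harm

def spherical_bessel_roots(l, n):
    """First n positive roots z_ln of the order-l spherical Bessel function j_l."""
    grid = np.linspace(1e-3, 20 + 10 * n, 2000 * n)
    roots = []
    for a, b in zip(grid[:-1], grid[1:]):
        if spherical_jn(l, a) * spherical_jn(l, b) < 0:  # sign change brackets a root
            roots.append(brentq(lambda z: spherical_jn(l, z), a, b))
        if len(roots) == n:
            break
    return np.array(roots)

def a_sbf(l, n, d, alpha, r_cut=5.0):
    """Eq. (2): normalized radial Bessel term times the m = 0 spherical harmonic."""
    z_ln = spherical_bessel_roots(l, n)[-1]
    norm = np.sqrt(2.0 / (r_cut**3 * spherical_jn(l + 1, z_ln) ** 2))
    radial = norm * spherical_jn(l, z_ln * d / r_cut)  # vanishes at d = r_cut
    angular = sph_harm(0, l, 0.0, alpha).real          # Y_l^0 depends on alpha only
    return radial * angular
```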
In the original implementation, at the time of property read-out the node embeddings are passed through multiple dense layers to generate an atom-wise output, $t_i$; in the original model these atom-wise outputs are summed to give a graph-level prediction, whereas our batching modification retains $t_i$ directly as the per-atom (nodal) prediction.

Eigenn Implementation Theory
The implementation of the Eigenn model follows the TFN framework laid out by Thomas, Smidt, et al. S2 to impose rotational equivariance on the model. TFNs use the spherical harmonics, $Y_l^m$, and a learned radial function, $R_c$, as filters, $F_{cm}$. The filter used is restricted to the form

$$F_{cm}^{(l_f, l_i)}(\vec{r}) = R_c^{(l_f, l_i)}(r)\, Y_{l_f}^{m}(\hat{r}),$$

where $l_i$ and $l_f$ correspond to the spherical harmonic order of the input and of the filter, respectively, $\vec{r}$ is the vector orientation with respect to the atom center, and $r$ is the magnitude of the vector $\vec{r}$.
The filter representation is then stored in a dictionary, denoted $V^{(l)}_{acm}$, where $l$ denotes the dictionary key and acts to store the filter elements by their spherical harmonic order; $a$ is the point index, i.e., the index of the atom of interest; $c$ is the channel index, i.e., the index for objects that transform similarly; and $m$ is the representation index, i.e., the starting index for each object.
In order to build a network, each layer must combine the layer input with the filter embedding. The convolution layer may be constructed from the individual filters, $F$, and the layer input, $V$, which are combined with the Clebsch-Gordan coefficients, $C$, to give a layer

$$\mathcal{L}^{(l_o)}_{acm_o}(\vec{r}_a, V) = \sum_{m_f, m_i} C^{(l_o, m_o)}_{(l_f, m_f)(l_i, m_i)} \sum_{b} F^{(l_f, l_i)}_{cm_f}(\vec{r}_{ab})\, V^{(l_i)}_{bcm_i},$$

where $\vec{r}_{ab}$ is the vector between atoms $a$ and $b$, and the subscripts $i$, $f$, and $o$ indicate input, filter, and output, respectively.
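Such a layer can be assembled in practice with the e3nn library; the sketch below is illustrative rather than the paper's implementation, and the irreps chosen here are hypothetical. The learned radial function supplies the tensor-product weights, and the Clebsch-Gordan contraction of the layer equation above is performed by `o3.FullyConnectedTensorProduct`.

```python
# Sketch of one TFN-style convolution with e3nn (illustrative irreps).
import torch
from e3nn import o3
from e3nn.nn import FullyConnectedNet

irreps_in = o3.Irreps("8x0e + 4x1o")          # layer input V (hypothetical)
irreps_sh = o3.Irreps.spherical_harmonics(2)  # filter angular part, l <= 2
irreps_out = o3.Irreps("8x0e + 4x1o + 2x2e")  # layer output (hypothetical)

# Clebsch-Gordan contraction; per-edge weights come from the radial function.
tp = o3.FullyConnectedTensorProduct(irreps_in, irreps_sh, irreps_out,
                                    shared_weights=False)
radial = FullyConnectedNet([1, 16, tp.weight_numel], torch.relu)  # R_c(r)

def tfn_conv(V, edge_src, edge_dst, r_ab):
    """Aggregate filter x input tensor products from neighbors b onto atoms a."""
    sh = o3.spherical_harmonics(irreps_sh, r_ab, normalize=True)  # Y_lf^m(r_hat)
    weights = radial(r_ab.norm(dim=-1, keepdim=True))             # R_c(|r_ab|)
    messages = tp(V[edge_src], sh, weights)
    out = V.new_zeros(len(V), irreps_out.dim)
    return out.index_add(0, edge_dst, messages)
```

Calling `tfn_conv(V, edge_src, edge_dst, r_ab)` with `V` of shape `(num_atoms, irreps_in.dim)` returns equivariant features of shape `(num_atoms, irreps_out.dim)`.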
To implement the equivariant GNN, we created a network with 4 message-passing layers. The radial network was composed of 2 invariant layers of 64 invariant neurons each. The convolutional layer was created using an irrep shape of 32x0o + 32x0e + 16x1o + … To create the embeddings, a species embedding vector of size 16 was used along with 12 radial basis functions and edge irreps of the shape 0e + 1o + 2e. The cutoff neighborhood was set to 5.0 Å. During training, a batch size of 16 was used with the Adam optimizer and a learning rate of …
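For reference, the hyperparameters above can be collected into a single configuration. The sketch below uses our own (illustrative) naming; the hidden-irrep string and learning-rate value are left incomplete because they are truncated in the text.

```python
# Hyperparameters from the text, gathered into one (illustrative) config dict.
from e3nn import o3

config = {
    "message_passing_layers": 4,
    "radial_hidden_layers": [64, 64],        # 2 invariant layers, 64 neurons each
    "species_embedding_size": 16,
    "num_radial_basis": 12,
    "edge_irreps": o3.Irreps("0e + 1o + 2e"),
    "hidden_irreps": "32x0o + 32x0e + 16x1o + ...",  # truncated in the source
    "cutoff_angstrom": 5.0,
    "optimizer": "Adam",
    "batch_size": 16,                        # learning-rate value truncated in source
}
```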

Failures of Tensor Conventions
It is well known that tensor conventions in NMR pose somewhat of an issue. It is typically recommended to use the Haeberlen or Maryland convention to report a tensor, but these conventions are often defined in an algebraically dangerous manner and may have features that make machine learning difficult. The two Haeberlen conventions have a discontinuity at ζ = Δσ = 0 that splits the space in two, and crossing the discontinuity corresponds to an axis flip, which is discussed further below. Additionally, η = 1 creates a degeneracy, making learning near such points difficult. The Ohio XY convention is an attempt to alleviate some of the issues of the Haeberlen convention by compressing the discontinuity to a single point; however, because it takes the magnitude of ζ, one can no longer convert back to ζ, making it unsuitable for machine learning. The Maryland convention also has a discontinuity at Ω = 0; however, this discontinuity is at the edge of the data rather than in the middle, resulting in a somewhat easier-to-learn space. The AxRh convention (and similarly the eigenvalues themselves) improves on the others in that there is no discontinuity; however, every convention derived from the Cartesian tensor suffers from an axis-switching problem. An additional feature violates the assumption that one may freely inter-convert between the different conventions; this issue arises when converting between conventions that use two different axis orderings.
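To make the conventions discussed above concrete, the following helper (our own sketch, using the standard definitions of the Haeberlen ζ, η and Maryland Ω, κ parameters) computes them from the three principal components; the degeneracies at ζ = 0 and Ω = 0 appear as explicit guards.

```python
# Illustrative helper computing Haeberlen (zeta, eta) and Maryland
# (Omega, kappa) parameters, using the standard ordering s11 >= s22 >= s33.
import numpy as np

def tensor_conventions(principal_components):
    s11, s22, s33 = np.sort(principal_components)[::-1]  # standard ordering
    iso = (s11 + s22 + s33) / 3.0

    # Haeberlen: order components by |s - iso|; zeta = s_zz - iso,
    # eta = (s_yy - s_xx) / zeta. Degenerate (undefined) at zeta = 0.
    s_zz, s_xx, s_yy = sorted((s11, s22, s33),
                              key=lambda s: abs(s - iso), reverse=True)
    zeta = s_zz - iso
    eta = (s_yy - s_xx) / zeta if zeta != 0 else float("nan")

    # Maryland: span Omega = s11 - s33, skew kappa = 3(s22 - iso)/Omega.
    # Undefined at Omega = 0 (the discontinuity noted above).
    omega = s11 - s33
    kappa = 3.0 * (s22 - iso) / omega if omega != 0 else float("nan")
    return {"iso": iso, "zeta": zeta, "eta": eta, "omega": omega, "kappa": kappa}
```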
The cause of this issue is two-fold. First, due to the axis switching in the Haeberlen convention as one moves from negative to positive values of ζ, writing out the Haeberlen ζ definition in terms of the standard convention ($\sigma_{11} \geq \sigma_{22} \geq \sigma_{33}$) yields two different equations, one in which ζ > 0,

$$\zeta = \frac{2\sigma_{11} - \sigma_{22} - \sigma_{33}}{3}, \quad (8)$$

and one in which ζ < 0,

$$\zeta = \frac{2\sigma_{33} - \sigma_{11} - \sigma_{22}}{3}. \quad (9)$$

The second issue concerns the errors of the model prediction and how they propagate through the transformation from one axis-ordering scheme to another. Using the error-propagation equation

$$\delta_Y^2 = \left(\frac{\partial f}{\partial \sigma_{11}}\right)^2 \delta_{\sigma_{11}}^2 + \left(\frac{\partial f}{\partial \sigma_{22}}\right)^2 \delta_{\sigma_{22}}^2 + \left(\frac{\partial f}{\partial \sigma_{33}}\right)^2 \delta_{\sigma_{33}}^2, \quad (10)$$

where $\delta_Y$ is the error in parameter $Y$, which is a function, $f$, of $\sigma_{11}$, $\sigma_{22}$, and $\sigma_{33}$, and illustrating the case for the Haeberlen convention, substituting Eq. (8) and Eq. (9) into Eq. (10) yields

$$\delta_{\zeta>0}^2 = \tfrac{4}{9}\,\delta_{\sigma_{11}}^2 + \tfrac{1}{9}\,\delta_{\sigma_{22}}^2 + \tfrac{1}{9}\,\delta_{\sigma_{33}}^2 \quad (11)$$

and

$$\delta_{\zeta<0}^2 = \tfrac{1}{9}\,\delta_{\sigma_{11}}^2 + \tfrac{1}{9}\,\delta_{\sigma_{22}}^2 + \tfrac{4}{9}\,\delta_{\sigma_{33}}^2. \quad (12)$$
From inspection of Eq. (11) and Eq. (12), one can see that $\sigma_{11}$ and $\sigma_{33}$ have different effects on the ζ error for the different signs of ζ. This case arises in ML because we have not put any constraints on the loss function to guide $\sigma_{11}$ and $\sigma_{33}$ toward having the same error.
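The coefficients in Eq. (11) and Eq. (12) can be verified mechanically. The snippet below is a quick illustrative check with SymPy: it squares the partial derivatives of the two branch definitions of ζ, Eq. (8) and Eq. (9), following the propagation formula of Eq. (10).

```python
# Quick SymPy verification of Eqs. (11) and (12).
import sympy as sp

s11, s22, s33 = sp.symbols("sigma_11 sigma_22 sigma_33")
iso = (s11 + s22 + s33) / 3

zeta_pos = s11 - iso   # zeta > 0 branch (Eq. 8): sigma_zz = sigma_11
zeta_neg = s33 - iso   # zeta < 0 branch (Eq. 9): sigma_zz = sigma_33

for label, zeta in (("zeta > 0", zeta_pos), ("zeta < 0", zeta_neg)):
    coeffs = [sp.diff(zeta, s) ** 2 for s in (s11, s22, s33)]
    print(label, coeffs)
# -> zeta > 0: [4/9, 1/9, 1/9]; zeta < 0: [1/9, 1/9, 4/9]
```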
While the Haeberlen-based models tend to have poor performance metrics compared to the other models, training in the Haeberlen convention did have one major benefit. The difficult manifold learned for the Haeberlen conventions forced the models to learn a mapping between structure and tensor such that the errors on the $\sigma_{11}$ and $\sigma_{33}$ eigenvalues were approximately equal. The approximately equal errors for the two components, however, came at the cost of a significantly higher overall error in the eigenvalues when compared to conventions without such a difficult topology, such as the Maryland and standard conventions. While it may be beneficial to place additional constraints on the loss or to force the model to learn a better representation through a difficult topology, a more elegant solution may be to impose additional symmetry constraints on the model itself so that it predicts a full tensor rather than three independent scalars.
We also find that the NMR tensor parameters may be unsuitable as metrics for ranking the performance of ML models. When assessing the utility of a model for experimental use, however, the MAE of the full tensor may not be a helpful metric to visualize, whereas comparing the Haeberlen ζ, η or Maryland Ω, κ parameters may be more attractive, as these are what spectroscopists ultimately use.
The most striking feature shown in Fig. S1 is the Q^4 and Q^0 anisotropy, which shows a y = −x correlation for some points close to ζ = 0, a behavior that has been similarly reported. S3 Analysis of the points exhibiting this behavior often shows nearly perfect predictions of the tensor and the tensor eigenvalues. Similarly, the Q^4 and Q^0 η (and κ) parameters show significant error. The η parameter is ill-suited to describing such spherically symmetric tensors, and its error may be large for even the smallest numerical inaccuracies. Furthermore, η has little influence on spectral fitting for these sites, as the parameter describes the skewness of the line shape; for Q^4 and Q^0 sites, which exhibit tight line shapes, differences in skewness are difficult to detect.
Overall, the isotropic shielding along with the XY parameters are more robust descriptors of the tensor and should be used as metrics rather than the usual tensor conventions.