3DMolMS: prediction of tandem mass spectra from 3D molecular conformations

Abstract

Motivation: Tandem mass spectrometry (MS/MS) is an essential technology for characterizing chemical compounds at high sensitivity and throughput, and is commonly adopted in many fields. However, computational methods for automated compound identification from MS/MS spectra remain limited, especially for novel compounds that have not been characterized previously. In recent years, in silico methods have been proposed to predict the MS/MS spectra of compounds, which can then be used to expand the reference spectral libraries for compound identification. However, these methods do not consider the compounds' 3D conformations and thus neglect critical structural information.

Results: We present the 3D Molecular Network for Mass Spectra Prediction (3DMolMS), a deep neural network model that predicts the MS/MS spectra of compounds from their 3D conformations. We evaluated the model on experimental spectra collected in several spectral libraries. The spectra predicted by 3DMolMS reach average cosine similarities of 0.691 and 0.478 with the experimental MS/MS spectra acquired in positive and negative ion modes, respectively. Furthermore, the 3DMolMS model generalizes to MS/MS spectra acquired by different labs on different instruments after minor fine-tuning on a small set of spectra. Finally, we demonstrate that the molecular representation learned by 3DMolMS for MS/MS spectra prediction can be adapted to enhance the prediction of chemical properties such as the elution time in liquid chromatography and the collisional cross section measured by ion mobility spectrometry, both of which are often used to improve compound identification.

Availability and implementation: The code of 3DMolMS is available at https://github.com/JosieHong/3DMolMS, and the web service is at https://spectrumprediction.gnps2.org.


S1 Satisfaction of invariance principles
There are two invariance principles (Qi et al., 2017) that a point-based DNN should satisfy: permutation invariance and SE(3) invariance, which guarantee that the prediction of the DNN remains the same (invariant) when the input points are permuted and when their x, y, z-coordinates are rotated and/or translated, respectively. Permutation invariance is satisfied by the point-set representation itself, because the elemental operation is symmetric over all points. Below, we show that the elemental operation, 3DMolConv, satisfies SE(3) invariance. We first restate the equations used in 3DMolConv.
In each layer $l$, we define the 3DMolConv as:
$$\mathbf{h}^{(l+1)} = f^{(l)}\big(\mathbf{h}^{(l)}, E(X)\big), \tag{1}$$
where the x, y, z-coordinates $X$ enter only through the edge matrix $E(X)$ defined below. The distance between two points $x_i$ and $x_j$ is computed as:
$$d(x_i, x_j) = \lVert x_i - x_j \rVert_2. \tag{2}$$
The angle between the point vectors $x_i$ and $x_j$ encodes information related to either the bond angle or the non-bond angle of the edge $\langle x_i, x_j \rangle$:
$$\phi(x_i, x_j) = \frac{x_i^\top x_j}{\lVert x_i \rVert_2 \, \lVert x_j \rVert_2}. \tag{3}$$
To represent the whole compound, we collect the features of all edges according to Eq. 3 into the edge matrix:
$$E(X) = M \odot \big[\, d(x_i, x_j) \oplus \phi(x_i, x_j) \,\big]_{i,j=1}^{N}, \tag{4}$$
where $M$ is a mask retaining only the $k$-nearest neighbors of each point.
Theorem 1. The edge matrix is rotation-invariant, i.e., for $\forall R \in SO(3)$ and $\forall X \in \mathbb{R}^{N \times 3}$ ($N \in \mathbb{N}^+$), it satisfies $E(X) = E(XR)$.

Proof. For any $R \in SO(3)$ and $X \in \mathbb{R}^{N \times 3}$, by the definition of the edge matrix in Eq. 4, it suffices to show that $d$ and $\phi$ are rotation-invariant. Since rotations preserve norms and inner products, $d(Rx_i, Rx_j) = \lVert R(x_i - x_j) \rVert_2 = \lVert x_i - x_j \rVert_2 = d(x_i, x_j)$, and similarly $\phi(Rx_i, Rx_j) = \phi(x_i, x_j)$. Note that the x, y, z-coordinates are not directly encoded after the first layer; therefore, the representations in every subsequent layer (Eq. 1) are SO(3)-invariant, and so is the whole model. In addition, since a preprocessing step always moves the center of gravity of the input point set to the origin, the representations learned by 3DMolMS are also translation-invariant. Hence, the molecular representation in 3DMolMS is SE(3)-invariant.
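The rotation-invariance of the two edge features can also be checked numerically. The sketch below assumes, per our reading of Eqs. 2-3, that $d$ is the Euclidean distance and $\phi$ is the cosine of the angle between the (centered) position vectors; the coordinates and rotation are random and purely illustrative.

```python
import numpy as np

def features(X):
    """Pairwise distances and angle cosines used by 3DMolConv (Eqs. 2-3)."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.linalg.norm(diff, axis=-1)          # d(x_i, x_j)
    norms = np.linalg.norm(X, axis=-1)
    cos = (X @ X.T) / np.outer(norms, norms)   # phi(x_i, x_j)
    return d, cos

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
X -= X.mean(axis=0)                            # center at the origin (translation invariance)

# random rotation matrix via QR decomposition
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:                       # ensure a proper rotation in SO(3)
    Q[:, 0] *= -1

d0, c0 = features(X)
d1, c1 = features(X @ Q)                       # rotate every point
assert np.allclose(d0, d1) and np.allclose(c0, c1)
```

Both feature matrices are unchanged by the rotation, as Theorem 1 predicts.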

S2 Implementation and Training
We implemented 3DMolMS using PyTorch (Paszke et al., 2017, 2019). The entire 3DMolMS model contains a total of 91,658,760 parameters. The model is trained with the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.001 and a batch size of 64. The learning rate is halved when the loss has not decreased for 5 epochs (implemented with ReduceLROnPlateau in PyTorch). We use an early stopping strategy to avoid overfitting: training stops when the loss has not decreased for 10 epochs. All training and test code for 3DMolMS is released on GitHub at https://github.com/JosieHong/3DMolMS, and the model can also be accessed through an online service at https://spectrumprediction.gnps2.org.
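The scheduling logic described above (halve the learning rate after 5 epochs without improvement, mirroring PyTorch's ReduceLROnPlateau, and stop training after 10) can be sketched in plain Python. The loss trajectory below is illustrative only, and `train_schedule` is a hypothetical helper, not part of the released code.

```python
def train_schedule(losses, lr=0.001, lr_patience=5, stop_patience=10):
    """Simulate LR-halving on plateau plus early stopping over a loss curve."""
    best, since_best, since_drop = float("inf"), 0, 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, since_best, since_drop = loss, 0, 0
        else:
            since_best += 1
            since_drop += 1
        if since_drop >= lr_patience:     # plateau: halve the learning rate
            lr *= 0.5
            since_drop = 0
        if since_best >= stop_patience:   # early stopping
            return epoch, lr
    return len(losses) - 1, lr

# loss improves for 3 epochs, then plateaus: the LR is halved twice
# before early stopping triggers at epoch 12
stop_epoch, final_lr = train_schedule([1.0, 0.8, 0.7] + [0.7] * 12)
```

Note that PyTorch's ReduceLROnPlateau tracks a validation metric in the same way; this sketch only makes the counters explicit.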
The pre-training on multiple ion modes of the Agilent QTOF spectra takes about 8 hours, and each fine-tuning step (i.e., fine-tuning on the positive/negative ion mode of the Agilent QTOF spectra, and fine-tuning on the positive/negative ion mode of the other QTOF spectra) takes less than 1 hour on one NVIDIA GTX 1080 Ti GPU. Conformation generation and prediction take about 0.22 and 0.13 seconds per compound, respectively, on average.

S3.1 Mass Spectra Prediction Implemented from PointNet
The feed-forward layers of PointNet (Qi et al., 2017) are reorganized as 64, 64, 128, 256, 512, 1024, an improved structure proposed in DGCNN (Wang et al., 2019). The decoder for mass spectra prediction (see Fig. S1) is then attached.

S3.2 Mass Spectra Prediction Implemented from SchNet
We kept all the encoding settings of SchNet (Schütt et al., 2018), including the three interaction blocks with a hidden dimension of 64, the atom-wise layers with a hidden dimension of 32, and the shifted softplus activation. Our implementation of SchNet follows https://github.com/atomistic-machine-learning/schnetpack. The decoder for mass spectra prediction (see Fig. S1) is then attached.
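The shifted softplus activation mentioned above has the standard SchNet form ssp(x) = ln(0.5·e^x + 0.5) = softplus(x) − ln 2, which is zero at the origin and smooth everywhere; a minimal sketch:

```python
import math

def shifted_softplus(x):
    """SchNet's activation: ssp(x) = ln(0.5 * e^x + 0.5) = softplus(x) - ln 2."""
    return math.log(0.5 * math.exp(x) + 0.5)

assert abs(shifted_softplus(0.0)) < 1e-12                          # ssp(0) = 0
# for large x, ssp(x) approaches x - ln 2 (asymptotically linear)
assert abs(shifted_softplus(20.0) - (20.0 - math.log(2))) < 1e-6
```

The − ln 2 shift keeps activations centered at initialization while preserving the smooth, everywhere-differentiable shape of softplus.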

S3.3 NEIMS and MassFormer
We adjusted the settings of NEIMS (Wei et al., 2019) and MassFormer (Young et al., 2021) as shown in Table S2, retaining most of the original settings but increasing the number of feed-forward layers from 3 to 6 to enlarge the capacity of the models. There are two types of feed-forward layers: 'neims' denotes the linear block proposed in NEIMS (Wei et al., 2019); 'standard' denotes one linear layer with batch normalization and dropout.
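Our reading of the 'standard' layer type (one linear layer with batch normalization and dropout) can be sketched at inference time, where dropout acts as the identity; `standard_block` and all tensor values below are illustrative, not taken from Table S2.

```python
import numpy as np

def standard_block(x, W, b, gamma, beta, mean, var, eps=1e-5):
    """Inference-time sketch of a 'standard' feed-forward block: a linear
    layer followed by batch normalization with running statistics
    (dropout is disabled at inference). Shapes: x (batch, in), W (in, out)."""
    h = x @ W + b                                             # linear layer
    return gamma * (h - mean) / np.sqrt(var + eps) + beta     # batch norm

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 16))
out = standard_block(x, W, b=np.zeros(16), gamma=np.ones(16), beta=np.zeros(16),
                     mean=np.zeros(16), var=np.ones(16))
assert out.shape == (4, 16)
```

Stacking six such blocks (rather than three) is what enlarges the model capacity described above.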
The training process is changed to pre-training on multiple precursor types followed by fine-tuning on a specific precursor type, matching the training process of the other predictors. Our implementations of NEIMS and MassFormer are available at https://github.com/Roestlab/massformer.

Figure S3: The accuracy of the predicted spectra is correlated with the maximum bulk Tanimoto similarity (Tanimoto, 1958) between a test molecule and the training molecules. The orange dots trace the kernel density estimate (KDE), i.e., the estimated continuous probability density curve.
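The quantity on the x-axis of Fig. S3, the maximum Tanimoto similarity between a test molecule and the training molecules, can be sketched with fingerprints represented as sets of on-bit indices; the bit values below are hypothetical (in practice, a fingerprint routine such as RDKit's BulkTanimotoSimilarity serves this purpose).

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets:
    |a & b| / |a | b|."""
    return len(a & b) / len(a | b)

def max_train_similarity(test_fp, train_fps):
    """Max Tanimoto similarity of one test molecule against the training set."""
    return max(tanimoto(test_fp, fp) for fp in train_fps)

# toy fingerprints as sets of on-bit indices (hypothetical values)
train = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8}]
assert max_train_similarity({1, 2, 3}, train) == 0.75   # vs {1, 2, 3, 4}: 3/4
```

A high maximum similarity means the test molecule has a close structural analog in the training set, which is exactly the regime where the predicted spectra are most accurate.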

S4.4 Figure S4
(a) The training set selected by MaxMin Pick (Ashton et al., 2002) has lower cosine similarities (i.e., higher diversity) than a randomly picked training set. (b) 3DMolMS trained on the more diverse training samples (i.e., those picked by MaxMin Pick) achieves higher prediction accuracy than when trained on less diverse training samples.
Figure S4: Experiments using training samples of different diversity.
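The MaxMin selection referenced above can be sketched as a greedy loop in the spirit of Ashton et al. (2002): start from one item, then repeatedly add the item whose minimum distance to the already-picked set is largest, yielding a diverse subset. The distance matrix and items below are toy values.

```python
def maxmin_pick(dist, n_pick, first=0):
    """Greedy MaxMin selection over a full pairwise distance matrix
    (list of lists): maximize the minimum distance to the picked set."""
    picked = [first]
    while len(picked) < n_pick:
        best, best_d = None, -1.0
        for i in range(len(dist)):
            if i in picked:
                continue
            d = min(dist[i][j] for j in picked)   # distance to nearest picked item
            if d > best_d:
                best, best_d = i, d
        picked.append(best)
    return picked

# toy 1-D points 0, 1, 8, 10: MaxMin jumps to the far cluster first
pts = [0.0, 1.0, 8.0, 10.0]
dist = [[abs(a - b) for b in pts] for a in pts]
assert maxmin_pick(dist, 3) == [0, 3, 2]
```

This greedy rule is why the MaxMin training set in Fig. S4(a) shows lower pairwise similarities than random picking: each new sample is chosen to be as far as possible from everything already selected.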