Attribute-guided prototype network for few-shot molecular property prediction

Abstract
Molecular property prediction (MPP) plays a crucial role in drug discovery, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. Few-shot MPP is a more challenging scenario that aims to predict unseen properties from only a few labeled molecules. In this paper, we propose an attribute-guided prototype network (APN) to address this challenge. APN first introduces a molecular attribute extractor that not only derives three types of fingerprint attributes (single, dual, and triplet fingerprint attributes) from seven circular, five path-based, and two substructure-based fingerprints, but also automatically extracts deep attributes using self-supervised learning methods. Furthermore, APN employs an Attribute-Guided Dual-channel Attention (AGDA) module that learns the relationship between molecular graphs and attributes and refines the local and global representations of molecules. Compared with existing work, APN leverages high-level, human-defined attributes that help the model explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN achieves state-of-the-art performance in most cases and that the attributes effectively improve few-shot MPP performance. In addition, experiments on data from different domains verify the strong generalization ability of APN.


Appendix B. Molecular Fingerprint Details
We selected 14 fingerprints of the following three types.
Circular fingerprints: These include extended-connectivity fingerprints (ECFPx) and functional-class fingerprints (FCFPx), where x is the diameter, in bonds, of the circular neighborhood centered on each atom. We use x = 0, 2, 4, 6 for ECFP and x = 2, 4, 6 for FCFP. ECFP encodes atom-level information such as the number of connected atoms, the number of non-hydrogen bonds, the atomic number, the sign and absolute value of the atomic charge, and the number of attached hydrogen atoms. FCFP mainly encodes pharmacophoric information, such as hydrogen-bond acceptors, hydrogen-bond donors, negatively ionizable atoms, positively ionizable atoms, aromatic atoms, and halogens.
Path-based fingerprints: These include RDKx (x = 5, 6, or 7, the maximum path length in bonds), HashTT (topological torsion), and HashAP (atom pairs). HashTT is similar to HashAP and encodes three atom properties: atomic number, number of π electrons, and number of adjacent atoms.
Substructure-based fingerprints: These include MACCS (Molecular ACCess System) keys and Avalon fingerprints. MACCS is a 167-bit fingerprint in which each bit represents a SMARTS-encoded substructure. Avalon fingerprints include features such as atomic symbol paths, atom counts, augmented symbol paths, and augmented atoms.
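To illustrate what the diameter x in ECFPx means, the following is a minimal, simplified sketch of ECFP-style iterative neighborhood encoding in pure Python. It is not RDKit's implementation and not the code used for APN: the functions `circular_identifiers` and `fold` are hypothetical, the initial atom invariant is assumed to be only (atomic number, heavy-atom degree), and bond orders, charges, integer hashing, and duplicate-environment removal are omitted. Each iteration grows every atom environment by one bond, so diameter x corresponds to x / 2 iterations.

```python
from typing import Dict, List, Set, Tuple


def circular_identifiers(
    atoms: List[int], bonds: List[Tuple[int, int]], diameter: int
) -> Set[tuple]:
    """Collect atom-environment identifiers for all radii up to diameter // 2.

    Simplified ECFP-style sketch (not RDKit): the initial atom invariant is
    (atomic number, heavy-atom degree), and each iteration extends every
    environment by one bond by appending the sorted identifiers of the
    neighbouring atoms. Bond orders, charges, hydrogens, integer hashing and
    duplicate-environment removal used by real ECFP are omitted.
    """
    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Diameter 0: each atom is described only by its own invariants.
    ids = [(z, len(neighbours[i])) for i, z in enumerate(atoms)]
    found = set(ids)
    # Each iteration grows the environment radius by 1 (diameter by 2).
    for _ in range(diameter // 2):
        ids = [
            (ids[i], tuple(sorted(ids[j] for j in neighbours[i])))
            for i in range(len(atoms))
        ]
        found.update(ids)
    return found


def fold(identifiers: Set[tuple], n_bits: int = 2048) -> List[int]:
    """Map identifiers onto indices of a fixed-length bit vector (the 'on' bits)."""
    return sorted({hash(x) % n_bits for x in identifiers})
```

For ethanol (heavy atoms C, C, O with bonds 0-1 and 1-2), diameters 0, 2, and 4 yield 3, 6, and 9 identifiers, and each set contains the previous one, mirroring how ECFP4 includes the ECFP0 and ECFP2 environments. In practice, ECFP4 is usually computed with RDKit's Morgan fingerprint at radius 2; FCFP follows the same scheme but replaces the initial invariant with pharmacophoric atom types.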

Appendix C. Experimental Details
Specific definitions and sizes of attributes. For the fingerprint attributes, we first extract 14 single fingerprint attributes from the 14 fingerprints for all molecules in the datasets. Specifically, we use PCA to reduce each of the 14 fingerprints to 100 dimensions, yielding 14 single fingerprint attributes of 100 dimensions each. We then obtain dual and triplet fingerprint attributes by summing or concatenating two or three single fingerprint attributes. Hence, a dual fingerprint attribute has 100 dimensions (summation) or 200 dimensions (concatenation), and a triplet fingerprint attribute has 100 dimensions (summation) or 300 dimensions (concatenation). We denote single fingerprint attributes by the lowercase names of the fingerprints in Table 1; for example, ecfp2 is the single fingerprint attribute extracted from the ECFP2 fingerprint. Dual and triplet fingerprint attributes are denoted by joining the names of their single fingerprint attributes with '_', e.g., ecfp0_ecfp2 and hashap_avalon_ecfp4.
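The construction of single, dual, and triplet fingerprint attributes can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: PCA is implemented directly via `numpy.linalg.svd` (the paper does not say which PCA implementation was used), PCA is assumed to be fit separately for each fingerprint on all molecules of a dataset, and the helper names `pca_reduce` and `combine` are hypothetical.

```python
from typing import Sequence

import numpy as np


def pca_reduce(fingerprints: np.ndarray, n_components: int = 100) -> np.ndarray:
    """Project an (n_molecules, n_bits) fingerprint matrix onto its top principal components."""
    x = fingerprints.astype(np.float64)
    x = x - x.mean(axis=0, keepdims=True)  # PCA operates on mean-centred data
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    if n_components > vt.shape[0]:
        raise ValueError("need at least n_components molecules and fingerprint bits")
    return x @ vt[:n_components].T


def combine(attributes: Sequence[np.ndarray], mode: str) -> np.ndarray:
    """Build a dual/triplet attribute from 2 or 3 single attributes of shape (n, 100)."""
    if mode == "sum":
        return np.stack(attributes).sum(axis=0)  # stays 100-dimensional
    if mode == "concat":
        return np.concatenate(attributes, axis=1)  # 100 * len(attributes) dimensions
    raise ValueError(f"unknown combination mode: {mode!r}")
```

Summation keeps the attribute at 100 dimensions regardless of how many fingerprints are combined, whereas concatenation preserves each fingerprint's components separately at the cost of a 200- or 300-dimensional attribute.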
For the deep attributes, we automatically extract 7 types of deep fingerprints from the 7 self-supervised learning methods mentioned above for all molecules in the datasets, and reduce each to 100 dimensions through PCA to obtain 100-dimensional deep attributes. We use 'CGIP_G', 'GraphMVP', 'IEM_3d_10conf', 'MoleBERT', 'molformer', 'unimol_10conf', and 'VideoMol_1conf' to denote these deep attributes, respectively. Finally, we select one attribute from the single fingerprint, dual fingerprint, triplet fingerprint, or deep attributes to guide the training and inference of the model.
Hyperparameter settings. In all experiments, we use a 5-layer GAT with a dropout ratio of 0.2 to encode molecules. The molecular representations output by the GAT are 100-dimensional, as are the molecular attributes. The learning rate is selected from [0.0005, 0.001, 0.005, 0.01, 0.05].
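For reproduction, the reported settings can be collected in a single configuration object. The sketch below is illustrative only: the class `APNConfig`, its field names, and the helper `learning_rate_sweep` are hypothetical and do not come from the APN code; only the values are taken from the text above.

```python
from dataclasses import dataclass
from typing import List

# Learning-rate grid reported in Appendix C.
LEARNING_RATES: List[float] = [0.0005, 0.001, 0.005, 0.01, 0.05]


@dataclass(frozen=True)
class APNConfig:
    """Hyperparameters from Appendix C; field names are illustrative only."""
    learning_rate: float
    encoder: str = "GAT"
    num_layers: int = 5
    dropout: float = 0.2
    graph_embedding_dim: int = 100
    attribute_dim: int = 100


def learning_rate_sweep() -> List[APNConfig]:
    """One candidate configuration per learning rate; all other settings are fixed."""
    return [APNConfig(learning_rate=lr) for lr in LEARNING_RATES]
```

Only the learning rate varies across candidates, which matches the text: the encoder depth, dropout, and dimensions are fixed in all experiments.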
Training and inference details. During training, a GAT first extracts node representations from the molecules in the support and query sets. Then, the selected molecular attribute (generated directly by the attribute extractor or loaded from a pre-saved attribute file) and the node representations of the molecules are fed into the Attribute-Guided Dual-channel Attention (AGDA) module. In AGDA, the molecular attributes refine the atom-level and molecule-level representations through local and global attention, making the molecular representations more informative and discriminative. Finally, the prototypes of the positive and negative samples are computed separately in a weighted manner. Each molecule in the query set is classified by computing the dot-product similarity between its representation and the two prototypes, and the cross-entropy loss is used for training. The loss of a training task is the mean of the losses of all samples in its query set; the losses of all training tasks are summed and back-propagated to update the model parameters.
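The prototype and loss computation described above can be sketched in NumPy as a forward pass. This is a minimal sketch, not the authors' implementation: the weighting scheme is not specified in the text, so we assume per-sample non-negative weights that are normalized within each class (uniform weights reduce this to the ordinary class mean of a prototypical network); labels are assumed to be 0 (negative) and 1 (positive); the AGDA refinement is assumed to have already produced the embeddings; and the function names are hypothetical. Gradients are omitted; an autodiff framework would back-propagate the summed loss.

```python
from typing import List, Tuple

import numpy as np


def weighted_prototypes(support: np.ndarray, labels: np.ndarray,
                        weights: np.ndarray) -> np.ndarray:
    """Return a (2, d) array: row 0 is the negative prototype, row 1 the positive one."""
    prototypes = []
    for c in (0, 1):
        mask = labels == c
        w = weights[mask] / weights[mask].sum()  # normalise weights within each class
        prototypes.append(w @ support[mask])     # weighted average of class embeddings
    return np.stack(prototypes)


def task_loss(query: np.ndarray, query_labels: np.ndarray,
              prototypes: np.ndarray) -> float:
    """Mean cross-entropy over one task's query set, with dot products as logits."""
    logits = query @ prototypes.T                          # (n_query, 2)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(query_labels)), query_labels].mean())


# (support, support_labels, support_weights, query, query_labels)
Task = Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]


def meta_batch_loss(tasks: List[Task]) -> float:
    """Sum of the per-task mean query losses, as described for training."""
    total = 0.0
    for support, labels, weights, query, query_labels in tasks:
        prototypes = weighted_prototypes(support, labels, weights)
        total += task_loss(query, query_labels, prototypes)
    return total
```

Averaging within a task keeps each task's contribution independent of its query-set size, while summing across tasks means the gradient scale grows with the number of tasks per update.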
During inference, the two prototypes are first computed as in training. To predict the property of a new molecule, we only need to compute the dot-product similarity between its representation and the two prototypes; the molecule is assigned the label of the prototype with the higher similarity.
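The inference rule reduces to an argmax over two dot products. The sketch below assumes the prototypes are stored as a (2, d) array with the negative prototype in row 0 and the positive prototype in row 1; both function names are hypothetical. Since the results are reported as ROC-AUC, a continuous score is also needed; we assume the softmax probability of the positive class, which the text does not state explicitly.

```python
import numpy as np


def predict(query: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Label each query molecule with the class of its most similar prototype.

    prototypes: (2, d) array, row 0 = negative prototype, row 1 = positive prototype.
    """
    similarity = query @ prototypes.T   # dot-product similarity, shape (n_query, 2)
    return similarity.argmax(axis=1)    # 0 = negative, 1 = positive


def positive_score(query: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Softmax probability of the positive class, usable as a ranking score for ROC-AUC."""
    similarity = query @ prototypes.T
    return 1.0 / (1.0 + np.exp(similarity[:, 0] - similarity[:, 1]))
```

The two functions agree: a molecule is predicted positive exactly when its positive score exceeds 0.5.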

Appendix D. More Experimental Results
To investigate the impact of different dimensionality reduction methods on performance, we also use clustering to reduce the dimensionality of the molecular fingerprints on the Tox21 dataset, adjusting the learning rate accordingly; the results are shown in Table S2. The performance is lower than that achieved with PCA. We believe this is because the attributes obtained through PCA retain richer information from the original fingerprints.
Table S3 presents the results of APN with single fingerprint attributes on the Tox21, SIDER, and MUV datasets. Results obtained without the guidance of any fingerprint attribute are marked as "none".
To investigate whether combining multiple fingerprints can improve few-shot molecular property prediction, we combine the information of two different molecular fingerprints as the attributes of the molecules. We consider two combination methods, summation and concatenation; their results on the Tox21 dataset are shown in Table S4 and Table S5, respectively.
A GNN layer mainly performs edge feature updates and message passing, with time and space complexity O(E + N), where E and N denote the number of edges and nodes, respectively. For a GNN with L layers, the time and space complexity are therefore both O(L * (E + N)).
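To make the O(L * (E + N)) bound concrete, the following is a minimal NumPy sketch of a generic sum-aggregation message-passing layer over an edge list. It is not the GAT used in APN: attention coefficients and edge-feature updates are omitted, and the names `message_passing_layer` and `gnn` are hypothetical. A GAT layer has the same cost structure, namely a constant amount of work per edge (computing and aggregating messages) and per node (the update) for a fixed feature dimension.

```python
from typing import List

import numpy as np


def message_passing_layer(h: np.ndarray, edges: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One sum-aggregation layer.

    h: (N, d) node features; edges: (E, 2) array of directed (source, target)
    pairs; w: (d, d) weight matrix. For fixed d the work is O(E + N).
    """
    messages = h[edges[:, 0]]                   # gather one message per edge: O(E)
    aggregated = np.zeros_like(h)
    np.add.at(aggregated, edges[:, 1], messages)  # scatter-add into targets: O(E)
    return np.maximum((h + aggregated) @ w, 0.0)  # per-node update with ReLU: O(N)


def gnn(h: np.ndarray, edges: np.ndarray, weights: List[np.ndarray]) -> np.ndarray:
    """Stack L layers, giving O(L * (E + N)) time; memory per layer is O(E + N)."""
    for w in weights:
        h = message_passing_layer(h, edges, w)
    return h
```

For a three-atom chain with edges stored in both directions, one layer with identity weights adds each atom's neighbor features to its own, e.g. the middle atom receives the sum of both end atoms.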
The attribute extractor extracts the attributes of all molecules in a dataset in a single pass; the attributes can either be loaded from a pre-generated file or computed directly by the attribute extractor. In the latter case, the time and space complexity are O(D * F * T) and O(D * F), respectively, where T is the time RDKit needs to extract one molecular fingerprint, D is the size of the dataset, and F is the dimension of the attributes.
Since the graph embedding and the attributes are both 100-dimensional, the GNN has 5 layers, and M is a constant, the time and space complexity of APN is O(N + E), where N is the number of nodes and E is the number of edges. That is, the time and memory consumption of APN grow linearly with the numbers of nodes and edges, which is generally the best achievable for graph-structured inputs and indicates that APN is efficient.

Figure S1: The ROC-AUC results with different learning rates and attribute dimensions on the Tox21, SIDER, and MUV datasets.

Table S1: The features of atoms and bonds.

Table S2: The ROC-AUC results on 10-shot tasks from Tox21 when dimensionality is reduced through clustering.

Table S3: The ROC-AUC scores of APN with single fingerprint attributes on 2-way 10-shot tasks.

Table S4: The ROC-AUC scores (%) of APN when concatenating two single fingerprint attributes on 2-way 10-shot tasks from the Tox21 dataset.

Table S5: The ROC-AUC scores (%) of APN when summing two single fingerprint attributes on 2-way 10-shot tasks from the Tox21 dataset.