Geometry-complete perceptron networks for 3D molecular graphs

Abstract

Motivation: The field of geometric deep learning has recently had a profound impact on several scientific domains such as protein structure prediction and design, leading to methodological advancements within and outside of the realm of traditional machine learning. Within this spirit, in this work, we introduce GCPNet, a new chirality-aware SE(3)-equivariant graph neural network designed for representation learning of 3D biomolecular graphs. We show that GCPNet, unlike previous representation learning methods for 3D biomolecules, is widely applicable to a variety of invariant or equivariant node-level, edge-level, and graph-level tasks on biomolecular structures while being able to (1) learn important chiral properties of 3D molecules and (2) detect external force fields.

Results: Across four distinct molecular-geometric tasks, we demonstrate that GCPNet's predictions (1) for protein-ligand binding affinity achieve a statistically significant correlation of 0.608, more than 5% greater than current state-of-the-art methods; (2) for protein structure ranking achieve statistically significant target-local and dataset-global correlations of 0.616 and 0.871, respectively; (3) for Newtonian many-body systems modeling achieve a task-averaged mean squared error less than 0.01, more than 15% better than current methods; and (4) for molecular chirality recognition achieve a state-of-the-art prediction accuracy of 98.7%, better than any other machine learning method to date.

Availability and implementation: The source code, data, and instructions to train new models or reproduce our results are freely available at https://github.com/BioinfoMachineLearning/GCPNet.


Introduction
Over the last several years, the field of deep learning has pioneered many new methods designed to process graph-structured inputs. Being a ubiquitous form of information, graph-structured data arises from numerous sources such as the fields of physics and chemistry, as shown in Figure 1. Moreover, the relational nature of graph-structured data allows one to identify and characterize topological associations between entities in large real-world networks (e.g., social networks).
In particular, 3D data often emerges in domains such as computer vision and can be readily described as graph-structured inputs (Valsesia et al., 2018). Studies such as those of (Qi et al., 2017b), (Zhang & Rabbat, 2018), and (Zhou et al., 2021) have demonstrated the utility of this approach to modeling 3D data as graphs. Additionally, to process and analyze such 3D information in a meaningful, powerful, and concise way, one must also carefully consider the symmetries present in such data to reduce the geometric redundancies they might present to a machine learning model (Esteves, 2020).
Many disciplines present types of data for which 3D geometric information can be carefully analyzed to produce meaningful and reliable predictions about the system at hand. For example, in protein biology, knowing the 3D structure of a protein macromolecule is a key step towards developing a deeper understanding of its molecular function in living organisms (Hegyi & Gerstein, 1999). In a context-specific manner, similar geometric insights have been proposed in fields such as neurobiology (Umulis & Othmer, 2012) and materials design (Matter & Niederberger, 2022). In light of such insights, the field of deep learning has grown to account for the importance of geometry in the representation learning of real-world objects. After discussing relevant related works, in the remainder of the paper, we introduce a new type of neural network to account for the symmetries present in a variety of 3D molecule systems (the group of rotations and translations in 3D space) and systematically study this network's behavior through a collection of experimental results for different 3D modeling tasks, verifying the effectiveness of our proposed method.

Related Work
Geometric Machine Learning. In the case of being presented with geometric (non-Euclidean domain) data, machine learning systems have been developed to process arbitrarily-structured inputs such as graphs and meshes (Hamilton, 2020; Cao et al., 2022). A subfield of geometric machine learning, geometric deep learning, has recently received much attention from researchers for its various success cases in using deep neural networks to faithfully model geometric data in the form of 3D graphs and manifolds (Masci et al., 2016; Bronstein et al., 2017).
Scientific Graph Representation Learning. In scientific domains such as computational biology and chemistry, graphs are often used to represent the 3D structures of molecules (Duvenaud et al., 2015; Liu et al., 2021), chemical compounds (Akutsu & Nagamochi, 2013), and even large biomolecules such as proteins (Xia & Ku, 2021; Morehead et al., 2022b). Graphs have even been used in fields such as computational physics to model complex particle physics simulations (Shlomi et al., 2020) as well as in real-world traffic systems to predict travel times and delays (Derrow-Pinion et al., 2021). Underlying many of these successful examples of graph representations are GNNs, a class of machine learning algorithms specialized in processing irregularly-structured input data such as graphs. Careful applications of graph neural networks in scientific domains have considered the physical symmetries present in many scientific data such as molecular state symmetries (Ye et al., 2020) or physical dynamics constraints (Han et al., 2022) and have leveraged such symmetries to design new attention-based neural network architectures (Morehead et al., 2022a; Jumper et al., 2021).
Equivariant Neural Networks. Throughout their development, geometric deep learning methods have expanded to incorporate within them equivariance to various geometric symmetry groups to enhance their generalization capabilities and adversarial robustness. Methods such as group-equivariant CNNs (Cohen & Welling, 2016), Tensor Field Networks (Thomas et al., 2018), SE(3)-Transformers (Fuchs et al., 2020), and equivariant GNNs (Fuchs et al., 2020; Jing et al., 2020; 2021; Kofinas et al., 2021; Du et al., 2022; Aykent & Xia, 2022) have paved the way for the development of future deep learning models that respect physical symmetries present in 3D data (e.g., rotation equivariance with respect to input data symmetries). Concurrently to these efforts, self-supervised learning methods have begun to facilitate automatic detection and enforcement of the symmetries present in input data within the network's representations for such inputs (Dangovski et al., 2021). Nonetheless, deciding how to optimize self-supervised learning algorithms for one's desired level of equivariance has proven to be a challenging task (Xie et al., 2022).
Contributions. In this work, we make connections between geometric graph neural networks, equivariance, and geometry information completeness guarantees that provide one with a rich foundation on which to build new graph neural network architectures. In particular, we introduce a new graph neural network model that is equivariant to the group of 3D rotations and translations (i.e., SE(3), the special Euclidean group) and guarantees directional information completeness following graph message-passing on 3D point clouds. We showcase its expressiveness and flexibility for modeling physical systems through several benchmark studies. In detail, we provide the following contributions.
• We present the first geometric graph neural network architecture with directional information completeness guarantees that, in an SE(3)-equivariant manner, can predict new node positions as well as scalar and vector-valued features for nodes and edges.
• We establish new state-of-the-art results for three separate molecular-geometric representation learning tasks where model predictions vary from analyzing individual nodes to summarizing entire graph inputs.
• Our experiments demonstrate that the geometric information that rich geometric message-passing procedures and local equivariant frame encodings of node positions provide is useful for predicting both vector-valued node features as well as scalar node and graph-level properties across different geometric datasets.
Figure 2: A framework overview for our proposed Geometry-Complete Perceptron Network (GCPNET). Our framework consists of (i.) a graph (topology) definition process, (ii.) a GCPNET-based graph neural network for 3D molecular representation learning, and (iii.) demonstrated application areas for GCPNET. Zoom in for the best viewing experience.

Overview of the Problem Setting
We represent a 3D molecular structure as a 3D k-nearest neighbors (k-NN) graph G = (V, E) with X ∈ R^(N×3) as the respective Cartesian coordinates for each node, where N = |V| and E = |E|. We then design E(3)-invariant (i.e., 3D rotation, reflection, and translation-invariant) node features H ∈ R^(N×h) and edge features E ∈ R^(E×e) as well as O(3)-equivariant (i.e., 3D rotation and reflection-equivariant) node features χ ∈ R^(N×(m×3)) and edge features ξ ∈ R^(E×(x×3)), respectively.
Upon constructing such features, we apply several layers of graph message-passing using functions Φ that update node and edge features using invariant and equivariant representations for the corresponding feature types. Importantly, our method for doing so guarantees, by design, SE(3) equivariance with respect to its vector-valued input coordinates and features (i.e., x_i ∈ X, χ_i ∈ χ, and ξ_ij ∈ ξ) and SE(3) invariance regarding its scalar features (i.e., h_i ∈ H and e_ij ∈ E) to achieve geometric self-consistency of the 3D structure of the input molecular graph G during graph message-passing. We formalize the equivariance, geometric self-consistency, and geometric completeness constraints using the three following definitions.

Definition 3.1 (SE(3) Equivariance). A function φ is SE(3)-equivariant if, for every rigid transformation g ∈ SE(3) (i.e., any composition of a 3D rotation and a 3D translation), we have φ(g ∘ x) = g ∘ φ(x) for all inputs x.
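As a concrete illustration of this setup, the sketch below builds a k-NN molecular graph with E(3)-invariant scalar features and O(3)-equivariant vector features. The specific feature choices (inter-atomic distances and relative displacements) are illustrative assumptions, not the paper's exact featurization.

```python
import torch

def build_knn_graph(X: torch.Tensor, k: int = 3):
    """X: (N, 3) Cartesian coordinates. Returns (edge_index, H, E, chi, xi)."""
    N = X.shape[0]
    dist = torch.cdist(X, X)                      # (N, N) pairwise distances
    dist.fill_diagonal_(float("inf"))             # exclude self-loops
    nbr = dist.topk(k, largest=False).indices     # (N, k) nearest neighbors
    src = torch.arange(N).repeat_interleave(k)
    dst = nbr.reshape(-1)
    edge_index = torch.stack([src, dst])          # (2, N*k)

    # E(3)-invariant scalars: inter-atomic distances (edges) and the
    # nearest-neighbor distance of each node.
    E = dist[src, dst].unsqueeze(-1)              # (N*k, 1)
    H = dist.amin(dim=-1, keepdim=True)           # (N, 1)

    # O(3)-equivariant vectors: relative displacements (edges) and their
    # per-node sums (a single illustrative vector channel).
    xi = (X[src] - X[dst]).unsqueeze(1)           # (N*k, 1, 3)
    chi = torch.zeros(N, 1, 3)
    chi.index_add_(0, src, xi)                    # (N, 1, 3)
    return edge_index, H, E, chi, xi
```

Distances and norms are unchanged by rotations, reflections, and translations of X, while the displacement channels rotate with X, matching the feature taxonomy above.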

SE(3)-equivariant complete representations
Representation learning on 3D molecular structures is a challenging task for a variety of reasons: (1) an expressive representation learning model should be able to predict arbitrary vector-valued quantities for each atom and atom pair in the molecular structure (e.g., using χ and ξ to predict side-chain atom positions and atom-atom displacements for each residue in a 3D protein graph); (2) arbitrary rotations or translations to a 3D molecular structure should affect only the vector-valued representations a model assigns to a molecular graph's nodes or edges, whereas such 3D transformations of the molecular structure should not affect the model's scalar representations for nodes and edges (Du et al., 2022); (3) the geometrically invariant properties of a molecule's 3D structure should be uniquely identifiable by a model; and (4) in a geometry-complete manner, scalar and vector-valued representations should mutually exchange information between nodes and edges during a model's forward pass for a 3D input graph, as these information types can be correlatively related (e.g., a scalar feature such as the L2 norm of a vector v can be associated with its vector of origin v) (Aykent & Xia, 2022; Morehead et al., 2022a).
In line with this reasoning, we need to ensure that the coordinates our model predicts for the node positions in a molecular graph G transform according to SE(3) transformations of the input positions, in contrast to previous approaches that remain E(3)-equivariant or E(3)-invariant to 3D positional transformations of G and consequently introduce insufficient geometric priors into the model's learning procedure (e.g., due to the chirality of 3D protein structures). Simultaneously, without introducing any direction degeneration between pairs of node positions, the model should SE(3)-invariantly and SE(3)-equivariantly update the scalar and vector-valued features of G, respectively. To increase its generalization capabilities, our model should also maintain SE(3) invariance of the scalar features it produces when the input graph is transformed in 3D space. Following (Wang et al., 2022a), this helps prevent the model from losing important geometric information (i.e., attaining geometric self-consistency) during graph message-passing. One way to do this is to introduce a new type of message-passing neural network.

Methodology
Towards this end, we introduce our architecture for Φ satisfying Defs. (3.1), (3.2), and (3.3), which we refer to as the Geometry-Complete SE(3)-Equivariant Perceptron Network (GCPNET). We illustrate the GCPNET algorithm in Figure 2 and outline it in Algorithm 1. Subsequently, we expand on our definitions of GCP and GCPConv in Sections 4.1 and 4.2.1, respectively, providing accompanying proofs in the appendices, while further illustrating GCP in Figure 3.
It is then straightforward to prove the following three propositions (see Appendices A.1 through A.3 for a more detailed description of the GCPNET algorithm and its equivariant properties).

Geometry-Complete Perceptron Module
As illustrated in Figure 3, GCPNET represents the features for nodes and edges within an input graph as a tuple (s, V) to distinguish scalar features (s) from vector-valued features (V). We then define GCP_(F_ij, λ)(·) to represent the GCP encoding process, where λ represents a downscaling hyperparameter (e.g., 3) and F_ij ∈ R^(3×3) denotes the SO(3)-equivariant (i.e., 3D rotation-equivariant) frames constructed using the Localize operation (i.e., the EquiFrame operation of (Du et al., 2022)) in Algorithm 1. Specifically, the frame encodings F_ij^t = (a_ij^t, b_ij^t, c_ij^t) are defined as

a_ij^t = (x_i^t − x_j^t) / ‖x_i^t − x_j^t‖,  b_ij^t = (x_i^t × x_j^t) / ‖x_i^t × x_j^t‖,  and  c_ij^t = a_ij^t × b_ij^t,

respectively. In Appendix A.3, we discuss how these frame encodings are direction information-complete for edges, allowing networks incorporating them to effectively detect, and leverage for downstream tasks, the inter-atomic and force field interactions present within real-world many-body systems such as small molecules and proteins.
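A minimal sketch of the Localize operation under these definitions, assuming centered coordinates and the frame construction of (Du et al., 2022); function and variable names are illustrative:

```python
import torch

def localize(X: torch.Tensor, edge_index: torch.Tensor, eps: float = 1e-8):
    """Build SO(3)-equivariant frames F_ij = (a_ij, b_ij, c_ij) for each edge.

    X: (N, 3) centered coordinates; edge_index: (2, E) source/target indices.
    Returns a (E, 3, 3) tensor whose rows are a, b, c."""
    src, dst = edge_index
    a = X[src] - X[dst]
    a = a / (a.norm(dim=-1, keepdim=True) + eps)        # normalized displacement
    b = torch.cross(X[src], X[dst], dim=-1)
    b = b / (b.norm(dim=-1, keepdim=True) + eps)        # normalized cross product
    c = torch.cross(a, b, dim=-1)                       # completes the frame
    return torch.stack([a, b, c], dim=-2)               # (E, 3, 3)
```

Rotating the coordinates rotates every frame vector in the same way, which is the SO(3) equivariance property proven in Appendix A.3.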
Expressing Vector Representations with V. The GCP module then expresses vector representations V as follows. The features V ∈ R^(r×3) with representation depth r are downscaled by λ through a projection matrix D_z ∈ R^((r/λ)×r):

V_z = D_z V.

Additionally, V is separately downscaled via a dedicated projection matrix D_S ∈ R^(3×r), yielding V_S = D_S V, in preparation to be subsequently embedded as direction-sensitive edge scalar features.
Deriving Scalar Representations s. To update scalar representations, the GCP module, in the following manner, derives two invariant sources of information from V and combines them with s:

q = ‖V_z‖_2 (taken row-wise over the r/λ vector channels)  and  z = (⟨v_i^S, a_ij^t⟩, ⟨v_i^S, b_ij^t⟩, ⟨v_i^S, c_ij^t⟩) for each of the three rows v_i^S of V_S,

where ⟨·,·⟩ denotes the inner product, N(·) represents the neighbors of a node, and ‖·‖_2 denotes the L2 norm. Then, denote t as the representation depth of s, and let s_(s,q,z) ∈ R^(t+9+(r/λ)), the concatenation of s, z, and q with representation depth (t + 9 + (r/λ)), be projected to s_v with representation depth t:

s_v = s_(s,q,z) w_s + b_s,  w_s ∈ R^((t+9+(r/λ))×t).    (6)

Figure 3: An overview of our proposed Geometry-Complete Perceptron (GCP) module. The GCP module introduces node and edge-centric encodings of 3D frames as input features that are used to update both scalar and vector-valued features with geometry and direction information-completeness guarantees.
Deriving Vector Representations V. The GCP module concludes by updating vector representations as follows:

V_v = σ_g(s_v w_g) ⊙ (U_z V_z),

where ⊙ represents element-wise multiplication, U_z ∈ R^(r×(r/λ)) upscales the vector channels, w_g ∈ R^(t×r) projects s_v to per-channel gates, and the gating function σ_g is applied row-wise to preserve SO(3) equivariance within V_v.
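The scalar and vector update pattern described in this subsection can be sketched as follows: a GVP-style bottleneck plus frame scalarization. Dimensions, names, and the single-scalar gating form are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn

class GCPSketch(nn.Module):
    """Illustrative GCP-style update of a (scalar, vector) feature tuple."""
    def __init__(self, t: int, r: int, lam: int = 3):
        super().__init__()
        self.D_z = nn.Linear(r, r // lam, bias=False)   # downscale vector channels
        self.U_z = nn.Linear(r // lam, r, bias=False)   # upscale vector channels
        self.D_s = nn.Linear(r, 3, bias=False)          # channels for scalarization
        self.W_s = nn.Linear(t + 9 + r // lam, t)       # scalar projection
        self.gate = nn.Sigmoid()                        # row-wise gate

    def forward(self, s, V, F):
        # s: (B, t) scalars, V: (B, r, 3) vectors, F: (B, 3, 3) frames.
        Vz = self.D_z(V.transpose(-1, -2)).transpose(-1, -2)  # (B, r//lam, 3)
        Vs = self.D_s(V.transpose(-1, -2)).transpose(-1, -2)  # (B, 3, 3)
        q = Vz.norm(dim=-1)                                   # invariant norms
        z = torch.einsum('bij,bkj->bik', Vs, F).flatten(1)    # 9 frame scalars
        s_out = self.W_s(torch.cat([s, z, q], dim=-1))        # (B, t)
        V_up = self.U_z(Vz.transpose(-1, -2)).transpose(-1, -2)
        V_out = V_up * self.gate(s_out.mean(-1, keepdim=True)).unsqueeze(-1)
        return s_out, V_out
```

Because channel mixing acts on the channel axis and the scalarization contracts pairs of rotating vectors, scalars stay invariant and vectors stay equivariant under any rotation applied jointly to V and F.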
Conceptually, the GCP module is autoregressively applied to tuples (s, V) a total of ω times to derive rich scalar and vector-valued features. The module does so by blending both feature types iteratively with the 3D direction and information-completeness guarantees provided by the geometric frame encodings F_ij.

Learning from 3D Graphs with GCPNET
In this section, we propose a flexible manner in which to perform 3D graph convolution with our proposed GCP module, as illustrated in Figure 2 and employed in Algorithm 1.

GEOMETRY-COMPLETE GRAPH CONVOLUTION.
Let N(i) denote the neighbors of node n_i, selected using a distance-based metric such as k-nearest neighbors or a radial distance cutoff. Subsequently, we define a single layer l of geometry-complete graph convolution as

n_i^l = φ^l(n_i^(l−1), A_(j∈N(i)) Ω_ω(n_i^(l−1), n_j^(l−1), e_ij, F_ij)),

where n_i^l = (h_i^l, χ_i^l); e_ij = (e_ij^0, ξ_ij^0); φ is a trainable function denoted as GCPConv; l signifies the representation depth of the network; A is a permutation-invariant aggregation function; and Ω_ω represents a message-passing function corresponding to the ω-th GCP message-passing layer. We proceed to expand on the operations of each graph convolution layer as follows.
To start, messages between source nodes i and neighboring nodes j are first constructed as

m_ij^0 = Ω_1((n_i^(l−1) ∪ n_j^(l−1) ∪ e_ij), F_ij),

where ∪ denotes a concatenation operation. Then, up to the ω-th iteration, each message is updated by the m-th message update layer using residual connections as

m_ij^m = m_ij^(m−1) + Ω_m(m_ij^(m−1), F_ij),

where we empirically find such residual connections between message representations to reduce oversmoothing within GCPNET by mitigating the problem of vanishing gradients.
Updated node features n̂^l are then derived residually using an aggregation of generated messages as

n̂_i^l = n_i^(l−1) + f_(j∈N(i))(m_ij^ω),

where f represents an aggregation function, such as a summation or mean, that is invariant to permutations of node ordering. The residual connection between n̂^l and n^(l−1) is established here to encourage the network to update the representation space of node features in a layer-asynchronous manner.
To encourage GCPNET to make its node feature representations independent of the size of each input graph, we subsequently employ a node-centric feed-forward network to update node representations. Specifically, we apply to n̂^l a linear GCP function with shared weights φ_f followed by r ResGCP modules, operations concisely portrayed as

n^l = (ResGCP^r ∘ ⋯ ∘ ResGCP^1 ∘ φ_f)(n̂^l).

Lastly, if one desires to update the positions of each node in G, we propose a simple, SE(3)-equivariant method to do so using a dedicated GCP module that produces vector-valued position updates χ_vi^l, applied residually as

x_i^l = x_i^(l−1) + χ_vi^l.
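The message-passing structure above (construct messages, residually refine them, aggregate, residually update nodes) can be sketched generically; `message_fn` below is a stand-in for the GCP-based message function Ω, and the names are illustrative:

```python
import torch

def gcpconv_step(h, edge_index, message_fn, num_nodes):
    """One residual message-passing step (structure only, not the full GCPConv).

    h: (N, d) node features; edge_index: (2, E) source/target indices."""
    src, dst = edge_index
    m = message_fn(h[src], h[dst])            # messages along each edge
    agg = torch.zeros_like(h)
    agg.index_add_(0, src, m)                 # permutation-invariant sum
    deg = torch.zeros(num_nodes, 1)
    deg.index_add_(0, src, torch.ones(src.shape[0], 1))
    return h + agg / deg.clamp(min=1)         # residual mean-style update
```

The sum over incoming messages makes the update invariant to node ordering, and the residual term mirrors the n̂^l = n^(l−1) + f(·) update described above.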

The GCPNET Algorithm
In this section, we describe our overall learning algorithm driven by GCPNET (Algorithm 1). We also discuss the rationale behind our design decisions for GCPNET and provide examples of use cases in which one might apply GCPNET for specific learning tasks.
On Line 2 of Algorithm 1, the Centralize operation removes the center of mass from each node position in the input graph to ensure that such positions are subsequently 3D translation-invariant.
Thereafter, following (Du et al., 2022), the Localize operation on Line 3 crafts translation-invariant and SO(3)-equivariant frame encodings F_ij^t = (a_ij^t, b_ij^t, c_ij^t). As described in more detail in Appendix A.3 and by (Du et al., 2022), these frame encodings are direction information-complete for edges, imbuing networks that incorporate them with the ability to more easily detect force field interactions present in many real-world atomic systems, as we demonstrate through corresponding experiments in Section 5.
Before applying any geometry-complete graph convolution layers, on Line 4 we use GCP_e to embed our input node and edge features into scalar and vector-valued representations, respectively, while incorporating geometric frame information. Subsequently, on Lines 5-6, each layer of geometry-complete graph convolution is performed autoregressively via GCPConv^l, starting from these initial node and edge feature embeddings, all while maintaining information flow originating from the geometric frames F_ij.
On Lines 8 through 12, we finalize our procedure for updating, in an SE(3)-equivariant manner, the position of each node in an input 3D graph. In particular, we update node positions by residually adding learned vector-valued node features (χ_vi^l) to the node positions produced by the previous GCPConv layer (l − 1). As shown in Appendix A.1, such updates are initially SO(3)-equivariant, and on Line 10 we ensure these updates also become 3D translation-equivariant by adding back to each node position the input graph's original center of mass via the Decentralize operation. In total, this procedure produces SE(3)-equivariant updates to node positions. Additionally, for models that update node positions, we note that Line 9 updates frame encodings F_ij using the model's final predictions for node positions to provide more information-rich feature projections on Line 14 via GCP_p to conclude the forward pass of GCPNET.
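The Centralize and Decentralize operations discussed here admit a very small sketch (a direct reading of Lines 2 and 10 of Algorithm 1, not the exact implementation):

```python
import torch

def centralize(X: torch.Tensor):
    """Remove the center of mass; returns centered coords and the centroid C(0)."""
    C = X.mean(dim=0, keepdim=True)
    return X - C, C

def decentralize(X: torch.Tensor, C: torch.Tensor):
    """Add the original center of mass back, restoring translation equivariance."""
    return X + C
```

Any global translation of the input cancels in the centered coordinates, so all downstream computation is translation-invariant, and adding C(0) back makes predicted positions translation-equivariant.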

NETWORK UTILITIES.
In summary, GCPNET receives an input 3D graph G with node positions x, scalar node and edge features, h and e, as well as vector-valued node and edge features, χ and ξ. The model is then capable of, e.g., (1) predicting scalar node, edge, or graph-level properties while maintaining SE(3) invariance; (2) estimating vector-valued node, edge, or graph-level properties while ensuring SE(3) equivariance; or (3) updating node positions in an SE(3)-equivariant manner.

Experiments
In this work, we consider three distinct modeling tasks comprised of six datasets in total, where implementation details are discussed in Appendix A.5. We note that additional experiments are included in Appendix A.4 for interested readers.
Ligand Binding Affinity, Graph Regression. Protein-ligand binding affinity (LBA) prediction challenges methods to estimate the binding affinity of a protein-ligand complex as a single scalar value (Townshend et al., 2020). Accurately estimating such values in a matter of seconds using a machine learning model can provide invaluable and timely information in the typical drug discovery pipeline (Rezaei et al., 2020). The corresponding dataset for this SE(3)-invariant task is derived from the ATOM3D dataset (Townshend et al., 2020) and is comprised of 4,463 nonredundant protein-ligand complexes, where cross-validation splits are derived using a strict 30% sequence identity cutoff. Results are reported in terms of the root mean squared error (RMSE), Pearson's correlation (p), and Spearman's correlation (Sp) between a method's predictions on the test dataset and the corresponding ground-truth binding affinity values represented as pK = −log10(K), where K is the binding affinity measured in Molar units. Baseline comparison methods for this task include a variety of state-of-the-art equivariant neural networks (ENNs), CNNs, and GNNs.

Protein Structure Ranking, Graph Regression. Protein structure ranking (PSR) requires methods to predict the overall quality of a 3D protein structure when comparing it to a reference (i.e., native) protein structure (Townshend et al., 2020). The quality of a protein structure is reported as a single scalar value representing a method's predicted global distance test (GDT_TS) score (Zemla, 2003) between the provided decoy structure and the native structure. Such information is crucial in drug discovery efforts when one is tasked with designing a drug (e.g., a ligand) that should bind to a particular protein target, notably when such targets have not yet had their 3D structures experimentally determined and have instead had them predicted computationally using methods such as AlphaFold 2 (Jumper et al., 2021).
The respective dataset for this SE(3)-invariant task is also derived from the ATOM3D dataset (Townshend et al., 2020) and is comprised of 40,950 decoy structures corresponding to 649 total targets, where cross-validation splits are created according to a target's release year in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition (Kryshtafovych et al., 2021). Results are reported in terms of the Pearson's correlation (p), Spearman's correlation (Sp), and Kendall's tau correlation (K) between a method's predictions on the test dataset and the corresponding ground-truth GDT TS values, where local results are averaged across predictions for individual targets and global results are averaged directly across all targets. Baseline comparison methods for this task include a composition of state-of-the-art ENNs, CNNs, and GNNs, as well as previous statistics-based methods.
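Both tasks above report correlation-based metrics against scalar ground-truth values. A minimal NumPy sketch of the LBA label transform and these metrics follows; the array names are hypothetical, and the Spearman implementation ignores ties for brevity:

```python
import numpy as np

def pK(K_molar):
    """Convert a binding affinity K (in Molar units) to pK = -log10(K)."""
    return -np.log10(K_molar)

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def pearson(pred, true):
    return float(np.corrcoef(pred, true)[0, 1])

def spearman(pred, true):
    # Spearman correlation = Pearson correlation of the ranks (no tie handling).
    ranks = lambda a: np.argsort(np.argsort(a)).astype(float)
    return pearson(ranks(pred), ranks(true))
```

For example, a nanomolar binder with K = 1e-9 M maps to pK = 9, so larger pK values indicate tighter binding.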
Newtonian Many-Body Systems, Node Regression. Newtonian many-body systems modeling (NMS) asks methods to forecast the future positions of particles in many-body systems of various sizes (Du et al., 2022), bridging the gap between the domains of machine learning and physics. In our experimental results for the NMS task, the four systems (i.e., datasets) on which we evaluate each method are comprised of increasingly more nodes and are influenced by force fields of increasingly complex directional origins to model, namely electrostatic force fields for 5-body (ES(5)) and 20-body (ES(20)) systems as well as 20-body systems under the influence of an additional gravity field (G+ES(20)) and a Lorentz-like force field (L+ES(20)), respectively. The four datasets for this SE(3)-equivariant task were generated using the descriptions and source code of (Du et al., 2022), where each dataset is comprised of 7,000 total trajectories. Results are reported in terms of the mean squared error (MSE) between a method's node position predictions on the test dataset and the corresponding ground-truth node positions after 1,000 timesteps. Baseline comparison methods for this task include a collection of state-of-the-art GNNs, ENNs, and transformers.

Table 2: Comparison of GCPNET with baseline methods for the PSR task. Local metrics are averaged across target-aggregated metrics. The best results for this task are in bold, and the second-best results are underlined. N/A denotes a metric that could not be computed.

Results and Discussion
The results shown in Table 1 reveal that, in operating on atom-level protein-ligand graph representations, GCPNET achieves the best performance for predicting protein-ligand binding affinity by a significant margin, notably improving performance across all metrics by 7% on average. Here, to the best of our knowledge, GCPNET is also the first method capable of achieving Pearson and Spearman binding affinity correlations greater than 0.6 on the PDBBind dataset (Wang et al., 2005) when employing a strict 30% sequence identity cutoff. Moreover, our ablations with GCPNET reveal that the design of our local frames, residual GCP modules, and scalar and vector feature channels are all beneficial for LBA prediction.

Conveying a similar message, the results in Table 2 demonstrate that, in operating on atom-level protein graphs, GCPNET also performs best against all other models for the task of estimating a 3D protein structure's quality (i.e., for PSR). In this setting, GCPNET improves performance across all local and global metrics by 2.5% on average. Once more, our ablations with GCPNET, in the context of PSR, reveal that the design of our local frames, ResGCP module, and scalar and vector feature channels can all be seen as beneficial for PSR prediction. We note that here, without access to scalar node and edge features, we were unable to produce results with GCPNET due to what appears to be a phenomenon of vector-wise latent variable collapse (Dieng et al., 2019), suggesting that, for PSR, the default GCPNET model relies strongly on the scalar-valued representations it produces.
The results shown in Table 3 indicate that GCPNET achieves the best results for two of the four NMS datasets considered in this work, where these two datasets are respectively the first and third most difficult NMS datasets for methods to model. Overall, GCPNET yields the lowest MSE averaged across all four NMS datasets, improving upon the state-of-the-art 3D positional MSE for this task by 19% on average. Furthermore, besides our ablation of equivariant frames on the ES(5) dataset, our remaining ablations concerning the design of the default GCPNET model demonstrate that local frames, scalar information, and residual GCP modules synergistically enable GCPNET to achieve new state-of-the-art results for the NMS task. In summary, GCPNET improves upon the overall performance of all previous methods for both node-level (e.g., NMS) and graph-level (e.g., LBA) prediction tasks, verifying our method's ability to encode useful information for both scales of granularity.

Conclusion
In this work, we introduced GCPNET, a new state-of-the-art graph neural network for 3D molecular graph representation learning. We have demonstrated its expressiveness and utility through several benchmark studies that suggest that GCPNET is a powerful, general-purpose geometric deep learning method for 3D molecular data. Future work could involve research into developing variations of GCPNET with improved runtime efficiencies or could include additional applications of GCPNET for various other scientific tasks and deep learning datasets. In particular, in future work, we aim to explore applications of GCPNET for generative modeling of small molecules as well as large biomolecular structures such as proteins.
Proof. Suppose the vector-valued features given to the corresponding GCPConv layers in GCPNET are node features χ i and edge features ξ ij that are O(3)-equivariant (i.e., 3D rotation and reflection-equivariant) by way of their construction. Additionally, suppose the scalar-valued features given to the respective GCPConv layers in GCPNET are E(3)-invariant (i.e., 3D rotation, reflection, and translation-invariant) node features h i and edge features e ij .
Translation equivariance. In line with (Du et al., 2022), the Centralize operation on Line 2 of Algorithm 1 first ensures that X^0 becomes 3D translation-invariant by the following procedure. Let X(t) = (x_1(t), ..., x_n(t)) represent a many-body system at time t, where the centroid of the system is defined as

C(t) = (1/n) Σ_(i=1)^n x_i(t).

Note that in uniformly translating the position of the system by a vector v, we have X(t) + v → C(t) + v, meaning that the centroid of the system translates in the same manner as the system itself. However, note that if at time t = 0 we recenter the origin of X to its centroid, we have

(X(t) + v) − (C(0) + v) = X(t) − C(0),

which implies the system X is translation-invariant under the centralized reference X(t) − C(0) when the translation vector v is applied to X at time t = 0. Concretely, in the case of translation-invariant tasks such as predicting molecular properties or classifying point clouds, here we have successfully achieved 3D translation invariance. Moreover, for translation-equivariant tasks such as forecasting the positions of a many-body system, we can achieve translation equivariance by simply adding C(0) back to the predicted positions. Therefore, using the above methodology, GCPNETS are translation-equivariant.
Permutation equivariance. Succinctly, we note that since GCPNET operates on graph-structured input data, permutation equivariance is guaranteed by design. For further discussion of why our proposed method, as well as other previously proposed graph-based algorithms, is inherently permutation-equivariant, we refer readers to (Zaheer et al., 2017). Therefore, GCPNETS are permutation-equivariant.
Rotation equivariance. Define our frame encodings as

F_ij^t = (a_ij^t, b_ij^t, c_ij^t),    (20)

where we have

a_ij^t = (x_i^t − x_j^t) / ‖x_i^t − x_j^t‖,  b_ij^t = (x_i^t × x_j^t) / ‖x_i^t × x_j^t‖,  c_ij^t = a_ij^t × b_ij^t.    (21)

The proof that F_ij^t is equivariant under SO(3) transformations of its input space is included in (Du et al., 2022). However, for completeness, we include a version of it here.
Let g ∈ SO(3) be an action under which the positions in X transform equivariantly, and let F_ij^t be defined as in Equation 20 above. That is, we have x → gx, where from the definition of a_ij^t in Equation 21 we have

a_ij^t → (gx_i^t − gx_j^t) / ‖gx_i^t − gx_j^t‖ = g(x_i^t − x_j^t) / ‖x_i^t − x_j^t‖ = g a_ij^t.

Considering b_ij^t, from Equation 21 we have

b_ij^t → (gx_i^t × gx_j^t) / ‖gx_i^t × gx_j^t‖,  with  gx × gy = g(x × y),    (22)

where using g^(−1) = g^T for the orthogonal matrix g (together with det(g) = 1) gives us Equation 22. Consequently, b_ij^t → g b_ij^t. Lastly, by applying Equation 22 once again, we have that c_ij^t → g c_ij^t. Moreover, note that under reflections of x, we have R : x → −x, which gives us a_ij^t → −a_ij^t. Thereafter, by the right-hand rule, the cross product of two equivariant vectors gives us a pseudo-vector, so b_ij^t → b_ij^t while c_ij^t → −c_ij^t. Consequently, we have det(−a_ij^t, b_ij^t, −c_ij^t) = det(a_ij^t, b_ij^t, c_ij^t) = 1, informing us that the frame encodings F_ij^t are not reflection-equivariant (a symmetry that is important not to enforce when learning representations of chiral molecules such as proteins). Therefore, the frame encodings within GCPNET are SO(3)-equivariant.
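The rotation equivariance and reflection sensitivity argued here can be checked numerically; the sketch below assumes the frame construction of (Du et al., 2022) described in this section, with illustrative names:

```python
import torch

def edge_frame(xi, xj, eps=1e-8):
    """Frame (a, b, c) for an edge (i, j), given centered coordinates xi, xj."""
    a = (xi - xj) / ((xi - xj).norm() + eps)
    b = torch.cross(xi, xj, dim=-1)
    b = b / (b.norm() + eps)
    c = torch.cross(a, b, dim=-1)
    return a, b, c
```

Under a rotation g, all three frame vectors rotate by g; under the point reflection x → −x, a and c flip sign while the pseudo-vector b does not, so the frame distinguishes mirror images. This asymmetry is precisely what enables chirality awareness.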
Note that, after their construction, these frames are used on Line 4 of Algorithm 1 to embed all node and edge features (i.e., h_i, e_ij, χ_i, and ξ_ij) using a single GCP module, as well as within all subsequent GCP modules. We will now prove that the feature updates each GCP module makes with the frame encodings F_ij^t defined in Equation 20 are SO(3)-equivariant. SO(3)-equivariant GCP module. The operations of a GCP module are illustrated in Figure 3 and derived in Section 4.1. Their SO(3) invariance for scalar feature updates and SO(3) equivariance for vector-valued feature updates are proven as follows.
Following the proof of O(3) equivariance for the GVP module in (Jing et al., 2020), the proof of SO(3) equivariance within the GCP module is similar, with the following modifications. Within the GCP module, the vector-valued features (processed separately for nodes and edges) are fed not only through a bottleneck block comprised of downward and upward projection matrices D_z and U_z but also into a dedicated downward projection matrix D_S. The output of matrix multiplication between O(3)-equivariant vector features and D_S yields O(3)-equivariant vector features v_i^S that are used as unique inputs for an SO(3)-invariant scalarization operation. In particular, the following demonstrates the invariance of our design for matrix multiplication with our GCP module's projection matrices (e.g., D_S). Suppose W_h ∈ R^{h×v}, V ∈ R^{v×3}, and Q ∈ SO(3) ⊂ R^{3×3}. In line with (Jing et al., 2020), the row-wise L2 norms satisfy ‖(W_h V Q)_k‖_2 = ‖(W_h V)_k‖_2 for each row k, since right-multiplication by the orthogonal matrix Q preserves vector norms.
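This norm-preservation property is straightforward to confirm numerically; a minimal sketch under the shapes stated above:

```python
import numpy as np

rng = np.random.default_rng(3)
h, v = 4, 6
W_h = rng.normal(size=(h, v))   # a learned projection matrix (e.g., D_S)
V = rng.normal(size=(v, 3))     # v stacked 3D vector features

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:        # fix the determinant so Q lies in SO(3)
    Q[:, 0] *= -1

# Row-wise L2 norms are unchanged by right-multiplication with the rotation Q.
norms = np.linalg.norm(W_h @ V, axis=1)
norms_rotated = np.linalg.norm(W_h @ V @ Q, axis=1)
assert np.allclose(norms, norms_rotated)
```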

Specifically, our SO(3)-invariant scalarization operation is defined as

s_ij^t = (v_i^S · a_ij^t, v_i^S · b_ij^t, v_i^S · c_ij^t), (23)

where F_ij^t = (a_ij^t, b_ij^t, c_ij^t) denotes the SO(3)-equivariant frame encodings defined in Equations 20 and 21. To prove that Equation 23 yields SO(3)-invariant scalar features, let g ∈ SO(3) be an arbitrary orthogonal transformation. Then we have v_i^S → g v_i^S and, similarly, F_ij^t = (a_ij^t, b_ij^t, c_ij^t) → (g a_ij^t, g b_ij^t, g c_ij^t). Now, similar to (Du et al., 2022), we can derive that Equation 23 becomes

(g v_i^S · g a_ij^t, g v_i^S · g b_ij^t, g v_i^S · g c_ij^t) = ((v_i^S)^T g^T g a_ij^t, (v_i^S)^T g^T g b_ij^t, (v_i^S)^T g^T g c_ij^t) = (v_i^S · a_ij^t, v_i^S · b_ij^t, v_i^S · c_ij^t),

where we used the fact that g^T g = I due to the orthogonality of g (with I being the identity matrix). Therefore, the scalarization operation proposed in Equation 23 is SO(3)-invariant.

SO(3)-equivariant GCPConv layer. As in Section 4.2.1, we now turn to discuss the operations within a single GCPConv layer, in particular proving that they maintain the respective SO(3) invariance and SO(3) equivariance of their scalar and vector-valued feature updates. Since each GCPConv layer's message-passing updates are composed of GCP modules, which are proven above to be SO(3)-invariant in their scalar outputs and SO(3)-equivariant in their vector-valued outputs, these properties carry through each layer. Thereby, so are the features n_i^l = (h_i^l, χ_i^l), given that the proof of equivariance for the equivariant LayerNorm and Dropout operations employed within each GCPConv layer has previously been concretized by (Jing et al., 2020). Equation 17 concludes the operations of a single GCPConv layer by, as desired, updating the positions of each node i in the 3D input graph. To do so, GCPConv residually updates the current node positions x_i^{l−1} using SO(3)-equivariant vector-valued features χ_{p_i}^l. Therefore, GCPConv layers are SO(3)-invariant for scalar feature updates and SO(3)-equivariant for vector-valued node position and feature updates.

SE(3)-equivariant GCPNet. Lastly, as desired, Line 10 of Algorithm 1 adds C(0) back to the predicted node positions X^l as provided by each GCPConv layer, ultimately imbuing the position updates within X^l with SE(3) equivariance. Line 14 then concludes GCPNet by using the latest frame encodings F_ij^t to perform, as desired, a final SO(3)-invariant and SO(3)-equivariant projection for scalar and vector-valued features, respectively.
Therefore, as desired, GCPNets are SE(3)-invariant for scalar feature updates and SE(3)-equivariant for vector-valued node position and feature updates and, as a consequence, satisfy the constraint proposed in Def. 3.1.
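As a numerical sanity check on the scalarization step used throughout this proof, the sketch below rotates a vector feature and its frame together and confirms that the projected scalars are unchanged (helper names are illustrative, and an arbitrary rotation matrix stands in for a frame, which is likewise an orthonormal basis):

```python
import numpy as np

def scalarize(v, F):
    """Project a vector feature onto the frame axes: (v·a, v·b, v·c), as in Equation 23."""
    return F @ v  # F stacks the frame axes (a, b, c) as rows

def random_rotation(rng):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q

rng = np.random.default_rng(2)
v = rng.normal(size=3)             # an equivariant vector feature v_i^S
F = random_rotation(rng)           # a stand-in orthonormal frame with rows (a, b, c)
g = random_rotation(rng)

s = scalarize(v, F)
s_rotated = scalarize(g @ v, F @ g.T)  # rotate the feature and the frame together

# Because g^T g = I, the projected scalars are unchanged: SO(3) invariance.
assert np.allclose(s, s_rotated)
```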

A.2. Proof of Proposition 2.
Proof. The proof of SE(3) invariance for scalar node and edge features, h_i and e_ij, follows as a corollary of Appendix A.1 (SE(3)-equivariant GCPNet). Therefore, GCPNets are SE(3)-invariant with respect to their predicted scalar node and edge features and, as a consequence, are geometrically self-consistent according to the constraint in Def. 3.2.

A.3. Proof of Proposition 3.
Proof. Suppose that GCPNet designates its local geometric representation for layer t to be F_ij^t = (a_ij^t, b_ij^t, c_ij^t), with a_ij^t and b_ij^t defined as in Equation 21 and c_ij^t = a_ij^t × b_ij^t. As in (Du et al., 2022), this formulation of F_ij^t is proven in Appendix A.1 (SO(3)-equivariant frames) to be an SO(3)-equivariant local orthonormal basis at the tangent space of x_i^t and is thereby geometrically complete. Note that this implies GCPNet permits no loss of geometric information, as discussed in Appendix A.5 of (Du et al., 2022). Therefore, GCPNets are geometry-complete and satisfy the constraint proposed in Def. 3.3.

A.4. Additional Experiments and Results.
In this section, we explore an additional modeling task, computational protein design, with its implementation details being discussed in Appendix A.5.
CPD, Node Classification. Computational protein design (CPD) investigates a method's ability to design native-like protein sequences. In our CPD experiments, we explore fixed-backbone sequence design, where methods are provided with the 3D backbone structure of a protein and asked to generate a corresponding sequence. We train and evaluate each CPD method on the CATH 4.2 dataset created by (Ingraham et al., 2019). This dataset contains 18,204, 608, and 1,120 training, validation, and test proteins, respectively, where all available protein structures with 40% nonredundancy are partitioned by their CATH (class, architecture, topology/fold, homologous superfamily) classification. Baseline comparison methods for this task include a mixture of state-of-the-art transformers, GNNs, and ENNs.
Under the assumption that native sequences are optimized for their structures (Kuhlman & Baker, 2000), the metrics with which we evaluate each method measure how well a method can distinguish a native-like sequence from a non-native one. In particular, following (Ingraham et al., 2019), we adopt model perplexity as a measure of how well a method can model the language of native protein sequences. Similarly, we employ native sequence recovery (i.e., amino acid recovery) rates to evaluate, on average, how well each method can design sequences that resemble native protein sequences.

Table 4: Comparison of GCPNet with baseline methods for the CPD task. Results are reported in terms of the perplexity and amino acid recovery rates of each method for fixed-backbone sequence design. The best results for this task are in bold, and the second-best results are underlined.

Table 4 shows that, in representing proteins as amino acid residue-level graphs, GCPNet matches or exceeds the performance of several state-of-the-art prediction methods for CPD. In particular, GCPNet improves upon the state-of-the-art short-sequence recovery rates of previous methods by 0.5% on average while maintaining competitive performance in all other metrics. We note that all CPD methods marked with * perform model inference autoregressively, introducing a significant computational bottleneck for real-world applications of these models. Inference with GCPNet, in contrast, directly predicts the amino acid sequence corresponding to a 3D protein structure, thereby decreasing inference runtime by more than a factor of two compared to other methods. While being a simple direct-shot prediction method for CPD, GCPNet still achieves competitive amino acid recovery rates for sequence generation, with reasonable perplexity as well.
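To make the perplexity metric concrete: it is the exponential of the average negative log-probability a model assigns to the native residues, so lower is better. A minimal sketch follows; the `perplexity` helper is illustrative and not taken from the GCPNet codebase:

```python
import numpy as np

def perplexity(native_probs):
    """Perplexity from the probabilities a model assigns to the native residues."""
    return float(np.exp(-np.mean(np.log(native_probs))))

# A model assigning probability 1/4 to every native residue is exactly as
# uncertain as a uniform choice among 4 amino acids (perplexity 4); a uniform
# guess over all 20 standard amino acids would score perplexity 20.
assert np.isclose(perplexity(np.full(10, 0.25)), 4.0)
assert np.isclose(perplexity(np.full(10, 0.05)), 20.0)
```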
Interestingly, in the context of CPD, an ablation of our equivariant local frames F_ij reveals that such frames are not useful for increasing GCPNet's confidence in its structural understanding of the language of proteins (i.e., for lowering its perplexity). This suggests that future work could explore alternative geometric encoding schemes for residue-based graphs when approaching the CPD task (Gao et al., 2022) with GCPNet. This finding also highlights that the local frames F_ij appear to be most useful for representation learning on atomic graphs, where lower-level molecular motifs are likely to appear. It implies that future work toward improving CPD results with GCPNet could involve developing novel atom-level encoding schemes for residue-based graph predictions, thereby leveraging the promising results GCPNet yields in other dataset contexts. Nonetheless, our remaining ablations demonstrate that other design characteristics of GCPNet, such as the ResGCP module and joint scalar and vector-valued feature representations, enable GCPNet to better decode sequence-based information from 3D protein structures.
Featurization. For the LBA and PSR tasks, in each 3D input graph, we include as a scalar node feature an atom's type, using a 9-dimensional one-hot encoding vector for each atom. As vector-valued node features, we include forward and reverse unit vectors in the directions of x_{i+1} − x_i and x_{i−1} − x_i, respectively (i.e., the node's 3D orientation). For the input 3D graphs' scalar edge features, we encode the distance ‖x_i − x_j‖_2 using Gaussian radial basis functions, where we use 16 radial basis functions with centers evenly distributed between 0 and 20 units (i.e., Angstroms). For the graphs' vector-valued edge features, we encode the unit vector in the direction of x_i − x_j (i.e., pairwise atom position displacements).
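The Gaussian radial basis encoding above can be sketched as follows. The width choice (set equal to the spacing of the centers) is an assumption made for illustration, not necessarily the value used in the released code:

```python
import numpy as np

def rbf_encode(d, num_rbf=16, d_min=0.0, d_max=20.0):
    """Expand a scalar distance into Gaussian radial basis features."""
    centers = np.linspace(d_min, d_max, num_rbf)  # evenly spaced between 0 and 20 Angstroms
    width = centers[1] - centers[0]               # assumed width: one center spacing
    return np.exp(-(((d - centers) / width) ** 2))

x_i = np.array([0.0, 0.0, 0.0])
x_j = np.array([3.0, 4.0, 0.0])
feats = rbf_encode(np.linalg.norm(x_i - x_j))  # ||x_i - x_j||_2 = 5.0

assert feats.shape == (16,)
# The strongest response comes from the center nearest to 5.0 Angstroms.
assert np.argmax(feats) == int(np.argmin(np.abs(np.linspace(0, 20, 16) - 5.0)))
```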
For the CPD task, in each 3D input graph, we include as scalar node features an encoding of each amino acid residue's dihedral angles, {sin, cos} ∘ {φ, ψ, ω}, where φ, ψ, and ω are the dihedral angles computed from the corresponding protein's C_{i−1}, N_i, Cα_i, C_i, and N_{i+1} backbone atoms. We then include as vector-valued node features each node's 3D orientation. For edge features, we use Gaussian radial basis function distance encodings as scalar edge features and pairwise atom position displacements as vector-valued edge features.
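The {sin, cos} dihedral featurization can be sketched with the standard four-point torsion formula; the `dihedral` helper below is illustrative rather than the paper's implementation:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Torsion angle (radians) about the p1-p2 bond, via the standard formula."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1  # component of b0 perpendicular to the bond
    w = b2 - np.dot(b2, b1) * b1  # component of b2 perpendicular to the bond
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))

# Four coplanar points in a zigzag (trans) arrangement: torsion angle of pi.
p0, p1, p2, p3 = (np.array(p, dtype=float) for p in
                  ([1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 1, 0]))
phi = dihedral(p0, p1, p2, p3)
features = np.array([np.sin(phi), np.cos(phi)])  # the {sin, cos} node encoding

assert np.isclose(abs(phi), np.pi)
assert np.isclose(features[1], -1.0)  # cos(pi) = -1
```

Encoding the angle as (sin, cos) avoids the discontinuity at ±π that a raw angle feature would introduce.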
For the NMS task, in each 3D input graph, we include as a scalar node feature an invariant encoding of each node's velocity vector, namely ‖v_i‖_2. Each node's velocity and orientation are encoded as vector-valued node features. Scalar edge features are represented as Gaussian radial basis distance encodings as well as the product of the charges of each node pair (i.e., c_i c_j). Lastly, vector-valued edge features are represented as pairwise atom position displacements.
Hardware Used. The Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL) is an open science computing facility that supports HPC research. The OLCF houses the Summit compute cluster. Summit, launched in 2018, delivers 8 times the computational performance of Titan's 18,688 nodes using only 4,608 nodes. Like Titan, Summit has a hybrid architecture: each node contains multiple IBM POWER9 CPUs and NVIDIA Volta GPUs, all connected with NVIDIA's high-speed NVLink. Each node has over half a terabyte of coherent memory (high-bandwidth memory + DDR4) addressable by all CPUs and GPUs, plus 800 GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes are connected in a non-blocking fat-tree topology using a dual-rail Mellanox EDR InfiniBand interconnect. We used the Summit compute cluster to train all our models. For the LBA and NMS tasks, we used 16 GB NVIDIA Tesla V100 GPUs for model training, whereas for the memory-intensive PSR and CPD tasks, we used 32 GB V100 GPUs instead. Lightning was used to facilitate model checkpointing, metrics reporting, and distributed data parallelism across 6 V100 GPUs. A more in-depth description of the software environment used to train and run inference with our models is available at https://github.com/BioinfoMachineLearning/GCPNet.
Hyperparameters. We use a learning rate of 10^−4 for all GCPNet models, kept constant throughout each model's training. For the NMS task, each model is trained for a minimum of 100 epochs and a maximum of 12,000 epochs. For all other tasks, each model is trained for a minimum of 100 epochs and a maximum of 1,000 epochs. For a given task, the model with the best loss on the corresponding validation split is then evaluated on the respective test split.