MAgNET: A Graph U-Net Architecture for Mesh-Based Simulations

In many cutting-edge applications, high-fidelity computational models prove to be too slow for practical use and are therefore replaced by much faster surrogate models. Recently, deep learning techniques have increasingly been utilized to accelerate such predictions. To enable learning on large-dimensional and complex data, specific neural network architectures have been developed, including convolutional and graph neural networks. In this work, we present a novel encoder-decoder geometric deep learning framework called MAgNET, which extends the well-known convolutional neural networks to accommodate arbitrary graph-structured data. MAgNET consists of innovative Multichannel Aggregation (MAg) layers and graph pooling/unpooling layers, forming a graph U-Net architecture that is analogous to convolutional U-Nets. We demonstrate the predictive capabilities of MAgNET in surrogate modeling for non-linear finite element simulations in the mechanics of solids.


Introduction
Computational models are essential tools for studying, designing, and controlling complex systems in many fields, including engineering, physics, biology, economics, and social networks. These models are often based on physical laws and mathematical equations, with partial differential equations (PDEs) being a common tool for describing how quantities change over space and time. In mechanics and physics, PDEs are most commonly solved with numerical methods after a prior space- and time-discretization, and a large number of domain-specific computational models have been developed so far, with the finite element method (FEM) and the finite volume method (FVM) being the most commonly used approaches in solid and fluid mechanics, respectively. However, despite significant advances in computational performance over the last decade, such high-fidelity numerical simulations remain prohibitively expensive for many important applications, including emerging areas such as real-time feedback/control in computer-assisted surgery [Johnsen et al., 2015; Bui et al., 2018] or soft robotics [Rus and Tolley, 2015; Goury and Duriez, 2018]. Speeding up such models whilst maintaining the desired accuracy is an active area of research, and one of the main motivations of the present work.
Recently, deep learning (DL) techniques have taken center stage across many disciplines. DL models have proven to be accurate and efficient in predicting non-trivial nonlinear relationships in data. For that reason, they have also been applied to a variety of problems in mechanics, such as surrogate modelling [Mendizabal et al., 2019; Deshpande et al., 2022; Krokos et al., 2022b; Šarkić Glumac et al., 2023] or model discovery and calibration [Huang et al., 2020; Thakolkaran et al., 2022]. Deep neural network approaches can be categorized with respect to how they use the data and a priori knowledge about the modelled system. In purely data-driven approaches, DL models rely on supervised learning on either experimental or numerically generated data and are agnostic to the underlying physics or model. As such, they are able to reproduce the physics-based relationship by implicitly learning it from a relatively large amount of data [Runge et al., 2017; Aydin et al., 2019; Daniel et al., 2020; Kochkov et al., 2021]. If a priori information about the modelled system is introduced, such networks are termed Physics Informed Neural Networks (PINNs) [Raissi et al., 2019; Samaniego et al., 2020; Henkes et al., 2022; Klein et al., 2022; Zhang et al., 2022]. Compared to purely data-driven approaches, PINNs are generally more accurate, require less data for training, and possess better generalization capabilities. The framework presented in this work is generally applicable to both cases; however, for the sake of clarity, we will later focus only on purely data-driven networks. In any case, once trained, the DL models can be used as fast surrogates for computationally expensive high-fidelity numerical methods.
The focus of the present work is on high-dimensional relationships in which the sizes of inputs and/or outputs are large. Examples of such relationships can be found, for instance, in experimental full-field measurement data, such as our recent work on medical imaging [Lavigne et al., 2022], or in synthetic mesh data generated from finite element simulations [Lorente et al., 2017; Pellicer-Valero et al., 2020]. Although DL techniques have generally shown great success as efficient surrogates for computationally expensive numerical methods in scientific computing, some popular existing machine learning approaches are still based on fully-connected deep networks, which are not suitable for high-dimensional inputs/outputs. As an alternative, Convolutional Neural Networks (CNNs) have shown promising performance in a wide variety of applications, including accelerating non-linear finite element/volume simulations [Obiols-Sales et al., 2020; Rao and Liu, 2020; Krokos et al., 2022b; Deshpande et al., 2022]. CNNs are designed to learn a set of fixed-size trainable local filters (convolutions), thus reducing the parameter space while remaining capable of capturing non-linearities. In the context of computational mechanics, local convolutions leverage the natural local correlation of nearby nodes, which leads to more efficient neural network architectures, both in terms of training and prediction times. Moreover, one can observe that CNN architectures have a close analogy to some iterative solution schemes known in scientific computing [Wang et al., 2020a; Brenner and Scott]. In our framework, the aggregation neighbourhood of each node is prescribed by the graph connectivity, which is independent of data (i.e., node features). In the context of GNN-accelerated FEM simulations, a similar concept has been proposed by [Black and Najafi, 2022]; however, their implementation is limited to regular meshes for simple two-dimensional geometries and linear elastic problems. Our approach enables computationally efficient deep learning models for non-linear problems involving arbitrary meshes, which is an important advancement for this field.
In summary, we introduce a novel graph U-Net framework comprising the proposed MAg and graph pooling/unpooling layers. The MAg layer captures local regularities in the input data, while the interleaved pooling layers reduce the graph representation to a smaller size while preserving important structural information. This enables us to efficiently apply our framework to large-scale problems. The proposed MAg and graph pooling layers are direct analogues of the respective CNN U-Net layers and are also compatible with many state-of-the-art graph neural network layers. We elaborate on this point in the paper, providing a qualitative comparison of the proposed MAg layer with several existing graph layers. To validate the predictive capabilities of our framework, we apply it to several non-linear relationships obtained through finite element analysis and cross-validate it with predictions given by our CNN U-Net architecture [Deshpande et al., 2022]. To increase the impact of our work, we provide source codes, datasets, and procedures to generate the datasets utilized in this work, which can be found in the MAgNET repository: https://github.com/saurabhdeshpande93/MAgNET. The paper is organized as follows. In Section 2 we present the novel MAgNET framework, as well as its particular application to hyperelastic FEM-based datasets. Then, in Section 3, we provide implementation details and a thorough study of MAgNET on several 2D and 3D benchmark non-linear FEM examples. The conclusions and future research directions are summarised in Section 4.

MAgNET Deep Learning Framework
In this section, we will propose a novel graph-based encoder-decoder (U-Net) deep-learning framework. We will provide a general formulation in which inputs and outputs follow a certain graph topology (expressed by an adjacency matrix A). The graph expresses an assumed structure of correlations within input/output data and allows us to devise a robust DNN architecture defining a non-linear mapping between inputs and outputs. We will apply this general framework to mesh-based graphs. Such mesh topology of data is characteristic of spatially discretised numerical solution schemes for PDEs in scientific computing. In particular, we will focus on hyperelastic problems in solid mechanics, for which the training/testing data is obtained through the finite element method (see also the schematics in Figure 1).
In Section 2.1 we will provide an overview of the proposed graph U-Net framework MAgNET. Next, in Sections 2.2-2.4 we will introduce the building blocks of MAgNET. In particular, in Section 2.2, we will introduce the adjacency matrix representation of the mesh-based graph, which will be utilised later in this paper; and in Sections 2.3-2.4 we will specify a new graph Multi-channel Aggregation (MAg) layer, as well as new graph pooling/unpooling layers. Afterwards, in Section 2.5, we will provide an information-passing interpretation of the proposed graph U-Net architecture. Finally, in Section 2.6 we will introduce a particular application of the framework to mesh-based datasets generated from FEM solutions of problems in hyperelasticity.

MAgNET architecture overview
The MAgNET graph neural network architecture can be classified as a graph U-Net and is an extension of the well-known class of convolution-based U-Net architectures (see [Ronneberger et al., 2015]). As such, the graph U-Net comprises aggregation ('convolution'), pooling, unpooling, and concatenation layers (see the schematics in Figure 2), which are here suitably adjusted to work with general (non-grid) topologies of inputs/outputs.
The graph U-Net architecture has two stages: encoding and decoding. In the encoding stage, we first apply one or more aggregation (MAg) layers, which are analogues of convolution layers in non-graph U-Nets. Next, we apply a single graph pooling layer, which is a particular contraction of the graph, and which downsamples (coarsens) the problem. This aggregation-pooling sequence is repeated several times to achieve the desired level of contraction (coarsening). At the coarsest level, the MAg aggregation is performed one or more times, after which the decoding stage begins, which mirrors the encoding stage. At each level of decoding, the graph unpooling layer is followed by one or more MAg layers. At the topmost level, the last MAg layer is applied with a linear activation to produce the output.
More formally, the graph U-Net network, G, is constructed as follows. First, we set the input layer d_0 as a vector of N nodes, each of which is a vector of input values (also known as features or channels) of a constant length c_0. (Further on, we will refer to the features as channels.) Next, we subsequently add layers, d_l, to form a U-Net architecture. The subsequent layers, d_{l-1} and d_l, are linked by the following relationship

    d_l = T_l(d_{l-1}; θ_l),

where θ_l is a vector of trainable parameters (e.g., weights and biases, θ_l = k_l ∪ b_l), and T_l(·) is one of three already introduced transformations: MAg(), gPool() or gUnpool(), which will be more precisely defined in the following sections. Additionally, we also consider remote concatenation links between respective layers from the encoding and decoding stages, see Fig. 2. The output layer, d_L, is assumed to be of the same mesh format as the input layer but can have a different number of channels (features), c_L. Finally, we define the graph U-Net network as a parameterized transformation

    d_L = G(d_0; θ),

where θ = ∪_{l=1..L} {θ_l} is a concatenated vector of network parameters.
The calibration of the graph U-Net parameters is done through supervised learning, by fitting a given known input-output training dataset. The training dataset,

    D_tr = {(f^(1), u^(1)), ..., (f^(M_tr), u^(M_tr))},    (3)

is in the mesh format, and the training is done by minimizing the following mean squared error loss function

    L(D_tr, θ) = (1/M_tr) Σ_{m=1..M_tr} ||G(f^(m); θ) − u^(m)||²,    (4)

which gives the optimal parameters

    θ* = arg min_θ L(D_tr, θ).    (5)


Adjacency matrix of the mesh-based graph

For the purpose of this work, we will focus on sparse graphs that derive from data spatially organised in the form of meshes. Those can be 1D, 2D, or higher-dimensional meshes of an arbitrary connection topology (see Fig. 3 for examples of 2D meshes). The graph can be conveniently represented by a symmetric, square, Boolean adjacency matrix, A, whose order is equal to the number of nodes in the original mesh. To simplify further notation, all nodes (vertices) are self-connected (have loops), which results in 1s on the diagonal of A. This allows us to more easily express certain graph operations used in this work, for instance, the k-th power of a graph A, and the selection of pooling subgraphs presented in Section 2.4.
It is fairly straightforward to generate an adjacency matrix from an element connectivity matrix of the mesh, so we will not discuss it in detail. The only point to be emphasized is that we make all nodes belonging to a given element mutually interconnected in the resulting graph (as they can be assumed to be strongly inter-related). We can visually represent this by adding more links compared to a standard wire-frame visualization of finite elements (see, e.g., the dashed lines in Fig. 3a).
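The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration in our own notation (the function name and 0-based node indexing are assumptions, not the authors' implementation): every node of an element is connected to every other node of that element, and the diagonal carries the self-loops.

```python
import numpy as np

def adjacency_from_connectivity(elements, n_nodes):
    """Boolean adjacency matrix A of a mesh-based graph.

    `elements` is a list of node-index tuples (0-based here). All nodes of
    an element are mutually interconnected, and every node is self-connected
    (1s on the diagonal), as described in the text.
    """
    A = np.zeros((n_nodes, n_nodes), dtype=bool)
    np.fill_diagonal(A, True)       # self-loops
    for elem in elements:
        for i in elem:
            for j in elem:
                A[i, j] = True      # connect all nodes sharing an element
    return A
```

For two triangles sharing an edge, e.g. `[(0, 1, 2), (1, 2, 3)]`, the resulting matrix is symmetric and connects node 1 to node 3 (same element) but not node 0 to node 3.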
Remark: In our work we do not consider any attributes for the edges of a graph. Therefore, the data is represented only through nodal features and node-node connections, which are defined through the adjacency matrix A.

Multi-channel Aggregation (MAg) layer
The proposed novel neural network layer, MAg, is a multi-channel local aggregation layer that can operate on graph-structured data. Its architecture is a direct extension of the standard convolutional layer in CNNs, which uses a shareable convolution window and is therefore restricted to grid-structured data. In the MAg layer, we instead propose fully-trainable local weighted aggregations (a so-called message passing scheme), where the aggregation neighborhood of a given node is prescribed through the graph connectivity (the adjacency matrix). As such, the scheme is very well suited for sparse graphs and can be directly applied to graphs that derive from arbitrary 2D or 3D meshes.
The use of multiple channels aims to improve the capability of the network to capture nonlinearities. In the multi-channel version, each node represents a vector of values (features), which can be visualised as multiple layers (channels) of the same graph structure (see the schematics in Figure 4a). The transformation between the input and output multichannel graphs is realised by applying multiple MAg aggregations on vector data to produce the respective components of the output vectors. Note that the input/output channels of the whole network usually have a certain meaning, and their sizes are fixed (e.g., three RGB channels of a color image at the input and a single channel of a segmented image at the output). The number of channels in the latent layers can be chosen arbitrarily, which is up to the designer of a particular graph U-Net architecture. More formally, we will consider the MAg layer as a parameterized transformation between the input and output nodes, defined as

    d_{l+1}[i,α] = σ( Σ_{j∈N_i} Σ_{β=1..c_l} k_{l+1}[i,j,α,β] d_l[j,β] + b_{l+1}[i,α] ),    (6)

where N_i is the set of neighbours of node i to be aggregated, σ is the activation function, α and β represent the output and input channels, respectively, while k_{l+1} and b_{l+1} are trainable weights and biases, respectively. In this multi-channel definition, for a given component, d_{l+1}[i,α], of an output, a single aggregation is performed throughout the neighborhood, N_i, and all the input channels, β ∈ {1, ..., c_l}. The kernel parameters of the MAg transformation, k_{l+1}[i,j,α,β], are not shared, i.e., they can be independently trained for each aggregation window (note the free indices i and α).
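A naive NumPy sketch of a single MAg forward pass may clarify the index structure. For readability the kernel `k` is stored as a dense array over all node pairs; an actual implementation would store only the entries where the adjacency matrix is nonzero. All names here are illustrative, not the authors' TensorFlow code.

```python
import numpy as np

def mag_layer(d, A, k, b, activation=np.tanh):
    """One MAg forward pass (a sketch of the multi-channel aggregation).

    d : (N, c_in)             input node features
    A : (N, N)                boolean adjacency with self-loops (defines N_i)
    k : (N, N, c_out, c_in)   non-shared trainable weights (free indices i, j)
    b : (N, c_out)            trainable biases
    """
    N, c_in = d.shape
    c_out = b.shape[1]
    out = np.zeros((N, c_out))
    for i in range(N):
        for alpha in range(c_out):
            s = b[i, alpha]
            for j in np.flatnonzero(A[i]):     # neighbours N_i of node i
                s += k[i, j, alpha] @ d[j]     # sum over input channels beta
            out[i, alpha] = s
    return activation(out)
```

Because the weights carry the free indices i and j, every aggregation window has its own parameters, in contrast to the shared filters of a CNN.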

Comparison to existing graph aggregation/convolution layers
As already mentioned in the introduction, the idea of generalising convolution layers to arbitrary graph structures is not new. In fact, various concepts have emerged so far [Zhou et al., 2020; Chen et al., 2020], most of which are compatible with the U-Net framework proposed in the present work. Below, we will discuss several of them, introducing a unified notation that will facilitate a qualitative comparison with the proposed MAg layer, see Table 1.
Graph neural network layers aim to utilize the information about assumed correlations in data, with the graph structure expressing those correlations. The general approach is to specify a suitable (possibly nonlinear) trainable local transformation that aggregates the information from a node in consideration and its neighbours. (This aggregation is followed by a chosen activation function before being propagated to the next layer.) Such transformations form a wide class of so-called message passing schemes, and can combine shareable (independent of a node) and non-shareable (dependent on a node, i.e., independently-trainable) sets of parameters.
The simplest and most lightweight realisations of the graph aggregation/convolution layer concept utilise only shareable weights, see, e.g., the Graph Convolutional Network (GCN) [Kipf and Welling, 2016]. In those approaches, a non-trainable (arbitrary) weighted aggregation is performed prior to the application of a shareable trainable operator, which is the complete opposite of our MAg layer, which is fully trainable. This keeps the number of trainable parameters low, but at the cost of a relatively low capacity of such networks. This low capacity cannot be straightforwardly increased by simply deepening the network because of the well-known over-smoothing phenomenon.
We will discuss two out of many available approaches to increase the capacity of graph neural networks. The first approach relies on the multi-head attention mechanism, which allows assigning different importance to nodes in the neighbourhood, see, e.g., the Graph Attention Network (GAT) [Veličković et al., 2017]. In the attention mechanism, the weights used in local aggregation depend on input nodal features, which makes the concept qualitatively different from all approaches (including the MAg layer) that use input-independent aggregation weights. The second class of approaches resembles the MAg layer more closely.
Particular notable examples of that approach are the Spatial-Temporal Graph Convolution Network (ST-GCN) [Yan et al., 2018] and the Semantic Graph Convolution Network (SemGCN) [Zhao et al., 2019], which were introduced in the particular context of the human pose recognition problem (the computer vision domain). The common features of the MAg and SemGCN layers are the input-independent learnable weighted aggregation and the use of channels to increase the model capacity. The difference is that MAg uses neither a shared transformation matrix (w) nor the softmax normalisation, both of which are used in SemGCN.
To summarize, the proposed MAg layer relies on one of the most flexible message-passing schemes, with no shareable parameters. This promises a very high capacity of the MAg network. Also, as shown above, the proposed MAg layer is compatible with other graph convolution/aggregation layer concepts, and thus can be straightforwardly exchanged, if needed.

Graph pooling-and unpooling layers
Pooling and unpooling are two fundamental operations allowing U-Nets to encode (compress) and decode (decompress) information, respectively, see Fig. 2. The pooling layers are composed of local contracting operations over the mesh-structured data, and are used to coarsen the data in the encoding part of the network. In the decoding part, the original refined mesh structure is restored by the unpooling layers (upsampling operations). In U-Nets, the unpooling layer is usually combined with the concatenation operation, which creates a direct link between the encoding and decoding parts of the network (this will be explained later).

Graph pooling
In this work, we propose a novel clustering-based graph pooling layer that can be applied to arbitrary graph-structured data. It can be seen as an extension of the pooling layers known from CNN U-Nets, which are limited to grid-structured data. In our graph pooling approach, we split the graph into disjoint cliques (fully-connected subgraphs) and perform the contraction of all the identified cliques (i.e., every clique is replaced by a vertex, and new edges represent formerly connected cliques), see Figure 5. The split into cliques is done statically, i.e., at the graph U-Net construction phase. In particular, the split does not depend on the input data.
Below, we will provide a more formal construction of the pooling layer. For a given input graph G, represented by the vertices S and the adjacency matrix A, we first generate an arbitrary set of Ñ non-overlapping fully-connected subgraphs G_1, ..., G_Ñ, where the sets S_i represent the nodes of the respective subgraphs G_i. The procedure to generate these subgraphs and the respective pooled adjacency matrix Ã is described in Algorithm 1.
The pooled graph G̃ is composed of vertices S̃ = {1, ..., Ñ}, with edges defined by the adjacency matrix Ã. The pooling layer is described as

    d̃[i,α] = aggr_{j∈S_i} d[j,α],    i = 1, ..., Ñ,

where the 'aggr' operation can be max/min/avg, etc. Note that graph pooling layers do not modify the number of channels, i.e., the pooling is performed individually for each channel of the input.
Graph pooling can be applied several times in the encoding part of the U-Net, see, e.g., Fig. 2. For the purpose of future unpooling operations, after each pooling operation we save the original graph, G, the adjacency matrix, A, and the pooling subgraphs, G_i. After doing so, we substitute G ← G̃ and A ← Ã.
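The two ingredients of the pooling layer, splitting the graph into disjoint cliques and contracting them, can be sketched as follows. The greedy clique split below is only one possible realisation of the subgraph-selection step (the authors' Algorithm 1 may differ in its details), and all function names are our own.

```python
import numpy as np

def clique_partition(A, rng):
    """Greedy split of the graph into disjoint cliques (a sketch of the
    subgraph-selection step; one of many possible random realisations)."""
    N = A.shape[0]
    unassigned = set(range(N))
    cliques = []
    for v in rng.permutation(N):
        if v not in unassigned:
            continue
        clique = [v]
        unassigned.discard(v)
        for u in list(unassigned):
            if all(A[u, w] for w in clique):   # u connected to every member
                clique.append(u)
                unassigned.discard(u)
        cliques.append(clique)
    return cliques

def g_pool(d, A, cliques, aggr=np.max):
    """Contract every clique to one node; aggregate features per channel."""
    n_pool = len(cliques)
    d_pool = np.stack([aggr(d[c], axis=0) for c in cliques])
    A_pool = np.eye(n_pool, dtype=bool)
    for r in range(n_pool):
        for c in range(r + 1, n_pool):
            if A[np.ix_(cliques[r], cliques[c])].any():  # any edge between cliques
                A_pool[r, c] = A_pool[c, r] = True
    return d_pool, A_pool
```

Note that the split depends only on the adjacency matrix, never on the nodal features, which matches the static (construction-time) nature of the pooling described above.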

Graph unpooling
Structure-wise, graph unpooling is the reverse operation to pooling. More precisely, the output graph of an unpooling layer has the same topology as the input graph of the related pooling layer, see Fig. 5. The operation is defined via the previously saved subgraphs G_j (with nodes S_j) as

    d[i,α] = d̃[j,α]    for all i ∈ S_j,    (9)

and it simply replicates the features of a pooled node j to the nodes specified by S_j. As such, this operation is analogous to the upsampling operation used in CNNs.
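The replication step is a one-liner once the pooling subgraphs have been saved; the sketch below (names are our own) mirrors the pooling example above.

```python
import numpy as np

def g_unpool(d_pool, cliques, n_nodes):
    """Replicate each pooled node's features to the nodes of its saved
    subgraph (clique), restoring the parent graph's node dimension."""
    c = d_pool.shape[1]
    d = np.zeros((n_nodes, c))
    for j, clique in enumerate(cliques):
        d[clique] = d_pool[j]   # broadcast pooled features to all clique nodes
    return d
```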

Graph unpooling + concatenation
Concatenations, also known as skip connections, are characteristic of U-Net architectures. Thanks to them, the layers in the decoder part gain direct access to features from the encoder part. Concatenations help to mitigate the issue of vanishing gradients, and add extra information that could have been lost due to the earlier downsampling (pooling).
In our case, the concatenations are always related to the respective pooling/unpooling operation pairs, see Fig. 2. The output of the unpooling layer, given by Equation (9), is concatenated channel-wise with the input of the respective pooling layer l:

    d_out = concat( d_unpool, d_l ).

In the formula above, if c̃ is the number of channels of the unpooling inputs and c_l that of the saved pooling-layer inputs, the total number of output channels of unpooling+concatenation is c̃ + c_l.

Algorithm 1: Construction of the pooled adjacency matrix Ã

    S ← a set of Ñ disjoint cliques of G    /* number of pooled nodes = number of pooling subgraphs */
    Ã ← zeros(Ñ, Ñ)                         /* zero initialisation of pooled matrix */
    /* loop for constructing pooled adjacency matrix Ã from subgraphs S */
    for r in {1, 2, ..., Ñ} do
        for c in {r, r+1, ..., Ñ} do
            if any node of S[r] is adjacent (in A) to any node of S[c] then
                Ã[r, c] ← 1, Ã[c, r] ← 1    /* S[r] and S[c] are connected by an edge */
            end
        end
    end
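In terms of channel bookkeeping, the unpooling+concatenation step is simply a feature-axis concatenation. The shapes below are purely illustrative (a 5-node graph with 16 channels on each branch):

```python
import numpy as np

# Decoder features after unpooling (N nodes, c~ channels) are concatenated
# with the saved encoder features from the matching pooling level
# (c_l channels), giving c~ + c_l output channels per node.
decoder = np.ones((5, 16))      # c~  = 16 (unpooled features)
encoder = np.zeros((5, 16))     # c_l = 16 (saved skip-connection features)
merged = np.concatenate([decoder, encoder], axis=-1)
assert merged.shape == (5, 32)  # c~ + c_l = 32 channels
```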

Information-passing interpretation of MAg and pooling layers
During a single forward pass of the MAg layer, the aggregation is performed locally for each individual node, i.e., each node of the graph has access to the aggregated feature information from its adjacent nodes only, as specified by the adjacency matrix, A, see Equation (6). Therefore, the nodes that are not directly connected through A do not exchange information in a single MAg operation (see Figure 6). Such long-distance exchange across the network is, however, fundamental to allow the neural network model to express correlations between topologically distant input and output nodes (e.g., how the output displacements at node C depend on the input loads at node B in Figure 6).
One way to handle this issue would be to apply the MAg layer several times, as shown in Figure 6. However, in that case, the number of subsequent layers would be proportional to the diameter of the underlying graph, which could increase the number of training variables and the depth of the network, deteriorating its performance. A natural simple improvement, also utilised in the present paper, is to increase the support (neighbourhood) of the MAg operations. In the proposed framework, this can be straightforwardly done by using higher powers of the adjacency matrix, e.g., A² or A³, instead of A. This improvement alone, however, would still require the number of MAg layers to be proportional to the graph diameter. The above observations explain the natural motivation behind using the pooling/unpooling layers, and hence creating the U-Net architecture. The pooled graph can be seen as a reduced-space representation of the parent graph, and each pooled node aggregates the feature information corresponding to multiple nodes of the parent graph, see Figure 7. The pooled graph has a coarsened topology compared to the parent graph, which allows for the feature information exchange with a lower number of MAg layers. The pooling/unpooling layers can be nested, which provides an exponential reduction rate of the graphs' diameters.
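The neighbourhood-enlargement trick via graph powers can be sketched as follows (a minimal Boolean-matrix version, with a function name of our own). Because A contains self-loops (Section 2.2), the k-th power automatically includes all neighbours within k hops or fewer.

```python
import numpy as np

def adjacency_power(A, k):
    """k-th power of the graph: nodes within k hops become directly
    connected (used as A^2 or A^3 in place of A to enlarge the MAg
    aggregation neighbourhood)."""
    Ai = A.astype(int)
    Ak = Ai.copy()
    for _ in range(k - 1):
        Ak = ((Ak @ Ai) > 0).astype(int)   # add next-hop connections
    return Ak.astype(bool)
```

On a path graph 0-1-2-3, for instance, A² connects node 0 to node 2 but not yet to node 3, while A³ connects the whole path.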
Remark: Note that the pooling layer proposed in this work represents a clique-pooling approach in which the cliques are non-overlapping. This allows us to achieve a very good level of graph coarsening (contraction). It is unlike a similar clique-based strategy that has been recently proposed [Luzhnica et al., 2019], in which the pooling cliques overlap, providing a much lower level of coarsening.

Application to FEM-based datasets
We will now focus on a particular graph structure of inputs/outputs in the form of a finite element mesh. Specifically, the MAgNET framework will be applied as a surrogate for a finite element model in large-deformation elasticity. The finite element model is briefly introduced below.
We consider a boundary value problem expressed in the weak form over the domain Ω:

    ∫_Ω P(F) : ∇δu dΩ − ∫_Ω b · δu dΩ − ∫_{Γ_N} t · δu dΓ = 0,    (11)

where P(·) is the first Piola-Kirchhoff stress tensor, b are prescribed body forces, t are prescribed tractions on the Neumann boundary Γ_N, while the solution u and the variation δu belong to appropriate functional spaces, with u = ū and δu = 0 on the Dirichlet boundary Γ_u. The hyperelastic constitutive relationship is expressed through the strain-energy density potential W(F) as

    P = ∂W(F)/∂F,

where the deformation gradient tensor F = I + ∇u.
For all the cases considered in the present work, the Neo-Hookean hyperelastic law with the following strain energy potential is used, see Simo and Taylor [1982]:

    W(F) = μ/2 (I_c − 3) − μ ln J + λ/2 (ln J)²,

where the invariants J and I_c are given in terms of the deformation gradient F as

    J = det F,    I_c = tr(FᵀF),

with μ and λ being Lamé's parameters, which can be expressed in terms of Young's modulus, E, and Poisson's ratio, ν, as

    μ = E / (2(1 + ν)),    λ = Eν / ((1 + ν)(1 − 2ν)).

Note that one can use other forms of the volumetric part of the above potential, see Doll and Schweizerhof [2000], or other hyperelastic models, such as the Mooney-Rivlin or the more general class of Ogden models, see Ogden [2005].
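As a numerical sanity check, the strain-energy density above can be evaluated directly from a deformation gradient. The sketch below uses one common form of the compressible Neo-Hookean potential (the paper's exact volumetric term may differ), and the default material constants are placeholders, not the values from Table 2.

```python
import numpy as np

def neo_hookean_energy(F, E=1.0, nu=0.3):
    """Strain-energy density W(F) of a standard compressible Neo-Hookean
    model (one common form; placeholder material constants)."""
    mu = E / (2.0 * (1.0 + nu))                     # Lame parameter mu
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))  # Lame parameter lambda
    J = np.linalg.det(F)                            # J = det F
    Ic = np.trace(F.T @ F)                          # I_c = tr(F^T F)
    return mu / 2.0 * (Ic - 3.0) - mu * np.log(J) + lam / 2.0 * np.log(J) ** 2
```

The energy vanishes for the undeformed state F = I and is positive for a simple uniaxial stretch, as expected of a well-posed hyperelastic potential.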
Finite element discretisation transforms the weak form expressed by Eq. (11) into a system of non-linear equations that expresses the balance between external and internal nodal forces. In this work, the vector of external forces, f_ext, represents boundary conditions, which can be surface tractions or body forces. Given f_ext = f_m, the system is solved for an unknown vector u with the Newton-Raphson scheme, giving as a result the solution u_m. A pair (f_m, u_m) makes an element of the dataset D introduced in Eq. (3), and the FE mesh that results from the FE discretization produces the adjacency matrix A introduced in Section 2.2.

Results
In this section, we will study the performance of the proposed framework on four benchmark problems. In Section 3.1, we give a detailed specification of the benchmark problems and the procedure for obtaining FEM-based datasets. In Section 3.2, we provide details of the neural network architectures for each of the studied cases and describe the training procedure. In Section 3.3, we study the predictions of the neural network models by cross-validating results from MAgNET and CNN models, and by comparing them with the FEM results. Finally, in Section 3.4, we demonstrate the capabilities of the MAgNET framework to provide a surrogate model for the unstructured mesh cases.

Generation of FEM based datasets
We consider four benchmark problems, see Fig. 8. Two of them, Fig. 8(a-b), utilise simple meshes, which makes it possible to ensure structured (grid) inputs. They will be used to cross-validate our MAgNET architecture against the standard CNN U-Net architecture. The other two examples, Fig. 8(c-d), are more complex and will serve to demonstrate the applicability of MAgNET to general (unstructured) meshes. Each of these two groups consists of a 2D and a 3D problem, thanks to which the framework can be tested on four different finite element topologies: triangular, quadrilateral, tetrahedral, and hexahedral.
For all considered cases, we use the neo-Hookean material model, see Section 2.6, with the material parameters provided in Table 2. In order to generate training/testing datasets, for each discretized problem we individually specify a family of boundary conditions, as described below; see also the schematics in Figure 8. For the cases shown in Figures 8(a-c), nodes on one side are fixed (Dirichlet boundary conditions), and only a single random nodal force is applied at a selected node in a prescribed region of interest (denoted by the red line/surface in Figure 8(a-c)). For the remaining nodes, the external forces are set to 0. All the finite element computations were implemented and performed within the AceGen/AceFem framework [Korelc, 2002]. For a given problem, for each loading case i, the entire vector f^(i) of external nodal forces and the vector u^(i) of computed nodal displacements were saved, which allowed us to generate the final training/testing dataset D = {(f^(1), u^(1)), ..., (f^(M_tr+M_te), u^(M_tr+M_te))}. The datasets were randomly split into training sets, M_tr (95%), and testing sets, M_te (5%). The sizes of the datasets and the distribution of force magnitudes are provided in Table 3.

Design, implementation and training of neural network models
The implementation of the layers and mechanisms of the MAgNET framework described in Section 2, and of the CNN U-Net framework introduced in [Deshpande et al., 2022], has been performed within the TensorFlow libraries. We use them to build and train deep neural network models for the cases described in Section 3.1. To provide a complete understanding of the neural network architectures listed in Table 4, we will now delve into the details of the MAgNET architecture used for the 2D L-shape example. Its schematic is shown in Figure 9. As indicated in the third column of Table 4, it is a three-level graph U-Net architecture with two MAg operations at each level. The fourth column specifies the number of channels utilized for the MAg operations at each level of the graph U-Net. The forward pass starts with the input mesh (2D), to which the MAg layer is applied twice (with 16 output channels). This is followed by the graph pooling layer, which coarsens the mesh and transitions to the next level of the U-Net (from the zeroth to the first level). This process repeats twice, with the first and second levels of the graph U-Net having MAg layers with 32 and 64 output channels, respectively, leading to the coarsest third level of the U-Net. At this level, two MAg layers (with 128 output channels) are applied. In the subsequent decoding phase, the graph unpooling layer is employed together with the concurrent concatenation operation, followed by two MAg operations (with 64 output channels). This upsampling sequence repeats twice, with the use of 32- and 16-channel MAg layers. Finally, a single MAg layer (with 2 output channels) is applied, using a linear activation to produce the desired output mesh (the displacement mesh must have the same structure as the input mesh of forces). It is worth noting that the analogous CNN U-Net architectures are similar, with the only distinction being the use of convolution layers in place of MAg layers and CNN U-Net max poolings instead of graph poolings.
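The encode-decode sequence just described can be summarised programmatically. The sketch below merely records the order of layer applications and channel counts for the 2D L-shape network as described in the text; the labels are our own shorthand, not the actual TensorFlow implementation.

```python
# Channel counts per encoder level for the 2D L-shape case, as described
# in the text: levels 0..2 use 16, 32, 64 channels; the coarsest level 128.
levels = [16, 32, 64]
bottleneck = 128

def architecture_plan(levels, bottleneck, out_channels=2):
    """Ordered list of layer applications for the graph U-Net."""
    plan = []
    for c in levels:                           # encoding: (MAg x2) -> gPool
        plan += [f"MAg({c})", f"MAg({c})", "gPool"]
    plan += [f"MAg({bottleneck})"] * 2         # coarsest level: two MAg layers
    for c in reversed(levels):                 # decoding: gUnpool+concat -> (MAg x2)
        plan += ["gUnpool+concat", f"MAg({c})", f"MAg({c})"]
    plan += [f"MAg({out_channels}, linear)"]   # final linear-activation MAg
    return plan
```

Listing the plan makes the symmetry of the U-Net explicit: three poolings are matched by three unpooling+concatenation steps, and the final MAg layer maps back to the two displacement channels of the 2D output mesh.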
As shown in Table 4, both the 2D L-shape and the 3D beam examples have been modeled using both MAgNET and CNN U-Net architectures. These networks were designed to have a similar number of trainable parameters, thus facilitating a fair comparison of their fitting capabilities. The number of parameters in both types of networks is controlled by using a higher number of channels in the CNN U-Net architecture than in its corresponding MAgNET architecture. This difference in the number of channels is attributed to the convolution operators in the CNN architecture sharing parameters across a layer, which may necessitate a larger number of channels to ensure an optimal fit, while the aggregation operators in the MAg layer use individual weights per aggregation window, allowing for more flexible fitting across the mesh with a smaller number of channels. However, caution must be exercised when selecting the number of channels, as setting it too low can result in increased prediction errors (as seen in Figure 15). The number of neural network levels (pooling operations) and the size of the convolution/aggregation windows have been adjusted on a case-by-case basis to obtain the desired fitting capabilities while keeping the number of trainable parameters low and comparable between the respective CNN U-Net and MAgNET models. The fitting capabilities heavily depend on the successful propagation of information from the input throughout the network. This can be compromised when the number of poolings or the window size is too small, as explained in Section 2.5. For this reason, a larger number of pooling operations is used for mesh graphs with larger diameters (e.g., the 3D cases in Table 4). Additionally, in the case of MAgNET models, a global optimization of the graph pooling operations is performed to reduce the number of nodes at the coarsest level. In this optimization, Algorithm 1 is run 1000 times with different random seeds, and the case with the smallest number of nodes at the lowest level is selected.
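The global pooling optimization described above follows a simple random-restart pattern: run the randomized coarsening many times and keep the best outcome. A schematic sketch, where `coarsen` is a hypothetical stand-in for Algorithm 1:

```python
import random

def best_pooling(graph, coarsen, n_restarts=1000):
    """Random-restart selection: run the (randomized) coarsening
    routine many times with different seeds and keep the partition
    with the fewest coarse-level nodes."""
    best = None
    for seed in range(n_restarts):
        rng = random.Random(seed)            # reproducible per-restart seed
        partition = coarsen(graph, rng)      # list of subgraphs (coarse nodes)
        if best is None or len(partition) < len(best):
            best = partition
    return best
```

Since each coarse node corresponds to one subgraph, minimizing the number of subgraphs directly minimizes the size of the coarsest level.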
Remark: Note that the example presented in Fig. 8(a) uses a non-structured mesh. As such, it cannot be directly used by the CNN U-Net model, and an additional preprocessing step is needed to make the input and output meshes structured. In this case, we apply zero padding to convert the L-shape mesh into a structured mesh, see Figure 10, which is then used for training with the CNN U-Net architecture. This preprocessing step is not needed for the MAgNET architecture.

The models presented in Table 4 were trained by minimizing the loss function described in Equation 4, using the datasets introduced in Section 3.1. The Adam optimizer, an extension of the stochastic gradient descent algorithm, was used for this purpose. A minibatch size of 4 and an initial learning rate of $1\times10^{-4}$, with a linear decay to $1\times10^{-6}$ during training, were employed. The number of epochs (i.e., iterations of the Adam optimizer) was manually tailored on a case-by-case basis to achieve low values of the loss function. An example of the training loss is provided in Figure 11. The network trainings were conducted using TensorFlow on a Tesla V100-SXM2 GPU at the HPC facilities of the University of Luxembourg, see [Varrette et al., 2014].
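The linear learning-rate decay from $1\times10^{-4}$ to $1\times10^{-6}$ can be expressed as a simple schedule; the following is an illustrative sketch, not the exact TensorFlow schedule object used for training:

```python
def linear_decay(step, total_steps, lr_init=1e-4, lr_final=1e-6):
    """Learning rate decaying linearly from lr_init to lr_final,
    clamped at lr_final once training exceeds total_steps."""
    frac = min(step / total_steps, 1.0)
    return lr_init + frac * (lr_final - lr_init)
```

In Keras, such a function can be wired in through a `LearningRateScheduler` callback or a custom `LearningRateSchedule` subclass.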

Cross validation of CNN U-Net and MAgNET predictions
We now compare the predictions of the MAgNET and CNN U-Net models for the two problems with structured inputs/outputs introduced in Figure 8(a,b). Let us first look at the individual examples with the highest nodal displacement magnitudes. In the 2D L-shape example, shown in Figure 12a, the MAgNET predictions visually coincide with the reference FEM solution very well. This is quantitatively confirmed in Figures 12b and 12c, where the $L_2$ error field is presented for MAgNET and CNN U-Net, respectively, demonstrating a low level of errors for both models. A similar tendency can be observed in the 3D beam case shown in Figure 13.
Here, although the level of errors is somewhat higher than in the 2D example, MAgNET and CNN U-Net perform similarly, which demonstrates the good predictive capabilities of the proposed MAgNET model compared to the CNN U-Net model.
In the following, we analyze and compare the performance of both models for all cases in the test datasets. For that purpose, we need aggregated error metrics. As an error metric for a single test example, we use the mean absolute error

$$ e_m = \frac{1}{F}\sum_{i=1}^{F}\left|u_m^i - G(f_m)^i\right|, \qquad (18) $$

where the force-displacement pair $(f_m, u_m)$ is an element of the test dataset and $F$ is the number of dofs of the mesh. The metric $e_m$ gives us the notion of error between the expected finite element solution, $u_m$, and the prediction of the neural network, $G(f_m)$. To analyze the overall quality of fitting, we define a single error metric over the entire test set as the average mean absolute error, together with the corrected sample standard deviation (standard deviation of averaged errors), defined as

$$ \bar{e} = \frac{1}{M_{te}}\sum_{m=1}^{M_{te}} e_m, \qquad (20) $$

$$ \sigma(e) = \sqrt{\frac{1}{M_{te}-1}\sum_{m=1}^{M_{te}}\left(e_m-\bar{e}\right)^2}. \qquad (21) $$

Finally, in addition to these, we also use the maximum error per degree of freedom over the entire test set,

$$ e_{\max} = \max_{m,i}\left|u_m^i - G(f_m)^i\right|. \qquad (22) $$

The aggregated error metrics for the entire test sets of the structured mesh examples obtained using the MAgNET and CNN approaches are summarised in Table 5. The first observation is that both MAgNET and CNN models exhibit similar prediction accuracy, demonstrating that the MAgNET architecture can achieve a predictive capacity comparable to the CNN U-Net architecture for a similar number of trainable parameters. The prediction errors, with respect to a characteristic length of 1 m, fall below 0.1% for the average mean absolute error (Equation (20)), which is a promising result given the presence of geometric and constitutive nonlinearities. Additionally, we analyze the performance of the MAgNET model as a function of the maximum nodal displacement per test example. This dependency is visualized in Figure 14.

As explained in Section 2.3, one can modulate the model's capacity to capture non-linearities in the underlying data by changing the number of channels in MAg and CNN layers. Importantly, the convolution windows are shared in CNN architectures, whereas the aggregation windows in MAg architectures are independent. We therefore expect CNN networks to require more channels than their respective MAgNET networks to achieve the same level of accuracy. We used this fact when designing the CNN and MAgNET architectures in Section 3.2. To verify this hypothesis and demonstrate this effect, we trained five MAgNET and five CNN U-Net networks on the L-shape dataset with different numbers of channels. In all analyzed cases, we used 4-level MAgNET and 3-level CNN U-Net architectures, with two MAg/Conv layers per level and a constant number of channels at all levels. Figure 15 shows that there is indeed a strong dependency of accuracy on the number of channels for both analyzed network architectures. For a comparable number of trainable parameters, CNN U-Nets can use more channels than their respective MAgNETs, which provides them with comparable predictive accuracy. We can also observe that too few channels significantly reduce the fitting capabilities of both networks, with a step jump between the 8- and 16-channel cases for MAgNET. For the two largest cases (16 and 32 channels for MAgNET, and 128 and 256 channels for CNN U-Net), the accuracy of both architectures is comparable.
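Given test-set displacements stacked into arrays, the aggregated error metrics can be computed directly; a minimal NumPy sketch with our own naming, assuming the FEM and predicted dof values are stored as (M_te, F) arrays:

```python
import numpy as np

def error_metrics(u_true, u_pred):
    """u_true, u_pred: (M, F) arrays of FEM and predicted dof values."""
    abs_err = np.abs(u_true - u_pred)
    e = abs_err.mean(axis=1)   # per-example mean absolute error, Eq. (18)
    e_bar = e.mean()           # average mean absolute error, Eq. (20)
    sigma = e.std(ddof=1)      # corrected sample standard deviation, Eq. (21)
    e_max = abs_err.max()      # max error per dof over the test set, Eq. (22)
    return e_bar, sigma, e_max
```

Note the `ddof=1` argument, which yields the corrected (sample) standard deviation rather than the population one.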

Predictions of MAgNET for general (unstructured) meshes
In Section 3.3, we demonstrated that the MAgNET architectures can achieve very good predictive capabilities for structured mesh cases, which was also cross-validated against the respective CNN U-Net architectures. In this section, we aim to show that the high prediction accuracy of MAgNET can also be expected for unstructured mesh cases, which is the central point of the results section. We consider two cases: the first is deformation under the application of point loads (the 2D beam-with-hole case), similar to the structured examples, and the second is deformation under body forces (the 3D breast case).
Let us first analyze individual examples. In Fig. 16 we present two particular loading cases of the 2D beam with hole. One of them is loaded at the tip, featuring the highest nodal displacement magnitude of all test cases, and the other is loaded close to the hole, representing high local distortions. Similarly, for the three-dimensional problem, in Fig. 17 we show the case featuring the highest nodal displacement magnitude of all test cases. In all the mentioned examples, we can observe good overall accuracy when visually comparing the MAgNET predictions with the respective FEM solutions. This can also be checked quantitatively by analyzing maximum displacement errors. In the cases of the 2D beam with hole, those errors are 1.4% and below when related to the characteristic length of 1 m. In the case of the 3D breast geometry, the relative maximum error is higher, reaching almost 3.1% (related to the breast diameter of 0.16 m). Despite this, we can observe that high local shape distortions are very well recovered. This property is more emphasized in Fig. 17c, where one can additionally observe that the Dirichlet boundary conditions are also very well predicted, even though they were only introduced implicitly through the training data. The aggregated error metrics for the entire test sets are provided in Table 6. The maximum displacement errors over all test cases, $e_{\max}$, are at the levels observed for the particular cases in Figures 16 and 17. At the same time, the average mean errors, $\bar{e}$, are at least an order of magnitude lower, which suggests that errors close to the maximum levels are not frequent. The average mean errors are further analyzed in a case-by-case manner in Fig. 18, which is analogous to the analysis done for the structured cases in Fig. 14. Again, we plot the mean error, $e_m$, of each test example as a function of the maximum nodal displacement.

Above, we have demonstrated the good prediction accuracy of MAgNET within the test dataset (which is located in the interpolated domain). However, it is well known that this accuracy can gradually deteriorate when moving to the extrapolated region, see, e.g., Deshpande et al. [2022]. We study this effect for MAgNET for a particular case based on the 3D breast geometry. As described in Table 3, during training, the $b_z$ component of the body force density is varied from -3 to 3 N/kg only. At inference time, we applied $b_z$ from -7 to 7 N/kg (keeping the other components at 0) to see how the predictions perform within and outside the training magnitudes. Figure 19a shows that the error is fairly low and does not increase within the training region, while it increases rapidly outside of it, confirming this well-known effect. Figures 19b and 19c show deformed meshes predicted for $b_z = 5$ and $b_z = 9$ N/kg, respectively, both outside the training data region. MAgNET is observed to give visually acceptable results, although the accuracy of the framework decreases as we move away from the training data.
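The interpolation/extrapolation study described above amounts to sweeping the load magnitude across and beyond the training range and recording the error; a schematic sketch, where `predict` and `reference` are hypothetical callables returning nodal displacements for a given $b_z$:

```python
import numpy as np

def extrapolation_sweep(predict, reference, bz_values, train_range=(-3.0, 3.0)):
    """Evaluate a surrogate over body-force magnitudes inside and
    outside the training range.  Returns (b_z, MAE, interpolated?)
    tuples, one per evaluated magnitude."""
    lo, hi = train_range
    rows = []
    for bz in bz_values:
        err = np.mean(np.abs(predict(bz) - reference(bz)))  # mean absolute error
        rows.append((bz, err, lo <= bz <= hi))              # flag interpolated cases
    return rows
```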

A note on physics-informed errors
The proposed MAgNET framework has only been trained by minimizing the loss function for displacement errors, with no additional explicit information about the underlying physics/mechanics. As demonstrated earlier in this work, such training can provide very good accuracy in terms of the predicted displacements. However, this accuracy is not at machine precision. A natural question therefore arises: how far can the displacement errors violate the physics? To answer this question, we analyse some problem-based quantities of interest, such as residuals (balance of forces) and stresses, in comparison to the expected ground-truth results.
In Figure 20 we show nodal internal residual forces for the 2D beam-with-hole cases introduced earlier (compare Figure 16 for the respective displacement errors). Ideally, the residual forces should be zero (balance of forces), except at the boundary condition areas, where they should exactly oppose the reaction at the support and the applied external force. However, due to inaccuracies in the displacements obtained from the MAgNET model, differences with respect to the ground-truth residuals can be noted. In Figure 20, we observe the expected high residual forces in the areas where Dirichlet and Neumann boundary conditions are applied; however, localised residual-force spots are also present in the fine-mesh region around the hole. The magnitude of the errors in these localised spots can reach up to 20% of the maximal magnitude of the applied forces. Moreover, when analysing the residuals at the boundary condition areas more closely, it turns out that they do not fully match the respective FEM residuals. For instance, the relative error in residuals at the support in Figure 20a is almost 5%. When analysing the entire test set, we observed that the mean error in retrieving residuals at the Dirichlet boundary, related to the maximum external force, is 2.4%. The analogous relative mean error for retrieving the Neumann boundary residual is 14.6%. The higher error for the Neumann boundary residual is attributed to high local non-linear deformations in the vicinity of the point of application of the force. Although the displacement errors provoked by these non-linearities are not that high, they can result in high residual errors through the relatively high magnitude of the element stiffness.

Figure 20: Nodal residual forces obtained using MAgNET solutions for the examples in Figure 16 (plotted on deformed meshes). The relative error for retrieving the total reaction force at the fixed interface is (a) 4.7% for the first example and (b) 0.1% for the second example.
The errors observed in Figures 16 and 20 can have a direct impact on some application-dependent quantities of interest. As an example, in Figure 21, we present the field of von Mises stresses, which is a commonly used measure of shear stresses. We can observe that the MAgNET solution provides stress profiles similar to those of the respective FEM solutions; however, high localised errors are present in the fine-mesh region (up to 30% of the maximal von Mises stress of the reference FEM solution).
The above-mentioned localised errors in residual forces, von Mises stresses, and other relevant quantities of interest can be reduced by enriching the loss function with physics-informed terms. For instance, in the context of mesh-based force-displacement data, in [Odot et al., 2021] a physics-informed term has been introduced by scaling individual components of the loss function with the respective computed residual values, which proved to reduce residual errors. In [As'ad et al., 2022], the authors introduced an energy-based approach that provided purely physics-informed training for the Gauss-point stress-strain relationship, which allowed them to satisfy the expected frame indifference. Similar concepts of physics-informed loss functions can be seamlessly integrated into the MAgNET framework, which would convert it into a Physics-Informed MAgNET.
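As an illustration of the residual-scaling idea, one possible (hypothetical) form of such a loss weights each degree of freedom by the magnitude of the computed internal residual; this is our own sketch, not the formulation of the cited works:

```python
import numpy as np

def residual_scaled_loss(u_pred, u_true, residual, eps=1e-8):
    """Mean squared displacement error with per-dof weights that
    grow with the magnitude of the computed internal residual.
    The weighting scheme shown here is an illustrative assumption."""
    w = 1.0 + np.abs(residual) / (np.abs(residual).max() + eps)
    return float(np.mean(w * (u_pred - u_true) ** 2))
```

With zero residuals the loss reduces to the plain mean squared error, while dofs with large residuals are penalised up to twice as strongly, pushing the network towards physically consistent predictions.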

Conclusion
In this work we proposed MAgNET, a novel framework for efficient supervised learning on graph-structured data using geometric deep learning. The framework comprises two neural network operations: MAg layers and graph pooling/unpooling layers, which together form a graph U-Net architecture capable of learning on large-dimensional inputs/outputs. Notably, the MAgNET framework is not restricted to any particular input→output relationship or any specific mesh or discretization scheme, making it superior to existing convolutional neural network architectures. MAgNET allows for arbitrary non-grid inputs/outputs, meaning it can handle arbitrary meshes and support complex geometries and local mesh refinements, making it suitable for a wide range of engineering applications.
We demonstrated and studied the capabilities of MAgNET in capturing nonlinear relationships in data. For this purpose, we conducted a quantitative cross-validation of the predictions made by MAgNET and the well-known convolutional U-Net architecture, both of which were verified against ground-truth results obtained with FEM. The benchmarks have shown that MAgNET has predictive capabilities similar to those of CNN U-Net for structured meshes, and that it can also be applied to arbitrary meshes while preserving a similar prediction accuracy.
There are several natural directions for extending the capabilities of MAgNET. Firstly, the inclusion of the underlying physics in the training process would enhance the overall performance of the framework. Although we have numerically demonstrated that the current data-based MAgNET framework can capture the underlying physics of the problem, we have also identified areas of increased errors, especially near regions of refined mesh. Integrating quantities of interest into the learning objective function would improve performance, and the implementation of a Physics-Informed MAgNET is a very natural extension of the framework in the immediate future. Secondly, extending MAgNET to path-dependent processes, such as elasto-plasticity, is a promising direction, which would require keeping track of the evolution of state variables in a time-stepping manner. Recurrent neural network architectures, as described in [Mozaffar et al., 2019], or recent developments from our group [Vijayaraghavan et al., 2021], may be utilized for this purpose. Finally, there is a direct possibility to extend the proposed MAg layer to a Bayesian version, which would convert MAgNET into a probabilistic framework. This could be achieved by performing local aggregations with probability distributions instead of deterministic weights, similar to what we did for CNNs in our previous work [Deshpande et al., 2022]. A Bayesian MAgNET would be capable of tracking uncertainties that are inherent to the choice of network architecture, as well as those inherent to real-world data.
We have made all the codes, datasets, and examples presented in this paper available open-access in the MAgNET repository at https://github.com/saurabhdeshpande93/MAgNET. Given the generality of MAgNET in supporting arbitrary non-linear relationships and arbitrary discretizations, we believe that the repository will provide a useful surrogate modeling framework for researchers and practitioners in various application areas across disciplines. We see it not only as a ready-to-use machine-learning library but also as a reference point and foundation for future developments and extensions in this emerging direction of research. The generality of MAgNET will enable the community to explore a range of new applications and modeling scenarios. By sharing our work, we hope to foster collaboration and advance the state of the art in deep-learning surrogate modeling.
764644. Jakub Lengiewicz would like to acknowledge the support from the EU Horizon 2020 Marie Skłodowska-Curie Individual Fellowship MOrPhEM under Grant 800150. This paper only contains the authors' views; the Research Executive Agency and the Commission are not responsible for any use that may be made of the information it contains.
Stéphane Bordas and Jakub Lengiewicz are grateful for the support of the Fonds National de la Recherche Luxembourg FNR grant QuaC C20/MS/14782078. Stéphane Bordas received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 811099 TWINNING Project DRIVEN for the University of Luxembourg.

Figure 2: A schematic of the graph U-Net architecture for mesh-based inputs. Colors indicate different types of layers; c1, c2, ..., c5 stand for channel dimensions. Different arrows indicate different layers: the graph Multichannel Aggregation (MAg) layer, the graph pooling/unpooling layers, and the concatenation layer.

Figure 3: Adjacency matrices for the (a) square and (b) triangular meshes. The dashed lines in (a) represent additional edges added to the original mesh.

Figure 4: Local aggregation in MAg (a) works very similarly to the application of a filter in a CNN (b). However, as opposed to a CNN, MAg uses a different set of weights at each spatial location, with heterogeneous window sizes, whereas in a CNN a constant filter slides across the channel.

Figure 5: One arbitrary choice of non-overlapping subgraphs to create a pooled graph. Subgraphs G1, ..., G5 are represented with different colors and are generated by Algorithm 1.

Figure 6: This 2D mesh requires at least 4 subsequent local aggregation operations (orange areas with center nodes marked by dots) to propagate feature information from node B to the distant node C.

Figure 7: Visualisation of feature information exchange between nodes in the pooled graph. In the pooled space, only 2 MAg operations are sufficient to exchange feature information between nodes that are spatially distant in the original graph. The orange region shows the window of the MAg operation.

Figure 9: MAgNET architecture used for the 2D L-shape example.

Figure 10: Zero padding is applied to make the L-shape topology compatible with the CNN framework. The additional nodal values for inputs (forces) and outputs (displacements) are set to zero vectors.

Figure 11: Training loss curve for the 3D breast MAgNET model.

Figure 12: Deformation of the 2D L-shape under a point load of (-0.93, 0.91) N at the corner node. (a) Deformed mesh predicted using MAgNET (blue); for comparison, the FEM solution is shown (red). (b) L2 error of nodal displacements between the MAgNET and FEM solutions; the relative error for the corner node displacement using MAgNET is 0.5%. (c) L2 error of nodal displacements between the CNN U-Net and FEM solutions; the relative error for the corner node displacement using CNN U-Net is 0.3%.

Figure 13: Deformation of the 3D beam under a point load of (-1.75, 1.31, -1.7) N at the second-last node. (a) Deformed mesh predicted using MAgNET (blue); for comparison, the FEM solution is shown (red) and the undeformed mesh is shown in gray. (b) L2 error of nodal displacements between the MAgNET and FEM solutions; the relative error in predicting the displacement of the loaded node using MAgNET is 4.4%. (c) L2 error of nodal displacements between the CNN U-Net and FEM solutions; the relative error in predicting the displacement of the loaded node using CNN U-Net is 3.0%.

Figure 14: Mean absolute errors (see Equation (18)) as a function of maximum nodal displacements for all test examples of the 2D L-shape and 3D beam cases, for the CNN U-Net and MAgNET networks.

Figure 15: Average mean error over the test set for the L-shape case as the number of network parameters is varied by altering the number of channels in the MAgNET and CNN U-Net architectures. The numbers in the plot represent the number of channels used at each level of the MAgNET and CNN U-Net networks.

Figure 17: Deformation of the 3D breast geometry with a force density of (−5.94, −5.23, −2.56) N/kg. (a) Deformed meshes computed using MAgNET (blue) and FEM (red), with the undeformed configuration (gray). (b) L2 error of nodal displacements between the MAgNET and FEM solutions. (c) Tilted view of figure (b); MAgNET efficiently captures the fixed boundary and the nearby highly non-linear deformations by learning implicitly from the data.

Figure 18: Mean absolute errors as a function of maximum nodal displacements for all test examples of the unstructured mesh cases.

Figure 19: 3D breast deformation under horizontal body force densities (0, 0, b_z) N/kg. (a) Mean absolute error for test cases in the interpolated and extrapolated regions. The error increases rapidly in the extrapolated region, while it remains low in the training (interpolated) region. (b)&(c) Visualisation of deformed meshes for force densities outside the training region, computed using MAgNET (blue) and FEM (red).

Figure 21: Von Mises stresses obtained for the two examples of Figure 16 using (a)&(b) the MAgNET solution and (c)&(d) the FEM solution. In (e)&(f), the absolute error between the MAgNET and FEM von Mises stresses is shown.

Figure 1: A novel graph U-Net neural network surrogate model for mesh-based simulations. MAgNET accurately captures non-linear FEM responses.

Table 2: Material properties used for the benchmark cases.

Table 3: Specification of the FE-based datasets. For cases (a-c), the external force is applied at a selected node, and for case (d), external body forces are applied. The magnitudes of the forces are randomly sampled from a multivariate uniform distribution, with ranges specified in the table. For cases (a-c), multiple samples per node are generated, for all nodes in the prescribed area of interest.

Table 4: Neural network architectures implemented in this work. The leaky ReLU activation function is used in all MAgNET cases, while the ReLU activation is used for the CNN cases. For the last layers, a linear activation function is always applied.

Table 5: Error metrics for the structured mesh examples. M_te stands for the number of test examples, and ē, σ(e), e_max are the error metrics defined by Equations (20)-(22).

Table 6: Error metrics for the unstructured mesh examples. M_te stands for the number of test examples, and ē, σ(e), e_max are the error metrics defined in Section 3.3.