Agglomeration of Polygonal Grids using Graph Neural Networks with applications to Multigrid solvers

Agglomeration-based strategies are important both within adaptive refinement algorithms and to construct scalable multilevel algebraic solvers. To perform agglomeration of polygonal grids automatically, we propose the use of Machine Learning (ML) strategies, which can naturally exploit geometrical information about the mesh in order to preserve grid quality, enhancing the performance of numerical methods and reducing the overall computational cost. In particular, we employ the k-means clustering algorithm and Graph Neural Networks (GNNs) to partition the connectivity graph of a computational mesh. Moreover, GNNs have high online inference speed and the advantage of naturally and simultaneously processing both the graph structure of the mesh and the geometrical information, such as the areas of the elements or their barycentric coordinates. These techniques are compared with METIS, a standard algorithm for graph partitioning, which is meant to process only the graph information of the mesh. We demonstrate that the ML strategies enhance performance in terms of quality metrics. Such models also show a good degree of generalization when applied to more complex geometries, such as brain MRI scans, and the capability of preserving the quality of the grid. The effectiveness of these strategies is also demonstrated when applied to MultiGrid (MG) solvers in a Polygonal Discontinuous Galerkin (PolyDG) framework. In the considered experiments, GNNs show overall the best performance in terms of inference speed, accuracy and flexibility of the approach.


Introduction
Many applications in the fields of Engineering and Applied Sciences, such as fluid-structure interaction problems, flow in fractured porous media, crack and wave propagation problems, are characterized by a strong complexity of the physical domain, possibly involving moving geometries, heterogeneous media, immersed interfaces and complex topographies. Whenever classical Finite Element Methods (FEMs) are employed to discretize the underlying differential model, the process of grid generation can be the bottleneck of the whole simulation, as computational meshes can be composed only of tetrahedral, hexahedral, or prismatic elements. To overcome this limitation, in recent years there has been great interest in developing FEMs that can employ general polygons and polyhedra as grid elements for the numerical discretization of partial differential equations. We mention the mimetic finite difference method [1,2,3,4], the hybridizable discontinuous Galerkin method [5,6,7,8], the Polyhedral Discontinuous Galerkin (PolyDG) method [9,10,11,12,13,14,15], the Virtual Element Method (VEM) [16,17,18,19,20,21] and the Hybrid High-Order method [22,23,24,25,26]. This calls for the development of effective algorithms to handle polygonal and polyhedral (polytopal, for short) grids and to assess their quality (see e.g. [27]). For a comprehensive overview we refer to the monographs and special issues [4,14,25,28,20,21] and the references therein. Among the open problems, there is the issue of efficiently handling polytopal mesh agglomeration, i.e., merging mesh elements to obtain coarser grids [29,30,10,31,32]. Mesh agglomeration can be used to obtain a coarser discretization of the differential problem at hand, in order to reduce the number of degrees of freedom where they are not needed, and therefore also the computational effort.
These operations can be performed naturally in the context of polygonal and polyhedral grids, because of the flexibility in the definition of the shape of mesh elements. This approach has multiple applications in the numerical solution of partial differential equations, for example:
• it can be used, within adaptive procedures, to reduce the number of degrees of freedom where they are not needed, because in certain parts of the domain the error is already under control;
• it can be used to generate a hierarchy of (nested) coarser grids starting from a fine mesh of a complex physical domain of interest, in order to employ them in multigrid solvers [10,33,34,35,36,37,38] to accelerate the convergence of iterative algebraic solvers;
• it can be employed together with domain decomposition techniques [39,40,41,42] to obtain a meaningful decomposition of the domain, starting from a fine discretization.
Grid agglomeration is a relatively unexplored topic, because such strategies cannot be developed within the framework of classical FEMs. During this operation it is important to preserve the quality of the underlying mesh, since this might affect the overall performance of the method in terms of stability and accuracy. Indeed, a suitably adapted mesh may allow one to achieve the same accuracy with a much smaller number of degrees of freedom when solving the numerical problem, hence saving memory and computational power. However, since in such a general framework mesh elements may have any shape, there are no well-established strategies to achieve effective agglomeration with an automatic, fast and simple approach.
In recent years there has been a great development of Machine Learning (ML) algorithms, a framework for extracting information automatically from data, to enhance and accelerate numerical methods for scientific computing [43,44,45,46,47,48,49,50,51,52]. In this work, we propose to use ML-based strategies to efficiently handle polygonal mesh agglomeration, in order to fully exploit all of the benefits of the above-mentioned numerical methods, such as geometrical flexibility and convergence properties. The core concept lies in learning the "shape" of mesh elements in order to perform the desired operations accordingly. Such learning needs to be performed in an automatic and flexible way, because of the high variability of geometries of interest, tailoring the approach to a wide range of different possible situations. Rather than simply fixing a priori criteria to perform agglomeration, which would inevitably result in poor performance or high computational cost due to the impossibility of capturing all possible situations, ML strategies automatically exploit and process the large amount of available data to learn only the distribution of the features of interest for the application, leading to high performance and computational efficiency. By combining the a priori approach of classical numerical methods with the a posteriori approach of ML strategies, it is possible not only to boost existing algorithms, but also to develop new algorithms capable of working in more general frameworks. In particular, we propose the use of Graph Neural Networks (GNNs) [53,54,55,56,57], which are deep learning architectures specifically meant to work with graph-structured data.
The problem of mesh agglomeration can be re-framed as a graph partitioning problem, by exploiting the connectivity structure of the mesh. In particular, the graph representation of the mesh is obtained by assigning a node to each element of the mesh, and connecting with an edge each pair of nodes corresponding to adjacent elements in the original mesh. By exploiting such a representation, GNNs can be applied to solve a node classification problem, where each element is assigned to a cluster, which corresponds to an element of the agglomerated mesh. GNNs can naturally and simultaneously process both the graph structure of the mesh and the geometrical information that can be attached to the nodes, such as the element areas or their barycentric coordinates. This is not the case for other approaches such as METIS [58], a standard solver for graph partitioning which can process only the graph information, or k-means, which can process only the geometrical information. The proposed GNN-based algorithms exploit an unsupervised training procedure, where parameters are set to minimize the expected value of the normalized cut over a database of polygonal grids. To investigate the capabilities of the proposed approaches, we consider a second-order model problem discretized by the PolyDG method in a multigrid framework. We measure effectiveness through an analysis of quality metrics and of the number of iterations of the algebraic iterative solver. We also consider the generalization capabilities over a human brain MRI scan section.
The paper is organized as follows. In Section 2 we present possible agglomeration criteria for polytopes. In Section 3 we propose a general framework to perform mesh agglomeration using GNNs. In Section 4 we measure the effectiveness of the proposed agglomeration strategies in terms of computational cost, quality metrics and generalization capabilities over unseen complex physical domains. In Section 5 we present some computations obtained by applying the considered agglomeration strategies to multigrid solvers in a PolyDG framework. In Section 6 we draw some conclusions.

Mesh agglomeration strategies
We recall that the problem of mesh agglomeration can be re-framed as a graph partitioning problem, by exploiting the connectivity structure of the mesh.
In particular, the graph representation of the mesh is obtained by assigning a node to each element of the mesh, and connecting with an edge each pair of nodes corresponding to adjacent elements in the original mesh, i.e. polygons that share at least one edge. Moreover, features can be assigned to each node, storing geometrical information such as the area of the element or its barycentric coordinates. Finally, partitioning the nodes of the mesh into proper clusters using a suitable algorithm allows one to obtain an agglomerated representation of the original mesh.
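As a minimal sketch of this construction (assuming each element is given as a list of vertex indices; the helper name is hypothetical), the connectivity graph can be built by hashing mesh edges and linking elements that share one:

```python
from collections import defaultdict

def connectivity_graph(elements):
    """Build the adjacency list of the mesh connectivity graph.

    `elements` is a list of polygons, each given as a list of vertex
    indices in order; two elements are adjacent (connected by a graph
    edge) when they share at least one mesh edge.
    """
    edge_to_elems = defaultdict(list)
    for k, poly in enumerate(elements):
        # walk the polygon boundary, pairing consecutive vertices
        for a, b in zip(poly, poly[1:] + poly[:1]):
            edge_to_elems[frozenset((a, b))].append(k)
    adj = defaultdict(set)
    for elems in edge_to_elems.values():
        for i in elems:
            for j in elems:
                if i != j:
                    adj[i].add(j)
    return {k: sorted(v) for k, v in adj.items()}

# Two quadrilaterals sharing the mesh edge (1, 2): nodes 0 and 1
# of the connectivity graph become adjacent.
mesh = [[0, 1, 2, 3], [1, 4, 5, 2]]
print(connectivity_graph(mesh))  # {0: [1], 1: [0]}
```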

Graph Partitioning
Let N > 0 be the number of graph nodes. Consider a graph G = (V, E), where V = {v_1, ..., v_N} is the set of nodes and E is the set of edges, as exemplified in Figure 1. We restrict our framework to undirected graphs, i.e. e(v_i, v_j) ∈ E ⟺ e(v_j, v_i) ∈ E. We will use the following notation:
• A ∈ R^{N×N} is the adjacency matrix, i.e. A_{ij} = 1 if e(v_i, v_j) ∈ E and 0 otherwise;
• X ∈ R^{N×F} is the features matrix of the nodes, which may represent any information such as coordinates, where F is the number of features;
• N(v_i) = {v_j ∈ V : e(v_i, v_j) ∈ E} denotes the neighborhood of the i-th node, containing the nodes v_j directly adjacent to v_i.
We can define the cut of a graph, representing the number of edges connecting the disjoint sets of nodes resulting from the partitioned graph. In the case of two partitions S_1 and S_2, it can be defined as

\mathrm{cut}(S_1, S_2) = \sum_{v_i \in S_1} \sum_{v_j \in S_2} A_{ij},

and can be easily generalized to the case of M partitions as

\mathrm{cut}(S_1, \dots, S_M) = \sum_{k=1}^{M} \mathrm{cut}(S_k, S_k^C),

where S_k^C denotes the complement of S_k. The normalized cut is defined as

\mathrm{Ncut}(S_1, \dots, S_M) = \sum_{k=1}^{M} \frac{\mathrm{cut}(S_k, S_k^C)}{\mathrm{vol}(S_k)},

where the volume of the partition S_k is defined as

\mathrm{vol}(S_k) = \sum_{v_i \in S_k} \sum_{j=1}^{N} A_{ij}.

It represents the total degree of all nodes of the k-th partition. Depending on the application of interest, different notions of volume can be used. The normalized cut, contrary to the standard cut, allows one to favor partitions where the number of nodes is balanced between the different sets. Let Y_{ij} be the probability for node i of belonging to partition j. The expected cut, given a partition S_k and its complement S_k^C, is defined as

\mathbb{E}[\mathrm{cut}(S_k, S_k^C)] = \sum_{i=1}^{N} \sum_{j=1}^{N} Y_{ik} (1 - Y_{jk}) A_{ij}.

Let D \in \mathbb{R}^N be the column vector of the degrees of the nodes, i.e. D = A \mathbf{1}, with \mathbf{1} the vector of ones.
The expected normalized cut can then be defined as

\mathbb{E}[\mathrm{Ncut}] = \sum \left[ (Y \oslash \Gamma) (1 - Y)^\top \odot A \right], \qquad \Gamma = Y^\top D,

where \oslash and \odot denote the element-wise division (broadcast along the rows of Y) and multiplication, respectively, and the summation is over all the entries of the resulting matrix.
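The expected normalized cut can be sketched in a few lines of numpy; for a hard (0/1) partition it reduces to the standard normalized cut, which the toy path graph below illustrates:

```python
import numpy as np

def expected_ncut(Y, A):
    """Expected normalized cut of a soft partition.

    Y : (N, M) row-stochastic matrix, Y[i, j] = prob. node i is in S_j.
    A : (N, N) symmetric adjacency matrix.
    Implements sum[(Y / Gamma) (1 - Y)^T * A] with Gamma = Y^T D, D = A 1.
    """
    D = A.sum(axis=1)        # node degrees
    gamma = Y.T @ D          # expected volume of each partition
    return float(((Y / gamma) @ (1.0 - Y).T * A).sum())

# Hard bisection of the path graph 0-1-2-3 into {0,1} and {2,3}:
# cut = 1, vol(S1) = vol(S2) = 3, so Ncut = 1/3 + 1/3 = 2/3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
print(round(expected_ncut(Y, A), 4))  # 0.6667
```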

METIS
A standard algorithm for partitioning large graphs or meshes and computing fill-reducing orderings of sparse matrices is METIS [58]. As depicted in Figure 2, the METIS graph partitioning algorithm is based on a multi-level graph bisection procedure, consisting of the following main steps: 1. Graph coarsening phase: from the input graph, successively smaller graphs are derived by collapsing adjacent pairs of vertices, until the size of the graph is sufficiently small. 2. Initial partitioning phase: a partition of the coarsest graph is computed by minimizing the edge cut.
3. Uncoarsening phase: the partitioning of the smallest graph is projected back to the successively larger graphs, by assigning the pairs of vertices that were collapsed together to the same partition.
4. Refinement phase: after each uncoarsening step the partition is refined, adjusting nodes close to the interface of the two sets.
Once the graph has been divided into two sets, METIS applies the same algorithm recursively on the new sub-graphs, until the desired number of sets is reached.
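METIS itself ships as a compiled library, but the bisection-plus-refinement idea can be illustrated self-containedly with networkx, whose Kernighan–Lin routine iteratively swaps nodes near the interface to reduce the cut, much like the refinement phase above (the grid graph is a stand-in for a mesh connectivity graph):

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# A 4x4 grid graph standing in for the connectivity graph of a mesh.
G = nx.grid_2d_graph(4, 4)

# Kernighan-Lin starts from a balanced random bipartition and greedily
# swaps boundary node pairs to reduce the edge cut.
S1, S2 = kernighan_lin_bisection(G, seed=0)

# The bisection is balanced: each side gets half of the 16 nodes.
print(len(S1), len(S2))  # 8 8
```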

Machine learning-based graph partitioning
We propose to employ ML-based bisection models of the form M(G, X) = Y, that take as input a graph G together with a set of features X attached to each node, such as the barycentric coordinates or the areas of the mesh elements, and output the vector of probabilities Y of each node belonging to cluster 1 or cluster 2. In order to apply such models to mesh agglomeration, we can use Algorithm 1, which recursively bisects the connectivity graph of the input mesh until the agglomerated elements have the desired size. We recall that the diameter of a domain D is defined, as usual, as diam(D) := sup{|x − y| : x, y ∈ D}. Given a polygonal mesh, i.e. a set of non-overlapping polygonal regions that covers a domain Ω, we can define the mesh size h = max_{i=1,...,N_P} diam(P_i), where N_P is the number of mesh elements. Algorithm 1 automatically generates a hierarchy of nested grids of different sizes, to be employed e.g. within multigrid solvers.
If the model M does not return a valid partition Y, e.g., because within a set the agglomerated elements are not connected, the partition can be post-processed, for instance by splitting each set into its connected components.

Algorithm 1 General mesh agglomeration strategy
Input: mesh T_h, target mesh size h*, bisection model M.
1: Extract the connectivity graph G and features X from T_h.
2: Compute the bisection Y = M(G, X).
3: Refine partition Y, by minimizing the length of the boundary of the agglomerated polygons.
4: Partition T_h into sub-meshes T_h,1 and T_h,2 according to Y.
5: Recursively apply the algorithm to each sub-mesh whose size exceeds the target h*.

As bisection model M we can resort to, e.g., the k-means clustering algorithm and GNNs, as we will see in the following.
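The recursive structure of the agglomeration strategy can be sketched as follows; `diam` and `bisect` are hypothetical stand-ins for the mesh-diameter computation and the bisection model M (here a toy 1D mesh of unit intervals is used, so the result can be checked by hand):

```python
def agglomerate(elements, diam, bisect, h_target):
    """Recursive bisection driver in the spirit of the strategy above.

    elements : list of element ids in the current (sub-)mesh
    diam     : maps a list of element ids to the diameter of the region
               they cover (assumed available from the mesh geometry)
    bisect   : bisection model M, splitting the ids into two sub-lists
               (e.g. k-means or a GNN; here any callable works)
    Returns the agglomerated elements as lists of element ids.
    """
    if diam(elements) <= h_target or len(elements) < 2:
        return [elements]          # small enough: keep as one polygon
    left, right = bisect(elements)
    return (agglomerate(left, diam, bisect, h_target)
            + agglomerate(right, diam, bisect, h_target))

# Toy 1D mesh: element i covers the unit interval [i, i+1).
diam = lambda ids: max(ids) + 1 - min(ids)
bisect = lambda ids: (ids[:len(ids) // 2], ids[len(ids) // 2:])
print(agglomerate(list(range(8)), diam, bisect, 2))
# [[0, 1], [2, 3], [4, 5], [6, 7]]
```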

The k-means clustering algorithm
The k-means clustering algorithm [59,60,61] is an iterative procedure for partitioning a set of input points in R^n, where n ≥ 1 is the number of features, into k clusters, as described in Algorithm 2. It is an unsupervised learning algorithm, as no labels are provided together with the data. It relies on the definition of k centroids, and each point is assigned to the closest centroid. The location of the centroids is chosen in such a way as to minimize the Euclidean distance between centroids and the points within their clusters.

Algorithm 2 The k-means clustering algorithm
1: Randomly sample k points and label them centroids c_1, c_2, ..., c_k.
2: Update the clusters, by assigning each point to the cluster with the closest centroid in the L^2 norm.
3: Update each centroid as the mean of the points in its cluster.
4: Repeat steps 2-3 until convergence.

In order to apply the k-means algorithm for graph bisection within Algorithm 1, we simply use Algorithm 2 employing only two clusters and using as features the barycentric coordinates of the mesh elements. This strategy tends to produce "rounded" (star-shaped) agglomerated elements of similar size. Notice that no information on the connectivity graph is taken into account directly. However, for a regular mesh with elements of similar size, barycentric coordinates are usually strongly correlated with elements being adjacent.

Graph neural networks-based agglomeration strategies
In order to perform grid agglomeration by effectively exploiting both the graph representation and the geometrical features of meshes, we employ GNNs as the bisection model in Algorithm 1. Examples of GNN-based models applied directly to the original grid are [54,55], which employ additional processing based on the graph spectrum to perform a preliminary embedding step, in order to extract features that can later be fed to a GNN-based partitioning module.
In our case, the geometrical features extracted from the mesh, such as the element areas or their barycentric coordinates, can be leveraged to perform the classification task, therefore avoiding the need for an additional spectral embedding module.
The graph-bisection model performs a classification task, taking as input the graph G = (V, E) and the features X ∈ R^{N×F} related to its nodes, and outputting a probability tensor Y ∈ R^{N×2}, where Y_{ij} represents the probability that the node v_i ∈ V belongs to the partition S_j, with j = 1, 2. This approach can obviously be generalized to an arbitrary number of partitions. However, this would require a specific GNN model for each fixed number of classes, while the two-class case can easily be extended to the multi-class case by recursively calling the bisection model on the graphs of each partition. The general framework for mesh agglomeration via GNNs is shown in Figure 3.

Unsupervised learning for graph partitioning
In an unsupervised learning framework, we are given an unlabelled dataset of graphs {(A_n, X_n)}_{n=1}^{N_G}, where N_G ≥ 1 is the number of graphs in the database, for which we want to find a different representation. For the case of mesh agglomeration, these graphs represent the element connectivity of different computational grids. We then consider a node classifier F, which in our case will be a GNN, parameterized by a set of weights W, that takes as input the adjacency matrix A and the node features X of a graph, and outputs the probability Y_{ij} for node i of belonging to partition j. Our goal is to tune W so that F minimizes the following loss function

L(W) = \frac{1}{N_G} \sum_{n=1}^{N_G} \mathbb{E}[\mathrm{Ncut}]\big(F(A_n, X_n; W), A_n\big),

where \mathbb{E}[\mathrm{Ncut}] is the expected normalized cut of the graph.
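A minimal unsupervised training step can be sketched as follows, assuming PyTorch. The plain MLP here is only a stand-in for the GNN classifier F (a real model would also consume the graph structure), and the path graph and random features are toy data; the point is that the expected normalized cut is differentiable in the soft assignments Y, so no labels are needed:

```python
import torch

def expected_ncut(Y, A):
    """Differentiable expected normalized cut, Gamma = Y^T D, D = A 1."""
    D = A.sum(dim=1)
    gamma = Y.t() @ D                      # expected partition volumes
    return ((Y / gamma) @ (1 - Y).t() * A).sum()

torch.manual_seed(0)
N = 16
A = torch.zeros(N, N)
for i in range(N - 1):                     # path graph as a toy "mesh"
    A[i, i + 1] = A[i + 1, i] = 1.0
X = torch.rand(N, 3)                       # area + barycentric coords

# Stand-in classifier: features -> row-stochastic (N, 2) assignments.
model = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 2), torch.nn.Softmax(dim=1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):                       # minimize E[Ncut] directly
    opt.zero_grad()
    loss = expected_ncut(model(X), A)
    loss.backward()
    opt.step()
```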

Graph neural networks
GNNs are deep learning architectures specifically meant to work with graph-structured data, within the framework of Geometric Deep Learning, which concerns the application of neural networks to non-Euclidean data structures. The mappings, or layers, that can be combined to construct the GNN architecture are the following. At step k, the hidden representation H_i^{k+1} of the node v_i is computed as

a_i^k = \Phi\big(\{H_j^k : v_j \in N(v_i)\}\big),   (10)
H_i^{k+1} = \Psi\big(H_i^k, a_i^k\big),   (11)

where H^0 = X for the initial layer, \Phi is the aggregation function that defines how the information coming from the neighborhood N(v_i) of node v_i is aggregated, and \Psi is the combination function that defines how the aggregated information a_i^k is combined with the one stored in v_i. Different definitions of the aggregation and combination functions lead to different GNN architectures. In particular, we re-frame equations (10) and (11) by considering the mean aggregation function, as follows:

H_i^{k+1} = \sigma\Big( H_i^k W_1^k + \frac{1}{|N(v_i)|} \sum_{v_j \in N(v_i)} H_j^k \, W_2^k \Big),   (12)

where \sigma(\cdot) is a non-linear activation function, such as the REctified Linear Unit (ReLU) or the hyperbolic tangent (tanh), and W_1^k, W_2^k ∈ R^{F_k × F_{k+1}} are weight matrices associated to the k-th layer, representing a trainable linear transformation, where F_k and F_{k+1} are the feature dimensions of the current and next layers, respectively. We refer to (12) as the SAmple-and-aggreGatE Convolutional (SAGEConv) layer [53], followed by the activation function \sigma. Such layers also account for the possibility of sub-sampling the neighbourhood of a node during the aggregation process, leveraging the information coming from the features to better generalize to unseen nodes, while keeping the computational cost under control.
Input features are suitably normalized before being fed to the network: for features such as the barycentric coordinates, we first center them, by subtracting the mean, and then rescale them to [−1, 1]. The outputs of the final layer are used to assign a probability to each class.

Graph neural network training
The input features matrix is X ∈ R^{N×3}, where the first column contains the area of each mesh element, followed by the two coordinates of its barycenter. The GNN architecture we employed first applies an INorm layer, followed by four SAGEConv layers. Training was performed for 300 epochs in approximately 35 minutes on the Google Colab cloud platform, using a 2.20GHz Intel Xeon processor with 12GB of RAM and an NVIDIA Tesla T4 GPU with 16GB of GDDR6 memory.

Validation on a set of polygonal grids
In this section we compare the performance of the proposed algorithms. To evaluate the quality of the agglomerated grids, we employ the following quality metrics: the Uniformity Factor, UF(K) = diam(K)/h, which measures how close the diameter of the element K is to the mesh size, and the Circle Ratio, CR(K), defined as the ratio between the radius of the largest circle inscribed in K and the radius of the smallest circle containing K. The higher the value of CR, the more rounded the mesh elements are.
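As an illustration, the Uniformity Factor can be computed from the element vertex coordinates alone; the two-element mesh below is a hypothetical example chosen so the values can be checked by hand:

```python
import numpy as np

def diam(vertices):
    """Diameter of a polygon: the largest pairwise vertex distance."""
    V = np.asarray(vertices, dtype=float)
    d = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)
    return d.max()

def uniformity_factors(elements):
    """UF(K) = diam(K) / h per element, with h the mesh size
    (the maximum element diameter)."""
    diams = np.array([diam(v) for v in elements])
    return diams / diams.max()

# Hypothetical mesh: a unit square next to a 2x2 square, so the
# diameters are sqrt(2) and 2*sqrt(2) and the UFs are 0.5 and 1.
mesh = [[(0, 0), (1, 0), (1, 1), (0, 1)],
        [(1, 0), (3, 0), (3, 2), (1, 2)]]
print(uniformity_factors(mesh))  # [0.5 1. ]
```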
We consider four different grids of domain (0, 1) 2 : a grid of regular triangles, a grid of triangles with random location of the vertices, a Voronoi grid with random location of the seeds, and a grid of regular squares. In Figure 6 these grids have been agglomerated using METIS, k-means and GNNs strategies.
For METIS the target number of mesh elements is N_0/16, where N_0 is the initial number of elements, while for k-means and GNN the target mesh size in Algorithm 1 is h_0/4, where h_0 is the initial mesh size. As we can see, the k-means and GNN algorithms are capable of recovering a regular grid of squares when starting from regular meshes (triangles and squares), while this is not the case for METIS. In Figure 7 we show the box plots of the computed quality metrics for the selected grids. In general, the quality metrics are lower for the METIS algorithm, while k-means and GNNs have comparable performance. This is further confirmed by Tables 1 and 2, where we report the average values of UF and CR. In particular, the performance difference is much more evident for regular grids. In Tables 3 and 4 we also report the relative performance with respect to METIS, i.e., the ratio between the average UF and CR metrics of each strategy and those of METIS, for the agglomerated grids of Figure 6.
The ML-based strategies (the k-means and the GNN algorithms) seem to preserve the initial geometry and quality of the grids, because the geometric information attached to the nodes is taken into account, making them suitable for adaptive mesh coarsening. This is not the case for the METIS algorithm, because it processes only the information coming from the graph topology of the mesh.

Application to a computational mesh stemming from a human brain MRI-scan
In order to further test the generalization capabilities of our models, we apply them to a much more complex domain than the meshes considered so far. In particular, we consider the mesh of a section of a human brain coming from an MRI-scan, consisting of 14372 triangular elements. The domain is highly non-convex and presents many constrictions and narrowed sections. We agglomerated such a grid using the METIS, k-means and GNN algorithms: the result is reported in Figure 8. For k-means and GNN the target mesh size in Algorithm 1 is 0.2D, where D is the maximum distance between any two vertices of the initial mesh. For METIS the target number of mesh elements is 50, which corresponds approximately to the same mesh size. In Figure 9 we show the corresponding box plots of the quality metrics, and in Table 5 we report the average values for the agglomerated MRI brain scan meshes of Figure 8. As we can see, performance is comparable in terms of UF, while the ML-based strategies achieve on average a higher "roundness", measured in terms of CR. This indicates good generalization capabilities of the GNN algorithm, as the considered mesh is very different from the ones included in the training set, in terms of shape, dimensions and number of elements. The k-means reasonably performs slightly better than the GNN, because it does not require prior information coming from a database, at the cost of a higher online computational cost, as we will see in the next section.

Runtime performance
The main advantage of using Neural Network-based solutions is the low computational cost of online inference, with respect to k-means or METIS. We measured the computational cost of the different graph partitioning algorithms on 21 random Voronoi meshes with an increasing number of elements, from 25 to 5000. Since the runtimes are not deterministic, we performed a sampling of 20 runtimes for each mesh. Results are shown in Figure 10. As we can see, the GNN algorithm outperforms the METIS and k-means ones. For a mesh with 5000 elements, the GNN is 4.4836 times faster than METIS and 3.5670 times faster than k-means. Such a performance improvement is expected to grow asymptotically, due to the fact that the computational complexity of the Neural Network mainly scales with the dimension of the data manifold [63].

Applications to agglomeration-based multigrid methods
In this section we test the effectiveness of the proposed agglomeration strategies when used in combination with polygonal finite element discretizations within a MultiGrid (MG) framework [10,33,34,35,36,37,38]. We consider the following model problem: find u such that

-\Delta u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega,

with Ω = (0, 1)^2 and f ∈ L^2(Ω) the forcing term, selected in such a way that the exact solution is given by u(x, y) = sin(πx) sin(πy). We consider the V-cycle MG algorithm with additive Schwarz smoothing within a PolyDG discretization framework, as described in [38]. We employ the initial grids shown in the first column of Figure 6 (triangles, random, Voronoi, squares), agglomerated using the proposed strategies (METIS, k-means, GNNs). In particular, for each initial mesh, the coarser levels are obtained with the considered agglomeration strategies (cf. Figure 6). As performance metric, we consider the iteration count of the MG algorithm to reduce the (relative) residual below 10^{-6} in solving the algebraic formulation of problem (13). We vary the following parameters: the number of levels ℓ, the polynomial degree p, and the number of smoothing steps m. In Table 6 we report the iteration count when varying the number of levels ℓ = 2, 3, 4 with m = 3 and p = 1.
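Assuming the model problem is the Poisson equation −Δu = f with homogeneous Dirichlet conditions, the forcing term consistent with the stated exact solution can be checked symbolically:

```python
import sympy as sp

x, y = sp.symbols("x y")
u = sp.sin(sp.pi * x) * sp.sin(sp.pi * y)   # exact solution from the text

# f = -Laplacian(u); each second derivative returns pi^2 * u, so f = 2*pi^2*u.
f = sp.simplify(-sp.diff(u, x, 2) - sp.diff(u, y, 2))
print(f)  # 2*pi**2*sin(pi*x)*sin(pi*y)

# u also vanishes on the boundary of (0, 1)^2, e.g. on the side x = 0.
print(u.subs(x, 0))  # 0
```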
We can observe the following: • The iteration counts of the MG methods are significantly lower than those of the classical Conjugate Gradient (CG) method, meaning that the considered MG implementation is indeed effective.
• The iteration counts of the k-means and GNN algorithms are lower than those of METIS, meaning that the higher grid quality provided by the ML-based agglomeration strategies can help accelerate the convergence of the numerical method, with GNN having the best performance.
• The iteration count is constant when varying the number of levels, meaning that the method is independent of the granularity of the underlying grid and therefore scalable with respect to the mesh size.
In Table 7 we also consider the case m = 1. Despite using such a low number of smoothing steps, the MG method still converges in all cases. As expected, the iteration count increases, but the iteration counts of the ML strategies remain lower than those of the CG method, while this is not the case for the METIS algorithm.
In Table 8 we report the iteration count when varying the number of smoothing steps m = 3, 5 with ℓ = 3 and p = 1. As we can see, when the number of smoothing steps increases the iteration count decreases, as does the performance difference between the different methods. In Table 9 we report the iteration count when varying the polynomial degree p = 1, 2, 3 with ℓ = 3 and m = 3. As expected, since m is fixed, when p increases the iteration count also increases, because more degrees of freedom are involved in the formulation of the problem [38].
In general, the considered experiments highlight that ML-based agglomeration strategies lead to a lower iteration count of the MG solver with respect to METIS, thanks to the higher quality of the produced grids, with GNN having the best performance.

Conclusions
We proposed ML-based strategies to perform agglomeration of polygonal grids in an automatic and cost-effective manner, in order to tailor the approach to a wide range of possible situations, which would not be possible using classical strategies. The novelty of the proposed method consists in improving classical methods for mesh agglomeration using GNN architectures and the k-means clustering algorithm. Advantages include computational efficiency, the ability to handle a wider range of data variability, independence from the differential model and the numerical method at hand, flexibility in exploiting additional geometrical information, and higher grid quality.
We first re-framed the problem of mesh agglomeration as a graph partitioning problem, by exploiting the graph representation of the connectivity structure of mesh elements. We then developed in this context a framework to employ ML-based strategies, such as k-means and GNNs, which can exploit the geometrical information of the grid. We trained GNNs to perform graph partitioning over a suitably constructed database of meshes.

Table 9: Iteration count of the MG algorithm to reduce the (relative) residual below 10^{-6} employing different initial grids (triangles, random, Voronoi, squares) agglomerated with different strategies (METIS, k-means, GNN) with ℓ = 3, p = 1, 2, 3, m = 3. As a comparison, the iteration count of the Conjugate Gradient (CG) method is reported in the last column.

We compared the performance of the proposed strategies over a set of different grids, featuring also complex real domains such as a brain MRI-scan. Results show that ML-based strategies are more robust and can better preserve mesh quality, making them suitable for adaptive mesh coarsening. We also employed the proposed algorithms in the context of MG methods in a PolyDG framework, showing a lower iteration count for ML-based strategies, with GNN having the best performance. Indeed, GNNs have the advantage of naturally and simultaneously processing both the graph structure of the mesh and the geometrical information, such as the element areas or their barycentric coordinates. This is not the case for METIS [58], which is meant to process only the graph information, or k-means, which can process only the geometrical information. Moreover, GNNs have a significantly lower online computational cost.
Being able to compute the solution in a fast manner, even with a slight loss of accuracy, can be particularly beneficial when using multigrid schemes as preconditioners.
Future developments certainly include fostering the generalization capabilities of the GNNs, by generating a larger database of grids or by considering different geometrical features, such as quality metrics or error estimators. Another possibility is to reframe the approach in a reinforcement learning framework, as proposed by [55]. The extension to three-dimensional agglomeration strategies should certainly be explored, where the low online computational cost of GNNs will play a key role. It is worth noticing that all of the proposed agglomeration strategies can be applied to the three-dimensional case with little or no modification, as they mainly rely on the connectivity graph structure of the mesh, which is a dimension-independent entity, or on geometrical quantities with natural extensions, such as the area/volume of polytopes or their barycentric coordinates.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.