A Machine Learning approach to enhance the SUPG stabilization method for advection-dominated differential problems

We propose using machine learning and artificial neural networks (ANNs) to enhance residual-based stabilization methods for advection-dominated differential problems. Specifically, in the context of the finite element method, we consider the streamline upwind Petrov-Galerkin (SUPG) stabilization method and we employ ANNs to optimally choose the stabilization parameter on which the method relies. We generate our dataset by solving optimization problems to find the optimal stabilization parameters that minimize the distances among the numerical and the exact solutions for different data of differential problem and the numerical settings of the finite element method, e.g., mesh size and polynomial degree. The dataset generated is used to train the ANN, and we used the latter ``online'' to predict the optimal stabilization parameter to be used in the SUPG method for any given numerical setting and problem data. We show, by means of 1D and 2D numerical tests for the advection-dominated differential problem, that our ANN approach yields more accurate solution than using the conventional stabilization parameter for the SUPG method.


Introduction
The Galerkin-finite element (FE) method applied to partial differential equations (PDEs) with advection terms dominating over diffusion ones may suffer of numerical instability [22,34]. These numerical instabilities cause the numerical solution to exhibit oscillations that increase in amplitude with a local increment of the transport dominance over diffusion, i.e., as soon as the local Péclet number becomes larger than one. In order to eliminate (or at least mitigate) numerical instabilities, a general employed strategy is to consider the generalized Galerkin method, i.e., the Galerkin method with additional stabilization terms [34]. Examples of stabilization methods to reduce numerical oscillations in the advection dominated regimes are for instance the upwind and streamline-diffusion method, both methods that are not strongly consistent. Instead, examples of strongly consistent methods are the Galerkin-least-squares, the streamline upwind Petrov-Galerkin (SUPG), and the Douglas-Wang methods [41,6,34]. In particular, for these strongly consistent stabilization methods, the formulation depends on the residual of the PDE (in strong formulation), other than on a parameter -called stabilization parameter -whose definition is crucial for the success of the stabilization strategies. Determining the values of such stabilization parameter is not straightforward, especially for 2D and 3D problems. In addition, a universal formulation of such parameter is lacking, especially as it is strongly dependent on data of the model and the numerical setting used for the FE approximation of the PDEs, e.g., low and high order FE methods, spectral element methods, isogeometric methods, etc. [42,7,12]. Some of the formulations for the stabilization parameter have been derived from analytical considerations on the differential problem in 1D, others are suitable only for a specific numerical approximation method, while the most comprehensive ones incorporate some (empirical) dependence on numerical setting, as e.g., the order of the FE [5,6,8,9,10,16,18,38,44,43,45].
In this work, we propose using machine learning (ML) [30,49,19] to make computers learn autonomously the values of the stabilization parameter for the FE approximation of advection dominated PDEs. Artificial neural networks (ANNs) [46,27] are widely popular in ML and Deep Learning for a wide array of applications: they are indeed very versatile tools that are increasingly finding their way in scientific computing [31,29], especially in the context of numerical approximation of PDEs. For instance, as substitute to standard numerical methods, ANNs can be employed as meshless methods in physics informed neural networks (PINNs) to directly approximate the solution of the PDE as it is trained by minimizing the (strong) residual of the PDE [36,35,37]. ANNs are also largely employed in a data-driven fashion in the context of model order reduction for parametric PDEs [17,40,21,20,50] and to enhance the stability properties of numerical methods for PDEs [13]. In fluid dynamics modelling, ANNs are massively adopted for flow features extractions, modelling, optimization and control. For flow features extractions, ANNs are used through clustering and classification to classify wake topologies [11]; for modeling fluid dynamics by reconstructing specific flows such as the near wall field in a turbulent flow [28] and for flow optimization [32], and control for aerodynamics applications [4]. ANNs are also used in a data-driven framework as a manner for providing alternative closure models for stress tensor [14] in Reynolds-Average Navier-Stokes (RANS) equations, for sub-grid scale models in large eddy simulation (LES) turbulence models [23,24,47,51,48], or for model learning input-output relationships in complex physical processes [39].
In this work, we use ANNs to learn the optimal stabilization parameter in advection dominated PDEs that are discretized by means of the FE method: the goal is to enhance the accuracy of the SUPG FE method by optimally selecting the stabilization parameter under different data of the PDE and numerical settings of the FE method. We found that the proposed ANN-enhanced stabilization method allows to improve accuracy and stabilization properties of the numerical solution compared to those results obtained by analytical expressions of the stabilization parameter. The numerical results obtained shed light also on the possibility to apply the presented strategy to learn closure laws for stabilization and turbulence models of fluid dynamics, as for instance to learn the stabilization parameters in the Variational Multiscale-LES model to model transitional and turbulent flows [15,3,53].
This work is organized as follows: in Section 2, we recall the SUPG stabilization method for advection-diffusion equations; in Section 3, we present our numerical strategy to compute an optimal SUPG stabilization parameter through a feed-forward fully connected ANN. In Section 4 we validate our method by comparing the ANN results with those obtained with the 1D advection-diffusion problem from which the expression of the theoretical stabilization parameter has been derived. In Section 5 we first show the ANN's training and we present our numerical results on the 2D advection-diffusion problem used for training and we finally generalize our findings using the ANN's prediction on a different advection-diffusion problem. Finally, in Section 7 we draw our conclusions highlighting possible future developments.

The SUPG method for advection-diffusion problems
We briefly recall the advection-diffusion differential problem and the SUPG stabilization method for the advection dominated regime.
Let Ω ∈ R d , d = 1, 2, 3 be the physical domain with ∂Ω being its boundary. We consider the following problem in the unknown function u: where µ, β, and f are assigned functions or constants, with µ ∈ L ∞ (Ω), β ∈ [L ∞ (Ω)] d , with ∇ · β ∈ L 2 (Ω), and f ∈ L 2 (Ω). The Dirichlet datum on the boundary is g ∈ H 1/2 (∂Ω). Let V g = {v ∈ H 1 (Ω) : v| ∂Ω = g} and V 0 = H 1 0 (Ω); the weak formulation of Eq (1) reads with the bilinear form a(u, v) and the linear functional F (v) respectively defined as: We consider a family of function spaces V h ⊂ V (either for V g and V 0 ) dependent on a parameter h such that dim for all K ∈ T h } be the function space of the FE discretization with piecewise Lagrange polynomials of degree r ≥ 1, T h a triangulation of Ω and h the characteristic size of the mesh, comprised of elements K ∈ T h . By setting V h = X r h V 0 , the Galerkin FE method applied to Eq (2) reads where V g,h = X r h V g . The standard Galerkin-FE method, in Eq (3), can generate numerical oscillations on u h if the problem is dominated by the advection term. In particular, these numerical instabilities can arise if the local Péclet number is Pe h > 1, where The generalized Galerkin method ( [33]) allows for eliminating or mitigating numerical oscillations by adding stabilization terms to the standard Galerkin formulation as In the SUPG method, the additional stabilization term reads: where R(u h ) is the residual in strong formulation of Eq (1), which is defined as: The term τ K , appearing in Eq (6), is the stabilization parameter, which is the focus of this work. The stabilization parameter τ K is generally defined locally, i.e., mesh element by element. In this paper, we consider a uniform stabilization parameter, thus τ K = τ for all K ∈ T h . A universal and optimal definition of τ in terms of the problem data and numerical settings like the mesh size and FE degree is lacking. An extensive review of stabilization parameters for the SUPG method is reported in [25]. The most common choices for τ come from an analytic derivation made for the advection-diffusion problem in 1D with f = 0 and approximated by means of linear finite elements, which reads: where ξ(θ) is the upwind function: If a uniform mesh is used, as we consider in this paper, then the value of h is uniform over T h ; if in addition β and µ are constant, then this implies that τ is uniform over T h . The choice of τ made in Eq (7) represents an optimal choice of the stabilization parameter as it yields a nodally exact numerical solution for the 1D advection-diffusion problem if f = 0 and the FE polynomial order is r = 1 [34,18]. Thus, the stabilization parameter of Eq (7) may not be fully effective to provide the optimal stabilization for a general advection-diffusion problem, e.g., to guarantee a nodally exact solution in 2D/3D or when using FE degrees larger than r = 1. A commonly used generalization of the formula of Eq (7) to higher FE degrees (r > 1) is presented in [18] and reads: Differently from Eq (7), the stabilization parameter τ r in Eq (8) takes into account the contribution of higher polynomial degrees r. Still, this formula is not optimal as it does not guarantee a nodally exact numerical solution. Our goal is to find a general and optimal expression of the stabilization parameter holding for advection-diffusion problems and FE approximations of degree r ≥ 1 to be dependent on: the dimension d, the FE degree r, the mesh size h, the forcing term f , the diffusion coefficient µ and the transport coefficient β.

ANN-based approach for determining the optimal stabilization parameter
We present our approach to determine the optimal stabilization parameter τ for the SUPG stabilization method by using an ANN. We consider as features (inputs) of our ANN: the FE degree r, the mesh size h and the global Péclet number Pe g := |β|L/(2µ) of the advection-diffusion problem, where L is the characteristic length of the problem, β ∈ R d and µ ∈ R . As we consider β to be uniform and fixed, using Pe g as input corresponds to varying the value of the diffusion coefficient µ. The features (input) of the ANN read: the target (output) of interest is the optimal SUPG stabilization parameter that we denote with τ * : The first step consist in generating the dataset, i.e., pairs of inputs and outputs to be used for training the ANN (data generation step). This consists in choosing repeatedly and randomly values of the features x (i) in given ranges. These features are used to feed an optimization problem that, through an optimizer and a suitable error measure, provide an optimal stabilization parameter y (i) , i.e., the target of the given feature. Such optimization problem considers as error measure E(τ ): the mismatch between the numerical solution u h and the exact one u ex over the nodes of the FE mesh. E(τ ) reads as: being x k the k-th node of the FE mesh T h and K h the number of nodes. E(τ ) is an approximation of the L 1 (Ω) norm. To find the minimum of E(τ ) for different problem configurations we solve, for each instance of the input parameters x (i) , the following optimization problem: Thus, the dataset generated consists of m pairs of inputs-outputs (x (i) , y (i) ), with i = 1, . . . , m. The latter is used for training the ANN. The latter will be then used to predict the optimal stabilization parameter y (j) = τ ANN to be used for the FE approximation of the advection-diffusion problem in the settings provided by x (j) . We report in Figure 1 a sketch of the procedure used to build the ANN for the prediction of the optimal stabilization parameter for any new feature x (j) .

Numerical validation
In this section, we validate the proposed numerical strategy by means of a 1D advection-diffusion problem from which the expression of τ 1 in Eq (7) has been derived. In particular, we consider the advection-diffusion problem in Eq (1) with Ω = (0, 1), f = 0 and with Dirichlet BCs u = 0 on x = 0 and u = 1 on x = 1. It admits the following exact solution: We recall that the stabilization parameter τ 1 of Eq (7) yields a nodally exact numerical solution if Eq (1) is solved by linear FE (r = 1).
We plot in Figure 2 the error measure E(τ ) against the value of the stabilization parameter τ , with h = 1/20, Pe h = 12.5 and by using r = 1 ( Figure  2a) and r = 3 ( Figure 2b). We observe that E(τ ) shows a minimum in τ * , thus suggesting the possibility to use an optimization algorithm to solve the problem. Specifically, we employ the L-BFGS-B optimization algorithm from SciPy, an open source library for Python [?]. Instead, the advection-diffusion problem has been solved using the FE open source library FEniCS [2], by applying the SUPG method on a uniform mesh. Particularly, the optimal value τ * found for the case r = 1 corresponds to the one provided by the theory in Eq (7). For the case r = 3, we observe a optimal value of τ * for which the error is minimized: in this case, it does not corresponds to the stabilization parameter τ r provided by the theory in Eq (8): as a matter of fact, the latter arises from empirical considerations to extend the parameter τ 1 in Eq (7) to polynomials degrees r > 1. However, this does not ensure a nodally exact numerical solution.   We solve the optimization problem for different values of the local Péclet number 1 ≤ Pe h ≤ 250 and we plot in Figure 3 the optimal stabilization parameter τ * against the local Péclet number for r = 1, 2, and 3. Figure 3a shows that, by employing a linear FE space r = 1, the optimal stabilization parameter τ * obtained by minimizing E(τ ) exactly matches the theoretical one τ 1 for every value of Pe h , thus confirming the findings of Figure 2a. This also serves as validation of the optimization procedure that we proposed. By recalling that τ 1 of Eq (7) guarantees the numerical solution to be exact at nodes for this particular advection-diffusion problem and r = 1, we can infer that the optimization procedure is meaningful and it can be therefore exploited further, for example for r > 1. On the other hand, Figure 3b and 3c show instead a mismatch between the theoretical τ r of Eq (8) and the optimal one τ * , thus confirming the findings reported in Figure 2b.
In order to better appreciate the differences in the numerical solutions due to the choice of the stabilization parameters, we report, in Figure 4, a comparison of the error E(τ ) obtained by means of the optimal τ * and theoretical τ r stabilization parameters for r = 2 and 3. Moreover, we report in Figure 5 a comparison among the exact solution u, the SUPG-stabilized numerical solution u * h obtained with the optimal stabilization parameter τ * , and the numerical solution u h obtained with theoretical one τ r . Specifically, we consider FE of degree r = 3 and two different values of Pe h . In both these cases, the optimal parameter τ * leads to a more accurate solution with respect to using the theoretical parameter τ r . In particular, when Pe h is "small", τ r leads to overshooting in the numerical solution, while if Pe h is "large" the theoretical stabilization parameter leads to undershooting. Conversely, the optimal parameter τ * accurately intercepts the exact solution at the nodes.

Numerical results
We first introduce the training set of our problem and we detail the setup of the ANN that we use in this work. Then, we show the prediction of the stabilization parameter by means of the ANN on different advection-diffusion problems.

Training the ANN for a 2D advection-diffusion problem
We apply the strategy presented so far to a 2D advection-diffusion problem to generate the dataset for the training of the ANN. Specifically, we consider in Eq (1): Ω = (0, 1) 2 , f = 0, and β = (1, 1). We prescribe the following exact solution on the whole boundary ∂Ω: We generate in Ω a structured mesh T h of triangles with FEniCS [2], as shown in Figure 6. We generate the dataset by repeatedly solving the optimization problem of Eq (12) for varying set of features as described in Section 3; specifically, we choose the features as reported in Eq (9) Table 1. We train a fully-connected feed-forward ANN on the generated dataset by using the open source library Keras [26] built on top of TensorFlow [1]. We divide the dataset into two parts: a training dataset that takes 80% of the examples to be used for the ANN training and a validation dataset that takes the remaining 20%. We choose the loss function as the mean squared error that measures, for each training feature, the squared mismatch between the prediction of the ANN y (j) and the actual target y (j) . Specifically, the loss function is defined as We normalize the features by subtracting their sample mean and dividing by their sample standard deviation in order to help the weights to better adapt to the different scales of the features. Moreover, the targets and the feature Pe g need special care as they are distributed over a wide range of values. Thus, we normalize them using by applying a base 10 logarithm. The normalized features and targets read: where r, h, log 10 (Pe g ) are the sample mean of the training features r, h and the logarithm of Pe g respectively, while σ r , σ h and σ log 10 (Peg) are their sample standard deviations.   In order to find the best performing ANN architecture, we carried out a study by testing different numbers of hidden layers, nodes per layer, optimization algorithm, its learning rate, and batch size. A comparison of the loss function with different architectures is given in Figure 8. Using 3 hidden layers is beneficial in terms of validation error drops, while using more than 64 nodes per layer does not bring to considerable advantages. Finally, we choose an ANN with 3 hidden layers, 64 nodes per layer and an output layer with a single node, all using a rectified linear unit (ReLU ) activation function. We display the ANN architecture in Figure 9. We trained the ANN with the SGD optimization algorithm, a constant learning rate of 0.01, and mini-batch size of 32 samples. Moreover, we employed a momentum of 0.9 in the optimization algorithm to update the weights. The trained ANN is available in the GitLab repository [52]. Regarding the computational efficiency of the proposed strategy, we stress the clear distinction between the offline phase (dataset generation and ANN's training) and the online phase, where we use the ANN to predict a new stabilization parameter -alongside the use of the FE solver -for unseen input parameter values. The most demanding part of the strategy is the dataset generation, requiring approximately 2 h on a standard laptop to repeatedly solve the optimization problem 900 times. Moreover, the ANN's training phase required approximately 10'. The testing (online) phase, which is the one that is performed for application purposes, is considerably inexpensive, requiring only the real-time evaluation of a composition of linear functions, comparable with the evaluation of the empirical relation that brings to the theoretical values ( few milliseconds).

Predictions of the stabilization parameter by ANN
Now, we compare the predictions of the ANN with the theoretical stabilization parameter τ r of Eq (8) [34,18]. We show in Figure 10 (left) the stabilization parameter τ ANN predicted by the ANN by varying mesh size h and global Péclet number Pe g . For comparison, we report in Figure 10 (right) the corresponding theoretical stabilization parameter τ r . We observe that the overall behavior of the trained ANN's predictions are qualitatively similar of the theoretical stabilization parameter τ r . Nevertheless, it can be inferred that the values of the stabilization parameters are almost completely matched with linear FE (r = 1), while they quantitatively differ for r = 2 and r = 3. This was expected as τ r for r > 1 is an empirical extension of the formula for the case r = 1.
The ANN allows to make predictions with features outside the range of values for which it has been trained, that is for unseen values of such features. With this aim, we report in Figure 11 the comparison of the ANN's predictions τ ANN with τ r for the FE degree r = 4. This comparison shows a clear difference between the theoretical and ANN's τ for r = 4 even though the general trend is maintained similar.

Test 1: predictions for the problem used in the ANN's training
We compare the numerical solutions u ANN h and u h obtained by means of the SUPG stabilization method with the parameter τ ANN predicted by the ANN and the theoretical one τ r , respectively; the comparison also involves the exact solution u (13) of the 2D advection-diffusion problem used for the training of the ANN. In particular, we display the comparison of the former 2D solutions in Figure Figure 12b). We notice that in both the cases, the numerical solution u ANN h involving the stabilization parameter τ ANN provides more accurate results than with the theoretical stabilization paramter τ r . In particular, in the case (a), u h involves a much smoother boundary layer and overshoots the exact solutions u, conversely to the nearly nodally exact numerical solution u ANN h . In the case (b), u ANN h provides a much better representation of the bundary layer, without the undershooting of the solution u exhibited by u h . Furthermore, we report in Table 2 absolute errors between the numerical solutions (u ANN h and u h ) and the exact one u. The errors, computed in L 2 (Ω) and H 1 (Ω) norms, show that the ANN-based SUPG stabilization method very often produces more accurate results than the ones obtained with theoretical stabilization parameter τ r , especially for r = 3.
Moreover, to assess the ability of the ANN to predict the stabilization parameter out of the training range, we compare in Figure 13 the numerical solutions for high Péclet in the case r = 4. We observe that u ANN h is more accurate than u h even with FE degree out of the training range.  and u h ) and the exact one u (13) of the 2D advectiondiffusion problem used for the training (Section 5.1) for different values of the features.
We compare in Figure 14 the numerical solutions u ANN h and u h with the novel exact solution u for different Péclet and mesh sizes. As for the previous numerical tests, the stabilization parameter τ ANN predicted by the network provides more accurate numerical solutions u ANN h than with the theoretical stabilization parameter τ r . In particular, the boundary layers and overall solution behaviours are better represented. We better assess the former qualitative consideration, by computing the error E(τ ) associated with the numerical solutions with SUPG stabilization for the parameters τ ANN and τ r ; different values of Pe g and r are considered. In particular, Figure 15 compares E(τ ) against Pe g (in logarithmic scale) for different values of r: the error obtained by using τ ANN is always lower than the one achieved with τ r . These results indicate that τ ANN , although trained on a specific advection-diffusion problem, can be used to make predictions of the optimal τ for an unseen advection-diffusion problem in place of the theoretical stabilization parameter.

Test 3: predictions for and unseen problem with constant forcing term without exact solution
In this section, we consider an advection-diffusion problem with constant forcing term f = 1, advection coefficient β = (1, 1), and boundary conditions u = 0 on ∂Ω. Although there is not an exact solution to the given problem, we consider as "ground truth" (reference) solution a numerical solution obtained on a much finer grid (h = √ 2/400) without stabilization. In Figure 16, we compare the SUPG numerical solutions u h , u ANN h against our "ground truth" solution for two different Péclet numbers. The numerical solution with the stabilization parameter τ ANN provides more accurate results with respect to the the solution obtained with the theoretical parameter. In particular, we highlight the largest discrepancies observed in the corners of the extracted solution.

Test 4: prediction for an unseen problem with a non-constant forcing term
In this section, we cope the case of a non costant forcing term by considering advection-diffusion problem of Eq (1) in Ω = (0, 1) 2 with β = (1, 1), and we prescribe the following exact solution on the whole boundary ∂Ω: We display the exact solution of the considered problem in Figure 17. In particular, Figures 18 and 19 show E(τ ) against Pe g for different values of r with two different meshes with h = √ 2/10 and h = √ 2/20, respectively.
The error obtained by using τ ANN is always comparable to the one achieved with τ r , for both mesh levels and for all the FE degrees considered. These results suggest that τ ANN , although trained on a specific advection-diffusion problem with f = 0, is robust with respect to unseen parameters and data in the advection-diffusion problem, including non constant f . Nevertheless, further studies can be conducted to better assess the role of a non-constant forcing term into the ANN's training phase, possibly encompassing the accuracy of the theoretical stabilization parameter in this scenario too.

Direction of the advection field and stabilization parameters
We investigate the effect of the advection direction on the value of the stabilization parameter. We define θ as the angle between the advection velocity and the x-axis: β = (|β| cos θ, |β| sin θ) T . We carry out the optimization strategy introduced in Section 3 including also θ in the input parameter x (i) . We apply the optimization to the advection-diffusion problem with exact solution in Eq (15) varying θ in [π/12, π/2] and we report a comparison between the optimal, theoretical and ANN's stabilization parameter in Figure 20 (left). We set |β| = √ 2, Pe g = 7 071, h = √ 2/20, r = 3. We recall that, differently from the optimization strategy, in this plot we are still employing the ANN trained without accounting for the advection direction. We can observe that the optimal τ is affected by θ, a feature that is not accounted by the theoretical stabilization parameter. Furthermore, in Figure 20 (right), we compare the mean errors obtained with the optimal, theoretical and ANN's stabilization parameter. It can be observed that the ANN still provides an advantage with respect to the theoretical stabilization. However, we believe that by including the advection direction as an additional feature in the training phase of the ANN, a higher accuracy can be obtained that the one achieved in this paper, where a single direction (θ = π 4 ) has been considered. π / 12 π / 6 π / 4 π / 3 π / 2.

Conclusions
In this work, we presented an approach based on machine learning and ANN to compute the optimal stabilization parameter to be used in the SUPG FE approximation of advection-diffusion problems. Indeed, albeit the expression of the stabilization parameter is available for 1D problems and FE of degree r = 1, its extension to more general advection-diffusion problems (2D and 3D) and FE degrees r > 1 is still lacking. We validated our approach against the 1D case, for which the ANN-stabilization parameter matches the already optimal, theoretical one for r = 1, leading to nodally exact numerical solutions Instead, for higher polynomials degrees r > 1, remarkable differences are observed among the theoretical and optimal stabilization parameter: we observed better accuracy and stabilization properties of the numerical solution with the optimal stabilization parameter with respect to the theoretical one.
Then, we generated the dataset on a 2D advection-diffusion problem, and we used it to train a fully-connected feed-forward ANN. We applied the predictions of the network on the original advection-diffusion problem for unseen input values and we also apply it to an unseen advection-diffusion problem to check for model generalization. Our numerical results showed that the proposed ANN-based approach provides more accurate numerical solutions than using the theoretical stabilization parameter for the SUPG method.
This work represents a step towards the enhancements of stabilization meth-ods for the FE approximation of advection-dominated differential problems. In particular, this work is limited to few features for the ANN-based stabilization parameter and provides as output a single optimal stabilization parameter meant for the whole mesh FE T h . Natural extensions of this work will therefore involve local stabilization parameters, i.e., element by element over the mesh T h , to better account for non-uniform meshes, differential problems with varying coefficients, and capturing the local behaviour of the solution. In addition, to enhance the robustness of the ANN-based approach, possible additional inputs of the network are the angle of the advection velocity and the nodal value of the forcing term. A future development consists in the extension of the proposed approach to the 3D case for which, as for the 2D case, a universal definition of the stabilization parameter does not exist. The extension to the 3D case surely would present a larger cost in the training phase; however, once trained, the ANN could be used online real-time. Finally, we envision extending the proposed machine learning approach to SUPG and variational multiscale stabilization methods for the FE approximation of the Navier-Stokes equations.