Parametric Nonlinear Model Reduction Using K-Means Clustering for Miscible Flow Simulation

This work considers the model order reduction approach for parametrized viscous fingering in a horizontal flow through a 2D porous media domain. A technique for constructing an optimal low-dimensional basis for a multidimensional parameter domain is introduced by combining K-means clustering with proper orthogonal decomposition (POD). In particular, we first randomly generate parameter vectors in multidimensional parameter domain of interest. Next, we perform the K-means clustering algorithm on these parameter vectors to find the centroids. POD basis is then generated from the solutions of the parametrized systems corresponding to these parameter centroids. The resulting POD basis is then used with Galerkin projection to construct reduced-order systems for various parameter vectors in the given domain together with applying the discrete empirical interpolation method (DEIM) to further reduce the computational complexity in nonlinear terms of the miscible flow model. The numerical results with varying different parameters are demonstrated to be efficient in decreasing simulation time while maintaining accuracy compared to the full-order model for various parameter values.


Introduction
Viscous fingering is the formation of patterns in a morphologically unstable interface between two fluids in a porous medium. It generally refers to the onset and evolution of instabilities that occur in the displacement of fluids [1]. The mathematical description of viscous fingering can be given in terms of several partial differential equations. Many numerical methods are used to find the approximation of pressure, velocity, concentration, or temperature. In [2], the coupled set of partial differential equations is approximated by the semidiscrete SUPG stabilized finite element formulation plus a discontinuity capturing technique to improve stability around the moving sharp fronts. The pressure equation is discretized by the standard Galerkin method, and a postprocessing scheme is used to improve the numerical evaluation of Darcy's velocity. Branston et al. developed a finite volume numerical simulator with a level of certainty where fingering perturbations in stochastic fields do not result into severe channeling as discussed in [3]. The simulator development involves discretization of convection-diffusion and pressure equations with total variation diminishing and central differencing schemes, respectively. Correlated stochastic fields for simulation runs were created using convolution with a Gaussian filter achieved through 2D fast and inverse Fourier transform, Dykstra-Parsons coefficient, random porosity generator, and autocorrelation lengths. In [4], the Oldroyd model has been studied using a generalized incompressible Navier-Stokes equation via the interpolating stabilized element free Galerkin technique. In this work, the Crank-Nicolson method is first utilized to discrete the temporal direction, and then, the meshless element free Galerkin technique is employed to approximate the spatial direction. The equation contains the integral term which is approximated by using a difference scheme. Although the numerical methods discussed earlier are effective and accurate, the resulting simulations generally require enormous memory storage and consume excessively computational time. In order to manage this problem, model order reduction techniques have been proposed in many previous works for various applications.
Proper orthogonal decomposition (POD) is a commonly used technique for model order reduction. POD is used to generate the optimal basis that can describe the dominant features of a dynamical system. This basis is utilized with the Galerkin projection to construct a reduced-order model. As in [5], a surrogate model based on the proper orthogonal decomposition is developed in order to enable fast and reliable evaluations of aerodynamic fields. In 2017, Chang et al. introduced a reduced-order model for wall shear stress in abdominal aortic aneurysms by proper orthogonal decomposition [6]. A more interesting detail on POD can be found in [7][8][9].
Using POD still has a limitation when the model has complicated nonlinear property. POD cannot handle the complexity issue arising in computing nonlinear terms. A powerful technique called the discrete empirical interpolation method (DEIM) is used here to overcome this problem. In particular, the DEIM is employed to construct selected interpolation indices, which will allow much less expensive evaluation for the nonlinear terms. The DEIM can be used to combine with POD to improve the efficiency of the nonlinear reduced-order model. In [10], the POD-DEIM approach has been applied to the groundwater equation and the time-dependent incompressible Navier-Strokes equation with variable density based on the meshfree RBF-FD technique. In [11], POD-DEIM is applied to a dynamic 2D catalytic reactor model. There are also many other applications of POD-DEIM on various problems, which can be found, for example, in [12][13][14].
In this paper, we apply the POD-DEIM technique on a finite difference discretization of a nonlinear miscible flow with viscous fingering in porous media. This model contains many parameters which can be changed in many different situations. Our additional effort is to construct a reducedorder model, which can be used to approximate the solution with any given parameter in the domain of interest. This issue in model reduction is crucial for various parametrized dynamical systems. There are a number of works that have been done for parametric model order reduction (PMOR). In [15], the PMOR method for laminated composite structures is presented. The parametrization is carried out for the angle between the layers of structure and the thickness of the structure. In [16], this work proposed the PMOR scheme for second-order systems that does not require a priori sampling of the parameter space. For the flow model, PMOR for probabilistic analysis of unsteady aerodynamic applications is discussed in [17]. Also in [18], the PMOR approach is applied to the fluid flow governed by the Navier-Stokes equations at varying Reynolds number. In this work, we use K-means clustering to efficiently construct a reduced system for a parametrized miscible flow model. This paper is arranged as follows. We first formulate the model equations in Section 2. In Section 3, a general setup for constructing POD-DEIM reduced systems is discussed. Section 4 introduces a parametrized nonlinear model reduction using K-means with the POD-DEIM approach for the miscible flow model. In Section 5, the numerical experiments are used to illustrate the efficiency of the proposed approach. We finally conclude the results and discuss some future research directions in Section 6.

Problem Formulation
We consider a viscous fingering in a horizontal flow through a 2D porous media domain [19] with a constant permeability K given by where u = ½u 1 , u 2 ∈ ℝ 2 is the velocity with components in x and y coordinates, P is the global pressure, c is the concentration of the injected fluid, T is the temperature, μ is the viscosity depending on c and T, f ðcÞ is the reaction rate, and ρ, c p , D, D T , and ΔH are assumed to be constant. To solve (1)-(4), a general procedure nondimensionalizes the system which will be converted into the equations in terms of stream function ψ and vorticity ω as follows: ∂T ∂t where R c and R T are constant parameters determining the effects of concentration and temperature to the viscosity, Da (Damköhler number) and Le (Lewis number) are dimensionless parameters, wherex = αPe/2 is the interface location. Let 0 = x 0 < x 1 < ⋯ < x n x < x n x +1 = αPe and 0 = y 0 < y 1 < ⋯ < y n y < y n y +1 = Pe be equally spaced grid points on x-axis and y-axis with Δx = αPe/ðn x + 1Þ and Δy = Pe/ðn y + 1Þ, respectively. Define ψðtÞ, ωðtÞ, cðtÞ, and TðtÞ as a vector of dimension n = n x n y containing approximate solutions ψðx i , y j , tÞ, ωðx i , y j , tÞ, cðx i , y j , tÞ, and Tðx i , y j , tÞ, respectively, for i = 1, 2, ⋯, n x and j = 1, 2, ⋯, n y . The central finite difference is applied to construct the spatial discretization of (5)- (8), and it becomes a system of nonlinear ODEs in the form of is a parametrized constant vector; Eðp ; ·, · Þ: ℝ 4n × ℝ ⟶ ℝ 4n is a nonlinear function with ; is the vector of state variables, and p = ½R c ; R T ; Da ; Le ; Pe is the vector of parameters. Note that A, A x , A y ∈ ℝ n×n are constant coefficient matrices, O ∈ ℝ n×n is the zero matrix, I n ∈ ℝ n×n is the identity matrix, b, b x ∈ ℝ n are vectors indicating the boundary conditions, and N, F are nonlinear functions that maps some state variables to ℝ n . We strongly recommend that the reader should read more details in the derivation of (13) in Appendix A. The solution z will generally depend on the parameter vector p, and therefore, we will also denote zðtÞ as zðp ; tÞ = ½ψðp ; tÞ ; ωðp ; tÞ ; cðp ; tÞ ; Tðp ; tÞ. More details on discretization and numerical schemes can be found in [19,20].

Reduced-Order Modeling by the POD-DEIM Approach
This section provides a brief overview of the general procedure for constructing a reduced-order model of a differential equation using proper orthogonal decomposition (POD) and a discrete empirical interpolation method (DEIM).
⋯, ϕ k g ⊂ ℝ n , which has rank k < n. Then, the approximation is in the form where Φ k = ½ϕ 1 , ϕ 2 , ⋯, ϕ k ∈ ℝ n×k is a matrix of basis vectors and q = ½q 1 , q 2 , ⋯, q k T ∈ ℝ k is the vector of unknown coefficients. To find q, we use the fact that Φ k has orthonormal columns, i.e., Φ T k Φ k = I k , and the minimum error occurs when the residual is orthogonal to the column span of Φ k , i.e., Φ T k ðz j − Φ k qÞ = 0, which implies that q = Φ T z j and the approximation becomes Proper orthogonal decomposition (POD) provides an orthonormal basis that minimizes this approximation error in 2-norm for a given basis rank k ≤ k * ≔ rank ðZÞ; i.e., POD basis is the optimal solution to the minimization problem where I k ∈ ℝ k×k is the identity matrix. It can be shown [21] that the POD basis defined above can be obtained from the left singular vector of the snapshot matrix Z. Let Z = UΣV T be the singular value decomposition of Z, where matrices U = ½u 1 , ⋯, u k * ∈ ℝ n×k * and V = ½v 1 , ⋯, v k * ∈ ℝ n s ×k * are matrices with orthonormal columns and Σ = diag ðσ 1 , ⋯, Moreover, this minimum error is given by [21] the sum of neglected singular values squared, i.e., As a result, the decay of singular values can be used to specify the dimension k of the subspace of the POD basis used in the approximation. In most existing applications, POD can provide a basis that explains the overall dynamics of a given snapshot set and we can use reduced dimension k that is much less than the original dimension n, while maintaining high accuracy in the approximation.

Discrete Empirical Interpolation Method (DEIM).
In the context of nonlinear model reduction for dynamical systems, the DEIM [22] is generally used for reducing the complexity for evaluating the projected nonlinear term in the POD-Galerkin reduced system. To illustrate this issue, we consider the projected nonlinear functionẼð·, · Þ: For convenience, the notation will be simplified by leaving the dependence on time parameter t. Notice that, for a givenz, the computational complexity for computing the above nonlinear term depends on the dimension n when multiplying U T k with E. This can be avoided by using the DEIM approximation of the form where the matrix U r is the POD basis matrix of rank r of the nonlinear snapshot matrix E mat = ½E 1 , E 2 , ⋯, E n s , for E j = E ðz j , t j Þ with z j coming from the available snapshot in ℝ n (z j ≈ U kz from POD approximation), j = 1, ⋯n s , and the matrix P is constructed from the interpolation indices ℘ 1 , ℘ 2 , ⋯, ℘ r , which are generated by the DEIM algorithm shown in Algorithm 1. In particular, Note that the complexity for evaluating (22) can depend only on the reduced-order dimension r and k because the term U T k U r ðP T U r Þ −1 can be precomputed for any given input vector z and the term P T E can be considered as selecting r components corresponding to the indices ℘ 1 , ℘ 2 , ⋯, ℘ r . From Algorithm 1, the procedure initializes the first interpolation index ℘ 1 ∈ f1, 2, ⋯, ng by using the first input basis u 1 entry which has the largest magnitude. The rest of the indices ℘ j for j = 2, 3, ⋯, r are selected from the residual r = v j − Uc with the largest magnitude. More details on this procedure as well as the proof for invertibility of P T U r can be found in [22].

The POD-DEIM Reduced System. Consider an initial value problem of the form
where M ∈ ℝ n×n , A ∈ ℝ n×n , and b ∈ ℝ n are constant matrices; EðzÞ is the nonlinear function; and z = zðtÞ ∈ ℝ n is the state variable. The POD reduced system can be obtained by applying the Galerkin projection with the POD basis U k of dimension k (see Section 3.1), and this system can be written in the form Journal of Applied Mathematics variable that can be used to approximate the original state variable z by z ≈ U kz . Finally, after approximating the nonlinear term as shown in Section 3.2, we obtain the POD-DEIM reduced system in the following form: (22). The next section extends the previous setup for reduced-order modeling for the parametrized miscible flow model.

Model Reduction for the Parametrized Miscible Flow Model by Using the POD-DEIM Approach with K-Means Clustering
This section applies K-means clustering with the POD-DEIM reduced-order modeling described in Section 3 on the parametrized miscible flow model given in Section 2.
In this work, the numbers of spatial grid points are n x and n y = 100, i.e., the dimension of the full-order model is n = n x n y . The computation is terminated at the time t f with time stepsize Δt. Moreover, we are interested in an exothermic reaction, which implies that sgn ðϕÞ = 1. The parameters α, d, Da, and Pe are fixed to be 2, 0.1, 0.01, and 250, respectively. Furthermore, the random noise between 0 and 1 is added to the initial condition for concentration at the collocation points (x = αPe/2 for all y ∈ ½0, Pe) to generate the instability which employs the viscous fingering.

K-Means
Clustering. Let I R c ⊂ ℝ and I Le ⊂ ℝ be the closed intervals for the values of R c and Le, respectively. Two parameters ðR c , LeÞ are generated randomly in I R c × I Le ⊂ ℝ 2 for 1000 samples and clustered into K centroids using the K-means clustering algorithm as shown in Figure 1.
The K-means clustering algorithm for generating centroids is shown in Algorithm 2 as follows. From Algorithm 2, the requirements are number of desire cluster K, number of iteration iter, and the data set which is the set RcLe that contains 1000 training samples of parameters ðR c , LeÞ ∈ ½2,2:5 × ½1, 2 ⊂ ℝ 2 in this work. The procedure starts with first randomly picking K initial centroids, p 1 , p 2 , ⋯, p K . Then, every point in training set measured the distance between each centroid. The points that have a minimum distance to centroid p k are gathered together in the k th cluster. The new centroids will be updated by averaging the whole samples in each cluster. This procedure continues and terminates at the l = iter. As discussed previously, we use K = 20 in this work.

The POD-DEIM Parametrized Reduced Systems.
The snapshots for constructing the POD basis are obtained from the full-order models corresponding to all selected centroids from the K-means clustering algorithm. Let z ℓ s = ½ψ ℓ s , ω ℓ s , c ℓ s , T ℓ s be the numerical solution of zðp ℓ ; t s Þ at time t s from the discretized system of (26) with p = p ℓ , which is the ℓ th parameter centroid from K-means clustering, i.e., we obtain the snapshots from solving the following 20 full-order systems: The snapshots ψ ℓ s ,ω ℓ s , c ℓ s , and T ℓ s are the numerical solutions of (26) that approximate ψðp ℓ ; t s Þ, ωðp ℓ ; t s Þ, cðp ℓ ; t s Þ, and Tðp ℓ ; t s Þ, respectively, for s = 1, ⋯, n s and ℓ = 1, ⋯K.

Journal of Applied Mathematics
Let C = ½c 1 1 , c 1 2 , ⋯, c 1 n s | c 2 1 , c 2 2 , ⋯, c 2 n s |⋯|c K 1 , c K 2 , ⋯, c K n s ∈ ℝ n×Kn s be the snapshot matrix for concentration. In our numerical experiment, K = 20 and n s = 1500. To obtain the POD basis, the singular value decomposition (SVD) of C is computed, i.e., C = U c Σ c V T c . Then, the POD basis of dimen-sion k ≪ rank ðCÞ is the first k columns of the left singular matrix U c of C denoted by U k c ∈ ℝ n×k . Similarly, let U k T , U k ψ , and U k ω be the POD basis of dimension k for temperature, stream function, and vorticity, respectively. The POD reduced system is constructed by applying the Galerkin Require: Inputs (i) K ∈ ℕ, (ii) iter ∈ ℕ, (iii) RcLe = f½R ðiÞ c , Le ðiÞ T : R ðiÞ c ∈ I R c ⊂ ℝ n and Le ðiÞ ∈ I Le ⊂ ℝ n g 1000 i=1 ⊂ ℝ 2 Ensure: Outputs p 1 , p 2 , ⋯, p K 1: Choose randomly K initial centroids, p 1 , p 2 , ⋯, p K from RcLe 2: D = zerosð1000, K + 1Þ 3: for l = 1 : iter do 4: for i = 1 : 1000 do 5: for j = 1 : K do 6: Dði, jÞ = kRcLeð: ,iÞ − p j k 2 7: end for 8: ½m, label = min ðDði, 1 : KÞÞ 9: Dði, K + 1Þ = label 10: end for 11: for k = 1 : K do 12: p k ⟵ meanðRcLeð: ,Dð: ,K + 1Þ == kÞÞ 13: end for 14: end for Algorithm 2: K-means clustering algorithm.   Journal of Applied Mathematics projection on (26) with the basis U k = diag ðU k ψ , U k ω , U k c , U k T Þ ∈ ℝ 4n×4k . The state variable is approximated by zðtÞ ≈ U k zðtÞ, wherezðtÞ = ½ψðtÞ,ωðtÞ,cðtÞ,TðtÞ ∈ ℝ 4k is the state variable of the POD reduced-order system, i.e., cðtÞ ≈ U k cc ðtÞ, TðtÞ ≈ U k TT ðtÞ, ψðtÞ ≈ U k ψψ ðtÞ, and ωðtÞ ≈ U k ωω ðtÞ. The resulting POD reduced system is in the form of whereMðpÞ = U k T MðpÞU k ,ÃðpÞ = U k T AðpÞU k ∈ ℝ 4k×4k , and dðpÞ = U k T dðpÞ ∈ ℝ 4k are constant matrices for a given parameter vector p, andẼðp ;z, tÞ = U k T Eðp ; U kz , tÞ is the projected nonlinear term. As described in Section 3, we can obtain a POD-DEIM reduced system from the POD reduced system (27) as shown below: whereÊðp ;z, tÞ is the nonlinear term obtained from using the DEIM approximation for each nonlinear term inẼ in (27). Note thatÊðp ;z, tÞ is not directly constructed as explained in Section 3, since it consists of many nonlinear functions NðψðtÞ, cðtÞÞ,NðψðtÞ, TðtÞÞ,FðψðtÞ, cðtÞÞ,FðψðtÞ, TðtÞÞ, and fðcðtÞÞ given in (13) that depends on different state variables. Therefore, we separately apply this approximation to each nonlinear term. For example, let us first construct the nonlinear snapshot matrix for NðψðtÞ, cðtÞÞ, where N ℓ s = Nðψ ℓ s , c ℓ s Þ. The DEIM is then applied on the nonlinear term NðψðtÞ, cðtÞÞ. Other nonlinear terms are done in the same manner. Details on the construction of each nonlinear term by the DEIM can be found in Appendix A which follows from [20]. Finally, the construction ofÊ is just the combination of the DEIM results in each nonlinear term. The numerical simulations based on the techniques described in this section are illustrated and analyzed in Section 5.

Numerical Experiments
In this section, there are two main numerical tests that consider accuracy and efficiency in terms of reducing the simulation time of the model reduction framework presented in this paper. In Section 5.1, the numerical solutions of the proposed method and the original full-order system are compared for different parameter values that are not in the training set. In Section 5.2, the overall computational time reduction and average errors are considered for the solutions from the proposed method when using various randomly selected parameter values in the interested domain. The relevant MATLAB code contributed in this work is available at https://bit.ly/3airQ2u. In this work, the number of spatial grid points is n x = 150 and n y = 100, i.e., the dimension of the full-order model is n = n x n y = 15000. The computation is terminated at the time t f = 150 with Δt = 0:1 (1500 time steps). The system temperature is set to be constant (isothermal case), i.e., it is equivalent to set R T = 0.
To investigate the possibility of reducing the dimension, we first consider the singular values of both solution snapshots and nonlinear snapshots as depicted in Figure 2. Note that this work illustrates only the approximations of concentration. Other state variables have similar accuracy results since they are all coupled in the differential equations.

Numerical Test I.
In this section, the POD and POD-DEIM reduced-order model are compared to the full-order model for different given parameters ðR c , LeÞ ∈ ½2,2:5 × ½1, 2 ⊂ ℝ 2 . To demonstrate the efficiency of the reduced-order model, the error at a certain time t s and the global error and its average are defined in terms of the Frobenius norm as follows: where cðt s Þ andcðt s Þ are the concentrations obtained from the full-order and reduced-order models, respectively, at the time step s or the time t s , and n s is the number of snapshots. In this experiment, n s = 1500. The approximationcðt s Þ can be the solution from the POD or POD-DEIM reduced-order model. Some cases of parameters ðR c , LeÞ ∈ ½2,2:5 × ½1, 2 are chosen arbitrarily, and the corresponding numerical results are shown below. In particular, we will consider the cases of ðR c , LeÞ = ð2, 1Þ, ð2:5,2Þ, and    Tables 1 and 2, respectively, for Sections 5.1.1 and 5.1.2. As expected, the increasing number of POD basis k provides more accuracy as the error seems to be decreasing. However, this can increase the simulation time as shown in Table 3, which will be discussed in more detail later in Section 5.2. Similarly, in Section 5.1.3, all results are demonstrated in the same manner as seen in Figure 5 and Table 4. The difference is the parameters that are chosen inside the parameter domain, i.e., ðR c , LeÞ = ð2:4,1:1Þ. Table 4 shows that the error at a specific time and the average global error are smaller compared to the previous cases.  Table 3.
From Table 3, applying the POD method can make the simulation faster, but it is only approximately twice or three times speedup due to the inefficiency in computing the nonlinear term. After the DEIM is applied, the average CPU time speedups by roughly the order of Oð100Þ at equivalent accuracy. In particular, the POD method with k = 40 uses approximately 1.3 hours (49,2821 seconds), while the POD-DEIM method with k = 40, r = 60 uses roughly only 1.3 seconds. In this case, the average error for both POD and POD-DEIM approach is approximately the same. As shown in the previous numerical test, an increasing dimension can improve the accuracy with a trade-off in computational time.
The results in this section suggest that the POD-DEIM reduced-order model with the basis constructed from the K -means approach is reliable in approximating the solution for any given parameter in the interested parameter domain ½2,2:5 × ½1, 2, even though they are not in the training set.

Conclusions
This work provides a model order reduction technique for simulating the parametrized viscous fingering model in a horizontal flow through a porous media domain by using several techniques. The K-means clustering algorithm is first used to cluster parameters equipped in the model. The snapshots are collected from solving the solution from each cen-troid. Then, the POD is applied in conjunction with the DEIM to construct a reduced-order model. The resulting reduced-order model can be calculated entirely in the reduced dimension without dependence on the original dimension for various parameter values that are not used in the training set obtained from the K-means algorithm. The approximation admits faster evaluation while trading a small amount of accuracy. The error estimates of the proposed approach for this nonlinear miscible flow model can be obtained by incorporating the accuracy of K-means clustering with the error analysis from [23] and from [24] for, respectively, a posteriori error and a priori error estimates. These theoretical results are beyond the scope of this work and left to be considered in the future. h i The coefficient matrices and boundary vectors are reduced to be a very smaller size ℝ k×k and ℝ k . Nevertheless, the projected nonlinear terms which occur in (A.6)-(A.9) still have computational complexities depending on the original dimension n. In the next section, we introduce the prevalent technique to overcome this problem.
A.3. Discrete Empirical Interpolation Method (DEIM). The DEIM is an efficient way to reduce the complexities for evaluating nonlinear terms (A.10)-(A.14) as previously discussed. The DEIM provides the evaluation in a nonlinear term at the selected interpolation indices. Suppose that we already have nonlinear snapshots matrices N c , N T , F c , F T , and f c for N 1 , N 2 , F 1 , F 2 , and f, respectively. Note that these snapshots are by-product obtaining from the full-order model calculation. The SVD is computed for each nonlinear snapshot matrix, and the first r columns of left singular matrices are chosen. For convenience, we will show only N 1 case; the other cases will be done in the same manner. Let U r N 1 ∈ ℝ n×r be the POD basis of dimension r for N 1 which is obtained from the SVD of N c . This follows that

Consider a matrix
where e ℘ i = ½0, ⋯, 0, 1, 0, ⋯, 0 T ∈ ℝ n is the ℘ i column of the identity matrix I n ∈ ℝ n×n . By premultiplying P T N 1 both sides of (A.18), the selection of components in the nonlinear terms N 1 is made as follows: Hence, the final approximation form of (A.18) becomes If the DEIM is applied for all cases of nonlinear terms, (A.15)-(A.17) can be expressed as 10 Journal of Applied Mathematics where the nonlinear terms are reduced to bẽ ðA:23Þ ðA:24Þ ðA:25Þ Finally, the POD-DEIM approximation is obtained by solving (A.28)-(A.31) and projecting them onto the original dimension.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.