Model order reduction of flow based on a modular geometrical approximation of blood vessels

We are interested in a reduced order method for the efficient simulation of blood flow in arteries. The blood dynamics is modeled by means of the incompressible Navier–Stokes equations. Our algorithm is based on an approximated domain-decomposition of the target geometry into a number of subdomains obtained from the parametrized deformation of geometrical building blocks (e.g., straight tubes and model bifurcations). On each of these building blocks, we build a set of spectral functions by Proper Orthogonal Decomposition of a large number of snapshots of finite element solutions (offline phase). The global solution of the Navier–Stokes equations on a target geometry is then found by coupling linear combinations of these local basis functions by means of spectral Lagrange multipliers (online phase). Being that the number of reduced degrees of freedom is considerably smaller than their finite element counterpart, this approach allows us to significantly decrease the size of the linear system to be solved in each iteration of the Newton–Raphson algorithm. We achieve large speedups with respect to the full order simulation (in our numerical experiments, the gain is at least of one order of magnitude and grows inversely with respect to the reduced basis size), whilst still retaining satisfactory accuracy for most cardiovascular simulations.


Introduction
Cardiovascular disease is the leading cause of death worldwide. This term broadly encompasses a variety of pathological cases ranging from heart disease to many other peripheral vascular diseases. Consequently, the numerical simulation of blood flow in the cardiovascular system has gained considerable attention during the last twenty years as a valuable quantitative tool for the study and diagnosis of such conditions [1,2].
Blood dynamics is typically modeled by means of the incompressible Navier-Stokes equations; their discretization by numerical methods such as the finite element (FE) method leads to the Full Order Model (FOM). Despite the rapid and constant growth in computational power of the architectures that are currently employed to run such simulations, the time and resources required are often incompatible with clinical practice. Moreover, in many cases the numerical results of the FOM are affected by a level of uncertainty not inherently associated with the employed numerical method but rather with the inexact geometry and boundary conditions considered in the simulation [3,4]. Reduced Order Models (ROMs) aim at lowering the computational burden of FOMs at the cost of settling for larger approximation errors. This is particularly desirable in multi-query scenarios, i.e., whenever the same simulation needs to be performed for multiple values of the input parameters (for instance, in order to quantify the uncertainty due to the problem data [5]).
Among the ROMs employed in the context of cardiovascular modeling are the popular 0D and 1D models [6][7][8]. These models consider a coarse approximation of systems of arteries as electric circuits (0D) or as segments in which the quantities of interest are obtained by an averaging process across the section of the vessels (1D). They often prove to be remarkably accurate in approximating flow rates and pressure drops [9], albeit the strong geometrical approximations inevitably entail a significant loss of local details. For this reason, algorithms to couple 0D and 1D models with full 3D simulations-to be employed in the regions in which higher quality solutions are required-have been devised [10][11][12].
Some recent works have introduced strategies to overcome the main limitation of 1D models, namely that only the axial component of the flow is accurately characterized. In particular, in [13,14], the authors propose considering different discretization methods for the longitudinal and transversal directions (for example, the finite element method in the former and spectral methods in the latter). The Transversally Enriched Pipe Element Method (TEPEM) [15,16] further develops this idea: here, the geometry is decomposed into several pipe-type elements in which the two flow components are separately discretized. This method has also been applied to uncertainty quantification [17].
In this paper, we aim at formalizing a ROM allowing to approximate the local features of the blood flow. The strategy is based on the combination of a domain-decomposition approach with the Reduced Basis (RB) method and can be interpreted as a specific implementation of the Reduced Basis Element (RBE) method [18,19]. We refer the reader to [20][21][22][23][24] for uses of the RBE method for the approximation of the 2D steady Stokes equations in the context of the cardiovascular system. To our knowledge, our work represents the first application of the RBE method to the unsteady 3D Navier-Stokes equations.
Similarly to 1D models, the proposed method is based on the approximation of the vessel geometry as a composition of simple subdomains. However, in our approach, these subdomains are three dimensional and obtained from the parametrized geometrical deformation of a handful of elementary building blocks (e.g., straight tubes and bifurcations). Each building block in its reference configuration is equipped with a set of spectral basis functions. Since in this paper we focus on the FE method to generate the FOMs, these are FE functions defined over triangulations of the building block. Specifically, the basis functions are found by means of Proper Orthogonal Decomposition (POD) of a large number of flow solutions which are computed during a computationally expensive offline data generation phase. The global flow approximations by our ROM is computed as a composition of local (to the subdomains) solutions, namely linear combinations of the basis functions defined in every subdomain scaled by the divergence-free Piola transformation. The local solutions are coupled by a nonconforming domain-decomposition method based on the use of spectral Lagrange multipliers on the 2D interfaces [25].
The gain in performance with respect to the FOM is given by the decreased size of the linear system to be solved in each iteration of the Newton-Raphson algorithm. Indeed, while the number of degrees of freedom in the FE model is typically large (tens or hundreds of thousands per each subdomain in our numerical simulations), only a few hundred basis functions per subdomain are sufficient to retain acceptable levels of accuracy.
The rest of this paper is structured as follows. In Section 2, we provide a self-contained and concise introduction to the RB method in order to set the notation and terminology for the remaining sections. Section 3 is dedicated to the Navier-Stokes equations and their numerical discretization by the FE method. In Section 4, we define the concept of modular approximation of arteries by the above-mentioned domain-decomposition approach; we also address the numerical solution of the Navier-Stokes equations on the decomposed geometries by the FE method and the nonconforming domain-decomposition method. In particular, we devise an ad-hoc preconditioner that takes advantage of the peculiar block structure of the global system matrix. It is worth noticing that, although the paper focuses on a ROM, addressing the solution of the partitioned problem with the FE method is necessary, as the RB functions in the subdomains are generated by POD of local solutions obtained from global problems in decomposed domains. This strategy of data collection (offline phase) is discussed in Section 5. In the same section, we also delineate the algorithm for the approximation of the global solution on a decomposed target geometry using the ROM (online phase). Our numerical results are reported in Section 6. Specifically, Sections 6.1 and 6.2, respectively, focus on two critical points: (i) the accuracy with respect to the corresponding FE solution in the decomposed geometry-which, in our case, plays the role of FOM-and the achieved speedup, and (ii) the comparison of the reduced solution to the one obtained on a physiological and non-decomposed geometry. In the latter case, we mostly aim at evaluating the effects of the geometrical approximation on the local features of the flow, for instance, in terms of the wall-shear stress (WSS). Finally, in Section 7 we draw our conclusions and discuss future perspectives of the current study.
written as A and A, respectively. Global block algebraic vectors are written in capital letters and we use the notation A = vec a 1 , … , a n ∈ ℝ M , M = ∑ i = 1 n m i , to indicate the concatenation of a 1 ∈ ℝ m 1 , … , a n ∈ ℝ m n . On the other hand, the notation A = a 1 | … | a n ∈ ℝ m × n indicates a matrix whose columns are represented by a 1 ∈ ℝ m , … , a n ∈ ℝ m . Superscripts, subscripts and hats.-We use the superscripts h or N to denote quantities related to the FE or the RB approximations. In Section 4, we introduce a modular domain-decomposition method based on reference building blocks and subdomains of the target geometry. Indices of the subdomains are written as simple superscripts (e.g., Ω j ); superscripts in square brackets refer to interfaces (e.g., Γ [ij] = Ω i ∩ Ω j ) or portions of boundaries and related quantities. In the context of the Newton-Raphson algorithm, superscripts in parentheses are used to refer to the iteration index of the iterative method. Subscripts are used to indicate: (i) indices of vectors or components (or blocks) of vectors or matrices, and (ii) quantities at specific timesteps (e.g., u i ℎ is the vector of FE degrees of freedom of velocity at the ith timestep). Whenever a symbol refers to a reference building block or the reference interface (unit disk), it is indicated with a hat notation (e.g., Ω i is the ith reference building block).

The reduced basis method in a nutshell
In this section, we provide a non-comprehensive introduction to the RB method which is intended to set the theoretical basis for the remainder of the paper. For a complete overview, we refer the reader to [26,27].
Let us consider an open and bounded domain Ω and a steady differential problem of the form ℒ(u; μ) = G(μ), (1) where u ∈ V (V being a suitable functional space) is the solution, μ ∈ D ⊂ ℝ N μ is a vector of geometrical and/or physical parameters, ℒ is a generic differential operator, and G is a functional encoding the data of the problem, such as forcing term and boundary conditions. In this section we assume that ℒ is an elliptic operator. The extension of this setting to the Navier-Stokes equations is considered in Section 5. is typically called vector of degrees of freedom. Assuming that the differential operator ℒ can be mathematically described in the weak sense by a bilinear form as ℓ(φ, ψ), for φ ∈ V and ψ ∈ V, we identify the matrix L ℎ (μ) ij = ℓ φ j ℎ , φ i ℎ ; μ ∈ ℝ N ℎ × N ℎ , and similarly G ℎ (μ) i = ∫ Ω G(μ)φ i ℎ ∈ ℝ N ℎ . For example, for the linear differential operator ℒ(u; η) = ηΔu, describing a Poisson equation with parameter η, we have ℓ φ j ℎ , φ i ℎ ; η = ∫ Ω η ∇φ j ℎ ⋅ ∇φ i ℎ . The resulting linear system of is possibly very large and expensive to solve. In multi-query scenarios-i.e., whenever it is required to solve Eq. (2) for multiple values of the parameter μ-it is often crucial to reduce the dimensionality of the system in order to save computational time. One way to achieve this is by employing ROMs, such as the RB method.
The main idea of the RB method is to construct a low dimensional basis for the solution u out of a number of solutions N s (snapshots) of the FOM, which are computed during the socalled offline phase. In the online phase, the reduced solution is obtained as a linear combination of the RB functions; system (2) is cast in the form of a small linear system where the unknowns represent the coefficients of such a linear combination. In the remainder of this section, we address the offline and online phases more in depth.

The offline phase: basis construction
There exist two main strategies for the construction of the reduced basis: greedy algorithms [28,29] and the Proper Orthogonal Decomposition (POD) method. The former lead to a more efficient offline phase, as they allow to minimize the number of snapshots N s to be computed. A major drawback of greedy algorithms is that they are based on an a posteriori estimate of the projection error, which is often difficult to compute in practical applications. For this reason, in this paper we opt for the POD method, which often requires a larger number of snapshots N s but is in turn more general. We refer, e.g., to [30,31] and [32] for applications of POD to parabolic and fluid problems, respectively, and [31] for [30] for a comprehensive study of the properties of POD when applied to the solution of Ordinary Differential Equations (ODEs). In the context of cardiac simulations, this technique has been successfully employed both in fluid (e.g., in [33] to simulate blood flow in patient-specific coronary artery bypass grafts) and structural simulations (e.g., in [34], where POD is used to reduce the space of admissible displacements of the heart muscle). In the POD approach, the reduced basis is usually constructed by singular value decomposition (SVD) [35,36] out of the set of snapshots, which are obtained by sampling N s parameters μ 1 , … , μ N s in D and by solving the corresponding FOM. Formally, we arrange the snapshots in matrix form as as the matrix composed of the first N modes and let us recall that, given a N-dimensional orthonormal basis, arranged in matrix form as W = w 1 | … | w N ∈ ℝ N ℎ × N , the projection of a generic vector x ∈ ℝ N ℎ onto span{w 1 | ⋯ |w N } is given by Π W x = WW T x. Then, the following proposition holds.
Proposition 1.-Let V N = W ∈ ℝ N ℎ × N : W T W = I be the set of all matrices whose columns form a N-dimensional orthonormal basis. Then, We refer to [26] for a proof of Proposition 1. In other words, the columns of V are the Ndimensional basis minimizing the projection error of the snapshots over its column space; moreover, such error is strictly related to the magnitude of the singular values σ N + 1 , … , σ N s . Thus, a common heuristic to choose N is to set it equal to the smallest integer N such that where ε is a user-provided tolerance. The left hand side of Eq. (3) is the relative information content of the POD basis, namely the percentage of energy of the snapshots retained by the first N modes. The size of the reduced basis N selected by following criterion (3) is typically much smaller than the size of the FOM N h , i.e., N ≪ N h . Remark 1.-Given a symmetric positive definite matrix X h which is a norm matrix for ⋅ V in the FE space, i.e., ‖u‖ V = u ℎ T X ℎ u ℎ , it is possible to perform the POD such that the basis U is orthonormal with respect to X h (i.e., U T X h U = I). In order to achieve this, we observe that, since X h is symmetric positive definite, it admits a Cholesky decomposition X h = H T H, H being upper triangular. Matrix U is then found as U = H −1 U, where U is computed by SVD of HS = UΣV T . When constructing the reduced basis for our particular application in Section 5, following this approach allows us to achieve the optimality expressed in Proposition 1 with respect to norms more suited to the specific variables of interest (namely, H 1 norm for the velocity and L 2 norm for the pressure).
where L N (μ) ij = ℓ ζ j ℎ , ζ i ℎ ∈ ℝ N × N and G N (μ) i = ∫ Ω G(μ)ζ i ℎ ∈ ℝ N . The assembly and solution of system (4) correspond to the online phase. The transformation of the reduced vector of degrees of freedom into its FE counterpart is simply performed by By exploiting the expansion ζ j In the most general case, therefore, the assembly of the reduced system is done by constructing the full order matrix and right hand side and by computing their projection onto the RB space. If the problem features an affine decomposition, namely there exist parameter-dependent coefficients θ q L for q = 1, …, Q L and θ q G for q = 1, …, Q G such that a considerable speedup is achieved by precomputing the matrices L q N = V T L q ℎ V and the vectors G q N = V T G q ℎ in the offline phase, and by assembling the reduced elements of system (4) as Unfortunately, in most practical scenarios an affine decomposition of the form (5) is not readily available. In such cases, a common strategy to efficiently perform the assembly of system (4) consists in employing the (discrete) empirical interpolation method (DEIM) [37,38] and its matrix variant MDEIM [39]. In this paper, we do not address the optimization of the assembly of the reduced system, which will be investigated in future works.
Remark 2.-The quality of the RB approximation depends on three factors: the POD tolerance ε, the number of considered snapshots N s , and the choice of sampling space.
Assuming the latter to be appropriate, however, the number of snapshots required to achieve errors of the order of the POD tolerance in the online phase may become too large as the dimension of the parameter space D increases. In other words, in applications were the space of parameters is too rich, it is unfeasible to sample a sufficient number of snapshots, and the online error of the RB method might be considerably greater than the one obtained on the snapshots.

The Navier-Stokes equations
In this section, we first introduce the Navier-Stokes equations in strong and weak formulations (Section 3.1). The latter poses the mathematical foundation for the numerical discretization in space by the FE method as described in Section 3.2, where we also derive the fully-discrete model by considering a generic Backward Differentiation Formulas (BDF) scheme.

Strong and weak formulations
Let us consider the problem of approximating the blood flow in a vessel, mathematically represented by an open and bounded domain Ω ∈ ℝ d . In this paper, we take d = 3, but the discussion is also valid for the case d = 2. We model the blood as an incompressible Newtonian fluid and, therefore, its dynamics is described by the Navier-Stokes equations σ(u, p)n = h onΓ N × (0, T ), u = u 0 for t = 0, (6) where u : Ω × (0, T ) ℝ d and p : Ω × (0, T ) ℝ are velocity and pressure of the fluid, ρ f is the density, μ f is the viscosity, ε(u) = (∇u + ∇u T )/2 is the strain rate tensor, σ(u, p) = 2μ f ε(u) − pI is the Cauchy stress tensor, f : Ω × (0, T ) ℝ d is a forcing term, g : Γ D × (0, T ) ℝ d and h : Γ N × (0, T ) ℝ d are Dirichlet and Neumann data, n is the normal unit vector to the boundary ∂Ω, and u 0 : Ω ℝ d is the prescribed initial condition. Since we deal with cardiovascular applications, we take Γ D = Γ w ∪ ∪ i = 1 N in Γ in Γ in [1] , … , Γ in N in Γ w and Γ N [1] , … , Γ out N out are the inlets, wall and outlets of the vessel, respectively. The inlet velocity profiles and outlet Neumann data are denoted g 1 , … , g N in and h 1 , … , h N out ; on the wall Γ w we consider u = 0. The first equation in Eq. functions v ∈ V 0 and q ∈ Q. The weak formulation of Eq. (6) is obtained by multiplying the momentum and continuity equations by v and q respectively and by integrating over the domain Ω. Hence, we find: (W1) given f, g, and h regular enough, find (u, p) ∈ V g × Q, such that, for every t ∈ (0, T), and such that u = u 0 for t = 0.

Numerical discretization
In order to transform the infinite dimensional problem W1 into a finite dimensional one we consider the two subspaces V g ℎ ⊂ V g and Q ℎ ⊂ Q. Then, we introduce the sets of basis The finite dimensional approximations of velocity and pressure read time, but this is omitted hereon. The definition of V g ℎ and Q ℎ clearly plays a crucial role in the accuracy of the approximation. Moreover, in the case of saddle-point problems such as the Navier-Stokes equations, the quality of the discretization is critical to ensure the wellposedness of the discrete problem; we refer the reader to Section 5.1 and [40,41] for details.
In this section and in Section 4, the basis functions φ i Lagrangian P2-P1 Taylor-Hood FE basis functions [42] (i.e., quadratic and linear piece-wise polynomials for the velocity and the pressure, respectively) obtained from a triangulation of the domain T ℎ composed of tetrahedra. We consider other possibilities-i.e., RB functions -in Section 5.
By introducing the vectors of degrees of freedom u ℎ = u 1 , the discrete version of W1 is conveniently expressed in the form of linear system as We exploit that u ℎ = ∑ i = 1 N u ℎ u i ℎ φ i ℎ and indicate the convective term matrix as C h (u h )-i.e., as a function of the degrees of freedom u h instead of the approximated function u h -in the remainder of the paper.
Let us now introduce a sequence of timesteps t 0 , t 1 , … , t N t such that t 0 = 0, t N t = T , and t k +1 = t k + Δt for every k = 0, …, N t ; Δt is called timestep size. We denote the value of u h and p h at timestep t k by u ℎ t k = u k ℎ and p ℎ t k = p k ℎ , respectively. The numerical discretization in time of Eq. (7) is performed by means of BDF schemes. Specifically, given u k − j + 1 ℎ and p k − j + 1 ℎ for j = 1, …, σ, the numerical solution of the Navier-Stokes equations at timestep t k +1 by a BDF scheme of order σ satisfies where w ℎ = vec u ℎ , p ℎ ∈ ℝ N u ℎ + N p ℎ and H ℎ : = M ℎ , f ∘ℎ t, w ℎ : = f ℎ (t) The coefficients α j j = 1 σ and β depend on the specific BDF scheme. For example, for σ = 1 we have α 1 = β = 1 (Backward Euler scheme), and, for σ = 2, α 1 = 4/3, α 2 = −1/3 and β = 2/3; these two choices lead to numerical methods of first and second order, respectively.
Eq. (8) is in general nonlinear and its solution requires the application of ad-hoc numerical methods; in this paper, we adopt the Newton-Raphson algorithm (see Section 4.3).

Modular domain-decomposition of arteries
With the aim of the model order reduction presented in Section 5, it is beneficial to perform a geometrical approximation of the vessel based on a domain-decomposition approach. We first set the theoretical basis for the approximation of the Navier-Stokes equations on modular geometries in Section 4.1. The discretization of the continuous formulation is then performed in Section 4.2, where we also present our approach for the treatment of the coupling variables (i.e., Lagrange multipliers) at the interfaces.

The continuous Navier-Stokes equations on modular geometries
We introduce a library of building blocks Ω i . In the context of cardiovascular simulations, these reference building blocks are model cylinders and bifurcations, as shown in Fig. 1. The target geometry is then approximated as a modular composition of subdomains Here, Ω j : = Φ z(j) Ω z(j) ; μ j is an open and bounded subdomain obtained by applying a prescribed parametrized geometrical deformation Φ z(j) to the z(j)th building block, z : [1,…, N Ω ] ↦ → [1,…, N bb ] is a map from the indices of the subdomains in the target geometry to the indices of the building blocks, and ℳ : = μ j j = 1 N Ω is the set of geometrical parameters. We remark that, even though the RB method is suitable to handle geometrical and physical parameters, in the remainder of the paper the symbol μ refers exclusively to geometrical ones. The case in which, for instance, the viscosity and density of blood are parameters of the problem will be addressed in future extensions of the present work. In the following, we indicate z(j) := z j for brevity. Each vector of parameters μ j belongs to a space D z j ⊂ ℝ N μ z j whose dimensionality depends on the corresponding reference building block. For each i = 1, …, N bb and given a parameter vector μ ∈ ℝ N μ i , we focus on geometrical deformations of the form where Q(μ) is a rotation matrix, t is a translation vector, and χ i ( ⋅ ; μ) is a nonaffine geometrical deformation. Specifically, the nonaffine deformation χ i ( ⋅ ; μ) is particular to the type of building block corresponding to index i. The types of building blocks we consider and the corresponding admissible nonaffine deformations are depicted in Fig. 2: for the tubes, we consider functions χ i ( ⋅ ; μ) able to vary their length, outlet radius, and bending angle, and for the bifurcations we are interested in rotating the normals of the outlets.
The building blocks in the reference configuration are designed with circular inlet and outlet faces; the geometrical deformations are chosen such that the interfaces are circles for every possible choice of the geometrical parameters.
and Γ [ml] represent the same physical surface, it is still beneficial to differentiate between the two as we associate with each interface the vectors n lm and n ml , i.e., the outward normal unit vectors with respect to Ω l and Ω m (clearly, n lm = −n ml ). This distinction allows to simplify the notation in W2.
For every subdomain Ω j (μ j ), we introduce the set of indices of the neighboring subdomains N(j) and the sets I in (j) and I out (j), which contain the indices of the inlet and outlet interfaces whose intersection with the boundary of Ω j (μ j ) is nonempty. More precisely, . For the sake of conciseness, let us use the following notation for every ε, φ, ω ∈ [H 1 (Ω j )] d and for every ψ, η ∈ L 2 (Ω j ). Assuming for simplicity that Ω = Ω m , it can be shown that W1 is equivalent-in a sense that will specified in Remark 4-to the following weak formulation: (W2) given f, g, and h regular enough and for every j = 1, …, N Ω , find and such that u j = u 0 | Ωj for t = 0. Moreover, for every m ∈ N(j), λ [jm] = −λ [mj] and Remark 4.-The two weak formulations W1 and W2 are equivalent in the following sense. If (u, p) satisfies W1, then (u Ω j , p Ω j , {σ(u, p)n jm } m∈N(j) ) is also solution of the local problem W2, for every j = 1, …, N Ω . The Lagrange multipliers therefore play the role of the stress at the interfaces; for details, see, e.g., [25,43]

Discretization of the primal hybrid formulation of the flow problem
The discretization of differential problems by the FE method requires the definition of a computational mesh, as already mentioned in Section 3.2. In the case of the approximated modular geometry Ω m , each building block Ω i is equipped with its own triangulation T i, ℎ . Therefore, the global mesh T m ; μ j is a composition of distinct meshes which do not necessarily satisfy conformity constraints at the interfaces. We recall that a mesh is conforming if, for every two elements, their intersection is either empty or a face, edge or vertex. In other words, a conforming mesh does not feature any hanging nodes.
The global mesh is in general nonconforming and we are compelled to consider nonconforming domain-decomposition methods for the solution of the Navier-Stokes equations. These are formally defined as domain-decomposition methods in which the search space for the discrete solution is not a subset of the continuous search space (in our case V g × Q). The most popular approaches rely on the introduction of suitable Lagrange multipliers at the interfaces to enforce transmission conditions, as in W2; see, e.g., the wellknown mortar method [44][45][46] and INTERNODES [47,48]. In this paper, we adopt the algorithm presented in [25], which is based on the discretization of the Lagrange multipliers space via a small number of spectral basis functions defined on the interfaces. For our application, this choice is convenient because (i) the method allows us to recover the hconvergence order of the primal discretization method-i.e., the FE method-even when a small number of spectral basis functions is considered, (ii) defining a spectral basis on each interface does not require to project nor to interpolate the traces of FE basis functions from one side to the other, and (iii) as already mentioned, the interfaces are circular in the target configuration, which allows us to employ a set of standard orthonormal basis functions on the two-dimensional disk. In the remainder of this section and in Section 4.2 we recall the basics of this nonconforming method applied to the Navier-Stokes equations; the interested reader is referred to [25] for the details.
We follow the same procedure presented in Section 3.2 for the discretization of the local variables u j and p j , which become u j, ℎ = ∑ i = 1 , the basis functions ξ i, in These functions are used to set the inlet velocity profiles, allowing one to effortlessly transition from the FE model to the reduced one through the process described in Section 5. The use of Lagrange multipliers is a classical way to weakly impose Dirichlet boundary conditions (see, e.g., [49]) and, compared to other popular approaches such as penalty methods [50,51], it has the advantage of being variationally consistent. We remark that (i) we introduce the discretization parameter δ for the Lagrange multipliers to indicate that the degree of refinement is in fact independent of the mesh size in J j, ℎ or J m, ℎ , and (ii) we consider for simplicity the same number N λ of basis functions at each interface.
In this paper, we build the basis functions at the interfaces ξ i δ i = 1 N λ using the scalar functions P k n shown in Fig. 3. These are constructed as follows. Let us consider Chebyshev polynomials of the second kind U n , which are defined through the recurrence relation U 0 (x) Then, for 0 ≤ k ≤ n, P k n (x, y) = 1 π U n (x cos(ωx) + y sin(ωy)), ω = k n + 1 π, are orthonormal polynomials on the unit disk D with respect to the weight function W (x, y) = 1/ π [52]. Given n ≥ 0, we set where e i is the ith canonic vector. It is trivial to find that N λ = d(n + 1)(n + 2)/2.
Let us now address the discretization of the individual elements of W2, which is local to subdomain Ω j , the assembly of the global block system, and the discretization in time.

Discretization of the coupled momentum and continuity equations.-For
In Eq. (12) By introducing the interpolation of the boundary data g m onto the FE space spanned by and F h and G hδ are block vectors accounting for the forcing terms and Dirichlet boundary conditions, respectively. Operator "diag" returns a diagonal matrix where each argument with index j is placed in the jth diagonal block.
Discretization in time and global nonlinear residual.-The discretization in time is performed along the same lines of the discussion in Section 3.2. We define Y hδ = vec(W h ,Λ δ ) and Then, given Y k − j + 1 ℎδ for j = 1, …, σ, the solution at timestep t k+1 is found by solving

Efficient solution of the global nonlinear system
Eq. (15) is nonlinear and hence solved using the Newton-Raphson algorithm. In particular, given an initial guess Y (0) , the (l + 1)th iteration of the algorithm for the solution of R(Y) = 0 is where J R is the tangent matrix of R. The stopping criterion is based on a user-provided tolerance τ NR and reads ∥R(Y (l) )∥ 2 /∥R(Y (0) )∥ 2 < τ NR .
In order to efficiently solve the linear system in Eq. (16) via iterative methods such as GMRES [53], we need to develop a preconditioner for the tangent matrix J R .
Differentiating Eq. (15) with respect to its only argument yields Thus, the tangent matrix features a saddle-point structure stemming directly from the original differential problem W2-which is in fact a saddle-point problem. In the remainder of this section, we omit the explicit dependence of J R and A on Y and W, respectively, for the sake of clarity of notation. A possible strategy to design a preconditioner is based on the (exact) decomposition where S = − ℬA −1 ℬ T is the Schur complement of J R . This decomposition is the foundation of several preconditioners for saddle-point systems, such as SIMPLE [54] and the nested block preconditioner for blood flow simulations proposed in [55]. The solution of a linear system of the form J R X = B, with X = vec(X w , X λ ) and B = vec(B w , B λ ), amounts to solving number is typically small.
Remark 5.-The special structure of the Schur complement in Eq. (20) makes its computation particularly attractive in the context of high-performance computing, assuming that each subdomain be assigned to one or few processors. An effective partition strategy should take into account the number of degrees of freedom associated with each geometrical building block in order to preserve the load balancing among the set of computational nodes. At the same time, the fact that the coupling conditions do not require to explicitly transfer information from one side of the interface to the other can be exploited to limit the amount of processor communications. In this paper, we only consider a serial implementation, but future directions of the current work include investigations on the parallel performance of the preconditioner.
Remark 6.-The computation of the Schur complement S, whose cost scales linearly with the number of columns of ℬ (i.e., the total number of Lagrange multiplier basis functions), is in fact the most expensive step of the application of the preconditioner discussed in this section. In our numerical experiments, however, we observed that significant performance improvements are achieved by reusing the same Schur complement for n consecutive applications of the preconditioner. This strategy does not significantly affect the number of GMRES iterations, if n is small enough (in our case, n ~ 20).
Let us now address the application of the inverses of A and S. Since S ∈ ℝ N Γ N λ × N Γ N λ , the Schur complement is typically small and is inexpensively inverted by solving the linear system either directly or via iterative methods (complemented with standard preconditioners such as multigrid or ILU). Matrix A features a block diagonal structure in which each diagonal block is itself a saddle-point system and inverting A is therefore equivalent to solving linear systems which are local to each subdomain Ω j . Employing, instead of A −1 , a suitable approximation thereof, gives rise to different suitable preconditioners.
In this work, we choose to approximate every block diagonal matrix of A −1 by considering a single application of the SIMPLE preconditioner, both in the computation of the Schur complement shown in Eq. (20) and in the application of the preconditioner. We recall that SIMPLE is based on Eq. (18) applied to the Navier-Stokes equations and that the approximated inverse of the top left diagonal block is performed by extracting and inverting its diagonal [54]. In Fig. 4, we show the robustness of our preconditioner with respect to the number of blocks and to the number of basis functions for the Lagrange multipliers per interface N λ . The considered geometry is that of the aorta and the illiac arteries in Fig. 4 (left). The blocks are sequentially added starting from the inlet (for this reason, we remark that the size of the system increases proportionally with the number of blocks). We compare the preconditioner performance with that achieved by inverting every block in A with GMRES and relatively large tolerances (5 · 10 −1 and 1 · 10 −2 ). We remark that, as the preconditioner in the latter approach varies in each iteration, we are obliged to employ flexible GMRES (FGMRES) [56]. If the local systems are solved exactly, the preconditioner is in fact the original global matrix, as Eq. (18) is an exact decomposition. For this reason, solving the local linear systems with GMRES leads to a better performance in terms of number of iterations. However, approximating each local inverse with SIMPLE is more efficient in terms of solution time, as each FGMRES iteration is less computational expensive. We conclude by observing that the increase in solution time occurring at N Ω = 13 is due to the introduction of the bifurcation-which is composed of a larger number of elements than the other blocks-in the set of considered subdomains.

The reduced basis element method for flow in arteries
As discussed in Section 4, the subdomains in the target geometry Ω m are obtained from the parametrized geometrical deformation of a number of building blocks Ω i , i = 1,…, N bb . In this paper, these are a model symmetric bifurcation (B), and straight tubes with aspect ratios length/diameter 1:1 (T1), 1:2 (T2) and 1:3 (T3). The offline phase of our reduced order model algorithm consists of defining reduced basis functions in each of these building blocks Ω i . The snapshots are collected from a single decomposed "artificial" geometry Ω m = ∪ j = 1 N Ω Ω j by sampling the geometrical parameters μ 1 , …, μ N Ω describing each subdomain from uniform distributions centered on the values characterizing the original configuration, as depicted in Fig. 5. The snapshots are found by solving a flow problem with ρ f = 1.06 g cm −3 , μ f = 4 · 10 −2 g cm −1 s −1 , the imposed inflow flow rate shown in Fig. 5 (in the box on the left) with a parabolic profile and homogeneous Neumann conditions on the outlets on 165 random configurations of the artificial geometry. There exist other equally valid possibilities to generate the database of snapshots. For example, these could be taken by solving flow problems on a collection of target geometries. This approach allows us to avoid issues related to the random sampling of the geometrical parameters-e.g., physiological feasibility of the resulting global geometry-but requires the aid of an automatic algorithm for the decomposition to be efficient. The development of such an algorithm is one of the possible future extensions of the present work. The simulations are run from t 0 = 0 s to T = 0.3 s with a BDF scheme of order σ = 2 and Δt = 2.5 · 10 −3 s. The initial condition at t 0 is computed by gradually increasing the inflow flow rate profile at the inlet by the law Q(t) = Q 0 1 − cos t − t 0 ramp π/ t 0 − t 0 ramp /2, Q 0 being the desired flow rate at time t 0 , from t = t 0 ramp = − 2 ⋅ 10 −2 to t = t 0 . For the discretization of the Lagrange multipliers on each interface, we employ the set of basis functions Ξ n with n = 5, which corresponds to N λ = 63 basis functions. We remark that the artificial geometry is not included in the configurations used for the snapshots generation and is considered in Section 6.1 to assess the performance of the method.
The . We remark that, since we are dealing with unsteady problems, these matrices collect snapshots sampled at different timesteps for different values of the geometrical parameters. It is worth noting that each velocity snapshot, which is divergence free in the deformed configuration, does not retain such property on the reference building block. In order to consider snapshots which are divergence-free in the reference configuration, the columns of S u i are scaled by means of the divergence-preserving Piola transformation, which is defined as follows. Given a vector field v defined in Ω j (μ j ) and such that ∇ Matrix J Φ z j x; μ j is the Jacobian of transformation Φ z j defined in Eq. (9). This explicitly takes the form J Φ z j x; μ j = Q μ j J χ z j x; μ j −1 .
Remark 7.-The nonaffine deformation χ z j ⋅ ; μ j is defined in analytic form for tubes T1, T2 and T3; therefore, for those building blocks the Jacobian is computed exactly. The nonaffine deformation of the bifurcation B is performed by prescribing the position of the outlets in the physical configuration and by solving a linear elasticity problem such that the displacement field operates the desired rotation of such interfaces. Due to the complications of the evaluation of the Jacobian at the mesh nodes, in the bifurcation we consider J χ z j −1 x; μ j ≈ I. This simplification is also justified by the fact that for this building block we restrict ourselves to small deformations.
The reduced basis vector for velocity and pressure in the ith building block, ζ j i, ℎ j = 1 N u i and η j i, ℎ j = 1 N p i , are the columns of matrices

Author Manuscript
Author Manuscript

Author Manuscript
Author Manuscript which are constructed by POD by considering two tolerances ε u and ε p (the same for every building block), as described in Section 2.1. Matrices V u i and V p i are made orthonormal with respect to X u i (matrix discretization of the H 1 norm on the ith reference building block) and X p i (matrix discretization of the L 2 norm on the ith reference building block) respectively, by following the procedure presented in Remark 1. The first four modes of velocity and pressure for each building block are depicted in Fig. 6. We also introduce the local matrices V u j , which are obtained by applying to each column of V u z j the Piola transformation from the reference configuration to the physical one Ω j (being dependent on the geometrical parameters ℳ, these must be computed during the online phase). Table 1 reports data about the N bb = 4 building blocks composing the artificial geometry used for the data generation, such as number of snapshots N s i and the sizes of FE and RB spaces for velocity and pressure. Although the RB sizes are considerably smaller than the FE ones, the number of basis functions needed to achieve low POD tolerances is substantial. This indicates that the amount of information carried by the snapshots impedes the reduction of the problem. The basis size could be decreased by considering narrower sampling intervals for the geometrical parameters describing each building block. However, as in Section 6.2 we test the ability of the same reduced basis to generalize to the case of a geometry which is not considered in the offline phase, here we decide to allow for significant deviations of the configurations from the original geometry during the snapshots generation. It is worth noting that, in order to decrease the already high computational burden of the offline phase, we settle for a number of configurations (165) that is possibly too limited to capture the geometrical variability we consider in the dataset (see Fig. 5 for examples of some the configurations). As we verify in Section 6.1, the errors that we obtain in the online phase are-although sufficiently low for most cardiovascular applications-considerably larger than the POD tolerance as a consequence of Remark 2.

Supremizers enrichment for pressure and coupling Lagrange multipliers
In Section 3.2 we recall that the Navier-Stokes equations represent an example of saddlepoint equations and that this class of problems is associated with stability issues related with the discretization spaces employed for the primal and dual fields (velocity and pressure, respectively). Unfortunately, even if a stable discretization is considered during the reduced basis generation, the stability is in general not preserved in the reduced system. Furthermore, the global system obtained from the nonconforming method introduced in Section 4.2 is also a saddle-point problem where the velocity and the Lagrange multipliers play the role of the primal and dual fields, respectively. Among the ways to deal with the loss of stability in the reduced system are the use of least squares Petrov-Galerkin approaches for the solution of the minimization problem associated to the nonlinear residual of the reduced equations [57,58] and the supremizers enrichment [59][60][61]. Here we follow the latter approach.
The stability condition is often called inf-sup condition [41] and must be satisfied both at the continuous and discrete level in order to ensure the existence and uniqueness of the respective solutions. Let us consider W2, which we assume to be well-posed in the continuous setting. We first address the stability with respect to the constraint imposed by the pressure (divergence free velocity). At the FE level, the inf-sup condition requires the existence of β p j, ℎ ∈ ℝ such that, for all j = 1, …, N Ω , β p j, ℎ = inf q ≠ 0 sup v ≠ 0 q T D j, ℎ v ‖V‖ V j, ℎ ‖q‖ Q j, ℎ > 0, where we used the notation ‖v‖ V j, ℎ = v T X u j, ℎ v = ‖v ℎ ‖ V j and ‖p‖ Q j, ℎ = p T X p j, ℎ p = ‖p ℎ ‖ Q j .
Taylor-Hood elements [42] are an example of stable choice of elements ensuring that β p j, ℎ > 0, as mentioned in Section 3.2. In the RB context, we introduce X u j, N v, the pressure counterparts X p j, N and ‖p N ‖ Q j, N, and The main idea of supremizers enrichment is to augment the reduced basis for the velocity with vectors (the supremizers) specifically computed from the pressure modes to ensure the positivity of the inf-sup constant. Formally, let us consider the supremizers s 1 j , …, s N p z j j which are obtained by solving for j = 1, …, N Ω and for l = 1, …, N p z j the problem X u j, ℎ s l j = D j, ℎ T η l z j . It can be shown [59] that substituting V u j with the enriched basis V u j, + = [V u z j |s 1 j |… | S N p z j j ] leads to β p j, N > β p j, ℎ > 0. A major drawback of this type of (exact) supremizers enrichment is that the supremizers for subdomain Ω j are dependent on the geometrical parameter μ j , which implies that they must be computed in the online phase. For this reason, in this paper we follow an approximate approach similar to the one considered in [59] in the case of geometrical parameters. For every reference building block , ℎδ is the matrix assembled on Ω i discretizing the coupling with the mth interface (specifically, if Ω i is a tube m = 1, 2, whereas if it is a bifurcation m > 2). In the following, we simply denote by V u i the matrix obtained by arranging columnwise ζ l i, ℎ and the supremizers for the pressure and coupling stabilizations s l i and z l i ; the columns of V u i are made orthonormal with respect to X u i with the Gram-Schmidt algorithm. The numerical results in Section 6 are obtained by following this stabilization strategy.

Assembly and solution of the global reduced system
Let us define the global matrix the vector of reduced degrees of freedom for all the subdomains W N = vec W 1, N , …, w N Ω , N , and the vectors encoding the data F N ≔ W T F ℎ and G Nδ := G hδ . Then, the reduced residual at timestep t k+1 is obtained from Eq. (15) and reads and Y Nδ := vec(W N ,Λ δ ).
As discussed in Section 4.3, finding the root of nonlinear equations using the Newton-Raphson algorithm entails the solution of a nonlinear system in the tangent matrix of the corresponding residual. Formally, solving R N (Y) = 0 given an initial guess Y (0) leads to the iterative algorithm which is equivalent to Eq. (16) in the reduced context.
The efficiency of the reduction relies on the fast assembly of the tangent matrix J R N and residual R N . Regarding the former, we consider the following approximation where A lin N ≔ W T A lin W, A lin ≔ diag K j, ℎ D j, ℎ T D j, ℎ j = 1, …, N Ω is the matrix obtained by neglecting the convective terms in the Navier-Stokes equations. We remark that the reduced tangent matrix features a saddle-point structure as its full order counterpart in Eq. (17). Therefore, system J R N X N = B N can be solved directly by applying the reduced version of Eq. (19). The advantages of this approach are: (i) the tangent matrix J R N is never entirely allocated, because every stage for applying Eq. (19) involves operations that are local to either subdomains or interfaces (we recall that inverting A lin amounts to inverting each of its diagonal blocks), (ii) as a result of approximation (21), the tangent matrix is the same for every solution of the linear system; hence the reduced Schur complement is assembled only once and it can be factorized-along with the other local matrices to be inverted-at the start of the simulation. The linearized version of the tangent matrix (21) is nonconsistent, which implies that the Newton-Raphson algorithm is not expected to convergence quadratically. However, the reduced complexity of the assembly results in a considerable performance gain overall.
The problematic part of the computation of R N (Y) is evidently the nonlinear term, as the matrices encoding the linear ones are computed only once and the corresponding contributions are found at each timestep by inexpensive matrix-vector multiplications. After trivial but repetitious steps, we find that the blocks of the nonlinear part of the reduced residual read, for all j = 1, …, N Ω , c j, N = V u z j T C j, ℎ V u z j u j, N V u z j u j, N . (22) One way to compute the nonlinear term for every block is then to assemble the full order The integrals in Eq. (23) are independent of the reduced solution and can be computed as a setup step in the first stages of the simulation. However, the amount of computation increases quadratically with the size of the reduced basis and could therefore nullify the performance gain. Using the fact that the velocity modes in V u z j are sorted in order of significance as a consequence of Proposition 1, it is legitimate to consider the following approximation N c z j u l j, N u m j, N ∫ Ω j ζ m z j , ℎ ⋅ ∇ ζ l z j , ℎ ζ i z j , ℎ , (24) where 0 < N c j ≤ N u z j . In other words, the observation that the magnitude of the reduced coefficients u i j, N quickly decreases as i increases-as we show in Fig. 7 for the case of one of the simulations presented in Section 6.1-allows to truncate the two sums in Eq. (23) to the first N c j terms. In the numerical results in Section 6 we investigate the effects of considering both Eqs. (22) and (24) for the computation of the convective part of the residual.

Numerical results
In this section, we assess the performance of our numerical method in two applications. In Section 6.1, we consider the same modular artificial geometry employed in the offline phase presented in Section 5 and we compare the results obtained by solving the flow problem using the RB and the FE methods. These are obtained with a code based on LifeV, a C++ FE library with support to high-performance computing [62]. In Section 6.2, we consider a simple but more physiological geometry of an aorta and the two iliac arteries. In this case, we compare the results obtained with the RB method against the ones obtained on a reference geometry (i.e., not partitioned into approximated subdomains) with SimVascular 1 [63], an open-source software for patient-specific modeling and blood flow simulations.
For all the simulations, we fix ρ f = 1.06 g cm −3 , μ f = 4 · 10 −2 g cm −1 s −1 , and we consider the same choice for the discrete Lagrange multipliers space as in the snapshot generation phase (i.e., N λ = 63 for each interface).

Online phase on the artificial geometry
We evaluate the performance of our reduced order model on the artificial problem employed for the generation of the reduced basis. We recall that the Artificial geometry in Fig. 5, which is not included in the set of 165 configurations used to produce the snapshots, is a legitimate candidate to test the accuracy on geometries not "seen" in the offline phase. The solution by the RB method is compared to the global solution obtained by considering FE method solutions in each subdomain (with the same meshes used in the RB case) coupled with the discretization strategy presented in Section 4.2. We consider the same choice for the discrete Lagrange multipliers space as in the snapshot generation phase. The reasons for considering such comparison are the following: (i) being that the geometry is exactly the same, it is possible to easily compute H 1 and L 2 errors for velocity and pressure in order to verify the convergence of the RB approximation with respect to the FE one, and (ii) it is possible to fairly discuss the speedup achieved by the RB method, as the RB and FE solutions share the same computational mesh. As for the generation of the reduced basis, we consider t 0 = 0 s, T = 0.3 s, and a second order BDF scheme with Δt = 2.5 · 10 −3 s. Fig. 8 shows, in the first two rows, the magnitude of the velocity field and pressure distribution at times t = 0.15 s and t = 0.25 s obtained with the RB method and the corresponding point-wise errors with respect to the FE solution. The POD tolerances in every subdomain have been set to ε u = 1 · 10 −3 and ε p = 1 · 10 −5 . We observe that, despite the global mesh being nonconforming, the velocity and pressure appear to be quite smooth at the interfaces. The comparison with the FE solution highlights the fact that the largest errors are committed in the region of the bifurcation. This is likely due to the fact that, as shown in Table 1, the reduced basis for the corresponding building block (B) is based on a smaller number of snapshots. However, the RB and the FE solutions match quite accurately overall, as the relative error is negligible in every part of the domain. The last row of Fig. 8 depicts the distribution of the magnitude of the WSS on the boundary of the artery in the RB and the magnitude of the error. The influence of the coupling is noticeable: indeed, it is clearly possible to spot the interfaces as regions with abnormally low or high WSS. However, this effect is not due to the RB approximation but rather to the coupling strategy: indeed, the RB and FE approximations are extremely close, as proven by the small magnitude of the error in the WSS. Fig. 9 shows the H 1 and L 2 relative errors on velocity and pressure integrated in time, defined as highlighting the convergence of the RB solution to the FE one as the reduced basis size increases. Clearly, ε u and ε p both contribute to the errors in velocity and pressure. Indeed, for large ε p , e u and e p set on a plateau as ε u decreases, indicated that the error in the pressure is dominating the error in the velocity. For each data point in Fig. 9, the corresponding speedup is reported in Table 2. The runtime of the reference FE solution-which is composed of 641,502 degrees of freedom for velocity and pressure and 567 degrees of freedom for the Lagrange multipliers-is 66,892 s (~ 18.5 h). The speedups are relative to the total runtime and, in parentheses, to the part of the online phase after the initial setup (which include the loading of the reduced basis, the assembly of the constant matrices and their projection onto the reduced spaces). The motivations to consider both speedups are twofold. Firstly, in this paper we do not focus on the optimization of the assembly part of the system, which could considerably increase the total speedup; such optimization could be carried out, for example, by employing (M)DEIM, as mentioned in Section 2.2. Secondly, the setup part of the RB algorithm is particular to the geometry we are interested in. As a matter of fact, should we be interested in solving flow problems corresponding to different boundary conditions and/or fluid properties but on the same geometry, the setup phase can be executed only once, and for each solution of the reduced system we take advantage of the speedups relative to the only solve phase. The gain in performance is in all cases quite substantial (at least one order of magnitude with respect to the full order solution), and we observe, as expected, the trend of increasing speedup as the size of the reduced system decreases. However, the careful profiling of the simulation highlights that most of the time of the solve phase is spent in the assembly of the reduced convective term rather than in the actual solution of the reduced system. This is because, as discussed in Section 5.2, the exact assembly of the reduced convective terms entails two projections and the construction of the full order convective term.
With the purpose of achieving higher speedups during the solution time, we consider the approximation of the convective term given in Eq. (24). Fig. 10 shows the absolute and relative H 1 and L 2 errors on velocity and pressure over time in function of different degrees of truncation of the convective term (i.e., different values of N c z j , which we set equal to N c z j = N c for every subdomain). The achieved speedups are, from N c = 10 to N c = 120 and using the same notation adopted in Table 2 Fig. 7, which refer to the velocity coefficients of the reference solution corresponding to ε u = 4 · 10 −3 and ε p = 8 · 10 −5 . As expected, the runtime of the solve phase is greatly decreased with respect to both the FE solution, against which the speedup achieved is always higher than 200, and with respect to the RB solution with the convective term computed as in Eq. (22). As N c increases, the total speedup rapidly decreases due to the quadratic dependence on that parameter of the number of integrals computed during the setup phase. Nevertheless, we believe that this strategy for approximating the convective term is of great benefit whenever it is required to run multiple simulations on the same geometry, as in this scenario the setup phase is only performed once.

Online phase on the aorta and iliac arteries
In this section we consider a physiological geometry of an aorta with the two iliac arteries. 2 Our goal is to evaluate the effects of the geometrical approximation on the solution given by our ROM. In order to do so, we employ the geometries depicted in Fig. 11. Specifically, on the left we show the decomposed geometry along with the "exact" one. On the right, we provide a quantitative analysis of the difference between the two. The algorithm to generate the decomposed geometry from the target one is out of the scope of this paper. However, the development of efficient and accurate reconstruction strategies is a topic of interest and will be addressed in future works. Importantly, we choose to employ the same set of RB basis functions computed in the offline phase described in Section 5 (which, we recall, is built upon modifications of the same artificial geometry we use to test the accuracy of the ROM in the previous section). This is motivated by the perspective of employing the method in realistic scenarios which may be considerably different from the ones explored during the offline phase.
The flow problem consists of imposing the same inflow profile shown in Fig. 5 at the inlet (the aorta) and homogeneous Neumann conditions to the outlets (the iliac arteries). We take T = 1.5s (i.e., two heartbeats) and Δt = 1.25 · 10 −3 s. As already anticipated, the reference simulation is computed with the SimVascular solver svSolver. This software is based on the FE method with P1-P1 elements and VMS-SUPG stabilization; we refer to [64] for more information regarding this numerical approach. It is therefore challenging to devise a fair comparison between the ROM-which, we recall, is built upon a P2-P1 discretization-and the reference solution in terms of efficiency and accuracy. Nevertheless, we provide for sake of completeness some data regarding the reference solution. This is computed on a fine mesh (composed of 1,823,827 nodes) which is selected by studying the convergence of the WSS on the boundary; the simulation using SimVascular took 46,457 s (~ 13 h) by employing 28 cores.
In Fig. 12, we show the qualitative comparison of the velocity field magnitude, pressure and WSS distribution on the wall at two different timesteps. The RB solution is obtained with ε u = 1 · 10 −3 and ε p = 1 · 10 −5 . We observe that, despite the differences in the employed geometries and in the underlying numerical discretization, the solutions share similar features. For instance, the pressure distribution is qualitatively almost identical, and the ranges for velocity and WSS magnitude achieved in every region are comparable. It is apparent, however, that most of the error (on the velocity magnitude in particular) is in the vicinity of the bifurcation. This is due to the fact that our choice of geometrical parameters for the corresponding building block does not allow for the reference bifurcation to be deformed into the target one with sufficient accuracy. For example, Fig. 2 shows that we do not take into account the possibility of varying the radii of the outlets, and this reflects in a large geometric error particularly on one of the branches, as depicted in Fig. 11 (bottom branch in the bottom right plot).
A more quantitative analysis of the performance of the ROM with respect to the reference solution is presented in Fig. 13. Here, we show the average of the WSS magnitude over the three regions highlighted in the figure on the left, the pressure at the inlet and the outflow rate at the outlets, for the reference solution and for RB solutions corresponding to different choices of tolerances and truncations for the approximation of the nonlinear term. We chose to focus on a "fine" RB solution (RB1), where we do not apply the truncation of the convective term, and more "coarse" but efficient RB solutions with approximation of the convective term (RB2, RB3, RB4); for details regarding the employed POD tolerances and number of terms in the truncated sum, we refer the reader to the caption of Fig. 13 . In all cases, we achieve speedups larger than one with respect to the reference simulation obtained with SimVascular (although the gain is negligible in the case of RB1) but on a single core instead of 28. From the results presented in Fig. 13, we note that, while the approximation of the pressure and flow rate is extremely precise for all RB settings, the performance on the WSS is more challenging. The curves for the average WSS on the two regions on the iliac arteries (B and C) are quite close to the reference one compared to the average WSS on the bifurcation (A). However, this is likely an effect of the geometric approximation rather than the accuracy of the ROM per se. As a matter of fact, we already noted in Fig. 12 that the largest errors are located in that area. We also remark that the POD tolerance plays a more dramatic role in the quality of the solution than the number of terms retained in the truncated nonlinear term N c . Indeed, the simulations with smallest tolerances (RB1 and RB2) and largest ones (RB3 and RB4) lead to similar results, regardless of the value of N c . Nevertheless, truncating the convective term is beneficial to the efficiency of the ROM.

Conclusions
In this paper, we presented an implementation of the reduced basis element method for the solution of the unsteady 3D Navier-Stokes equations in the context of cardiovascular simulations. We first considered the problem of coupling finite element solutions defined on subdomains obtained from parametrized geometrical deformations of reference building blocks. This was necessary, as the offline phase of our reduced order method requires the generation of snapshots from coupled finite element flow solutions obtained on a variety of geometries. In order to improve the efficiency of the coupled finite element solver, we devised an ad-hoc preconditioner which takes advantage of the saddle-point structure of the discretized linear system. In the following parts of the paper, we formulated the reduced order model by projecting the matrices and variables (velocity and pressure) onto the reduced basis spaces. This procedure is beneficial because it allows us to considerably reduce the number of degrees of freedom (hence, the size of the linear system to be solved in each iteration of the Newton-Raphson algorithm). In the numerical simulations, we demonstrated the capabilities of the method on the same geometry used for the offline phase and on a physiological geometry consisting of an aorta with the two iliac arteries. In the first case, we registered considerable speedups (from 12 to 33 over the total runtime and from 29 to 50 over the sole solve phase) with respect to the full order solution. Considerable gain in performance was also achieved in the second case for some choices of POD tolerances, although the fact that we considered a reference solution obtained with a different solver (i.e., SimVascular) made the comparison in terms of runtime more complex. In both applications we also analyzed the performance in terms of wall-shear stress reconstruction, which is possible in our reduced order method-as opposed, for example, to 1D modelsbecause the 3D nature of the flow problem is preserved.
We believe that the results presented in this work are promising and that our study suggests many possibilities for the future developments of this reduced order method. As we discuss more in depth in the relative sections of this paper, these include: (i) a parallel implementation of the devised saddle-point preconditioner which could exploit the special structure of the linear system (see Remark 5), (ii) an offline strategy based on the solution of physiological flow problems (rather than flow problems defined on artificially deformed geometries), (iii) efficient ways to reduce the complexity of the setup phase and the assembly of the reduced convective term, and (iv) automatic algorithms for the generation of accurate decomposed geometries out of medical images or reference meshes. Regarding this last point, we also think that-due to the geometrical difficulties to map reference bifurcations into target ones (see discussion in Section 6.2)-an interesting follow-up of the current work is the study of a hybrid method in which some parts of the target geometry (e.g., the bifurcations) are modeled by means of the finite element method and others by means of the reduced basis method. This approach would allow us to easily treat cases featuring more complex geometries-for instance, cerebral aneurysms-that are not approximated as trivial deformations of tubes and which are currently too challenging to tackle with a strategy purely based on the reduced basis method. Sketch of the domain-decomposition of a target geometry. Each block in the target geometry is found from the parametrized geometrical deformation of a small number of reference building blocks.  Types of reference building blocks and affine transformations. On the left, tubes: the geometrical parameters are the angle of the outlet normal α-due to the axial symmetry and to the rotation matrix Q in Eq. (9), a single angle is sufficient to represent a bending in any direction-in the deformed configuration α and ratios between the reference and deformed lengths (L/L) and reference radii (R/R). On the right, bifurcation: the geometrical parameters are the angles describing the rotation of the reference outlet normals n 1 and n 2 onto the outlet normals n 1 and n 2 (i.e., three Euler angles per outlet, that is six geometrical parameters in total).      Average over time of the RB velocity solutions-normalized with respect to the first coefficient-in the four building blocks (from left to right: bifurcation B and tubes T1, T2 and T3) in the RB simulation considered in Section 6.1 with ε u = 4 · 10 −3 and ε p = 8 · 10 −5 .
The left and right dashed lines in every plot correspond to the indices 40 and 120, which are two of the choices considered in Section 6.1 for the truncation of the computation of the convective term.   Error in velocity e u (left) and error in pressure e p (right), computed as in Eq. (25), in function of the POD tolerances for velocity and pressure ε u and ε p .   On the left, qualitative comparison of the reference mesh with the decomposed one. On the right, quantitative estimation of the distance between the two.  The left and right columns-each composed of two sub-columns of plots-refer to time t = 0.9 s and t = 1.25 s, respectively. First row: velocity magnitude volume plot of the RB and reference solutions (sub-column left and sub-column right, respectively). Second row: pressure plot of the RB and reference solutions (sub-column left and sub-column right, respectively). Third row: magnitude of the WSS of the RB solution and reference solution (sub-column left and sub-column right, respectively). The RB solution corresponds to the choice ε u = 1 · 10−3 and ε p = 1 · 10 −5 .   Table 2 Overall speedups w.r.t. the FE solution and, in parenthesis, speedups of the solve part of the online phase, i.e., speedup relative to the total running time excluding the setup part in which the reduced bases are loaded and the constant matrices are assembled and projected onto the reduced spaces.