Preconditioning Navier–Stokes control using multilevel sequentially semiseparable matrix computations

In this article, we study preconditioning techniques for the control of the Navier–Stokes equation, where the control acts only on a few parts of the domain. Optimization, discretization, and linearization of the control problem result in a generalized linear saddle‐point system. The Schur complement of the generalized saddle‐point system is very difficult or even impossible to approximate, which prevents standard block preconditioners from performing satisfactorily. We apply the multilevel sequentially semiseparable (MSSS) preconditioner to the underlying system. In contrast to standard block preconditioning techniques, the MSSS preconditioner computes an approximate factorization of the global generalized saddle‐point matrix up to a prescribed accuracy in linear computational complexity. This in turn gives parameter‐independent convergence for MSSS preconditioned Krylov solvers. We use a simplified wind farm control example to illustrate the performance of the MSSS preconditioner and compare it with state‐of‐the‐art preconditioning techniques. Our results show the superiority of the MSSS preconditioning techniques to standard block preconditioning techniques for the control of the Navier–Stokes equation.

system states which are described by the PDEs, 2,7 with optimal shape design 8 and boundary control of fluid dynamics 9,10 as applications of this type of optimization approach. For the simultaneous approach, its high potential in terms of efficiency and robustness turns out to be very difficult to realize when the sizes of the states and inputs are both very large. This yields a very large-scale optimization problem, where the equality constraints given by the PDEs are appended to the augmented cost function. The Lagrange multipliers, whose number equals that of the state variables of the PDEs, make the optimization problem even larger.
Just like many problems in science and engineering, the most time-consuming part of the optimal control of PDEs is solving a series of linear systems arising from the PDE discretization. 3 With the continuing improvement of computational resources and the advancement of numerical techniques, ever larger problems can be taken into consideration. 11 An important building block for the optimal control of PDEs is preconditioning techniques that accelerate the simulation of PDEs. Many efforts have been dedicated to the development of efficient and robust preconditioning techniques for distributed control problems, where the control is distributed throughout the domain, [12][13][14][15][16][17][18][19][20][21] or boundary control problems, where only boundary conditions can be regulated. [22][23][24] For in-domain control problems, where the control acts only on a few parts of the domain, preconditioning techniques developed for distributed control problems do not give satisfactory performance. This is because these preconditioning techniques depend heavily on an efficient approximation of the Schur complement of the block linear system arising from the discretized optimality condition. 20,25 Research on in-domain control problems in engineering focuses on specific objectives 1,22,26 without considering efficient preconditioning techniques.
In this article, we focus on efficient and robust preconditioning techniques for the in-domain control of the Navier-Stokes equation, which is, to the best of our knowledge, the first devoted effort. We use a simplified wind farm control example to illustrate the performance of our preconditioning technique, the so-called multilevel sequentially semiseparable (MSSS) preconditioner. The MSSS preconditioner was proposed for PDE-constrained optimization problems 27 and later studied for some benchmark problems of computational fluid dynamics (CFD). 28 Applying the implicit function theorem (IFT) at the optimization step, we can reuse the preconditioners for the linearized Navier-Stokes equation to solve the forward sensitivity equations for the computation of the gradient and Hessian matrix. This reduces the computational cost significantly. We then study the MSSS preconditioner for the generalized saddle-point system resulting from the optimization. In contrast to the standard block preconditioners, which require an approximation of the Schur complement of the block linear system, the MSSS preconditioner computes an approximate factorization of the global system matrix up to a prescribed accuracy in linear computational complexity. This is a big advantage over the standard block preconditioners. We evaluate the performance of the MSSS preconditioner using the Incompressible Flow and Iterative Solver Software (IFISS) 29,30 and compare it with state-of-the-art preconditioning techniques.
The advantage of MSSS matrix computations is their simplicity and low computational cost, which is O(r_k^3 N). Here, N is the matrix size and r_k is the rank of the off-diagonal blocks, which is usually much smaller than N. 31,32 Related structured matrices include hierarchical matrices (H-matrices), 33 H^2-matrices, 34,35 and hierarchically semiseparable (HSS) matrices, 36,37 with computational complexity O(N log^α N) for some moderate α. H-matrices originate from approximating the kernel of integral equations and have been extended to elliptic PDEs. 38,39 H^2-matrices and HSS matrices are specific subsets of H-matrices, and HSS matrices have been successfully applied in multifrontal solvers. 37 Efforts have been devoted to preconditioning symmetric positive definite systems by exploiting the HSS matrix structure 40 and unsymmetric systems by H-matrix computations. 39,41 For a comprehensive summary of the different hierarchical structures and techniques, as well as their applicability and limitations, we refer to Reference 42. For readers interested in the latest developments in hierarchical matrices that leverage machine learning techniques, we recommend Reference 43. To keep a clear structure of this article, we focus only on the MSSS preconditioning techniques. For related work on exploiting the low-rank property of the subblocks in the hierarchical factorization of the system matrix for preconditioning or direct solution, we refer to References [44][45][46][47][48]. The structure of this article is organized as follows. In Section 2, we use a simple wind farm control example to formulate an in-domain Navier-Stokes control problem. Applying numerical approaches to solve this optimization problem, we obtain a generalized saddle-point system in Section 3. To precondition this generalized saddle-point system, we study the MSSS preconditioning technique in Section 4.
In Section 5, we perform numerical experiments to illustrate the performance of the MSSS preconditioning techniques for this type of problem. (IFISS is a computational laboratory for experimenting with state-of-the-art preconditioned iterative solvers for the discrete linear equation systems that arise in incompressible flow modeling.) We also study the state-of-the-art preconditioning techniques in Section 5 as a comparison. Conclusions are drawn in Section 6.

PROBLEM FORMULATION
In this section, we use wind farm control as an example to formulate the in-domain Navier-Stokes control problem. Many research activities have illustrated that operating all the turbines in a wind farm at their own optimal state gives suboptimal performance of the overall wind farm. 49 Wind farm control aims to optimize the total power captured from the wind by applying coordinated control strategies to the turbines in the wind farm. By appropriately choosing the computational domain for the flow, wind farm control can be formulated as an optimal flow control problem where the dynamics of the flow are described by the incompressible Navier-Stokes equation. For wind farm control, the control acts only on the few parts of the domain where the turbines are located. This in turn gives an in-domain flow control problem. In what follows, we build a simple wind farm model and use it to formulate the in-domain Navier-Stokes control problem.

Fluid dynamics
Considerable effort has been devoted to developing suitable models to simulate the wake effect in a wind farm, cf. References 50,51 for a general survey and Reference 49 for recent developments. In general, there exist two approaches for modeling the wake. One is the heuristic approach, which does not solve a flow equation but uses rules of thumb; 49 a typical example is the Park model. 52 The second approach solves an incompressible Navier-Stokes or Euler equation, cf. Reference 11. In this article, we use the second approach to model the flow in the wind farm. Moreover, some recent research tries to take the boundary layer and further physical behavior into account. This in turn requires a more complicated turbulent flow model for the wind farm simulation study. 11,51,53 However, these research activities do not take efficient preconditioning techniques into account but focus on the physical properties of the flow for the wind farm simulation. In this article, we focus on designing efficient and robust preconditioning techniques, and we evaluate the performance of our preconditioning techniques using IFISS. We only consider flow problems that can be addressed within the framework of IFISS. Consider the stationary incompressible Navier-Stokes equation in Ω ⊂ R^2 given by

−ν ∇²u + (u · ∇)u + ∇p = f,  ∇ · u = 0,  in Ω,     (1)

where ν is the kinematic viscosity, u is the velocity field, p denotes the pressure, and f is a source term. Here Ω is a bounded domain with boundary Γ = ∂Ω = ∂Ω_D ∪ ∂Ω_N, where ∂Ω_D denotes the part of the boundary where Dirichlet boundary conditions are prescribed and ∂Ω_N represents the part where Neumann boundary conditions are imposed. The Reynolds number Re ∈ R^+ describes the ratio of the inertial and viscous forces within the fluid 54 and is defined by

Re = u_r L_r / ν,

where u_r ∈ R^+ is the reference velocity and L_r ∈ R^+ is the reference distance that the flow travels. The Reynolds number plays an important role in the flow equation and determines whether the flow under consideration is laminar or turbulent.
In many engineering problems, turbulent flow occurs when the Reynolds number Re > 2000. 54 In the case of flow through a straight pipe with a circular cross-section, at a Reynolds number below a critical value of approximately 2040, fluid motion will ultimately be laminar, whereas at larger Reynolds numbers the flow can be turbulent. 55 Since we focus on efficient preconditioning techniques for in-domain flow control using IFISS in this article and no turbulent flow model is included in IFISS, we consider a flow with Reynolds number Re = 2000, although this does not correspond to a practical flow for wind farm control.
To study the aerodynamics of the wind farm, we cannot use an infinite computational domain. Instead, we prescribe suitable boundary conditions for the flow, which in turn gives a finite domain. We set a standard reference domain Ω = [−1, 1] × [−1, 1] for the wind farm simulation study. The reference velocity u_r is set to 1. The turbines in the wind farm are located in a line that follows the stream direction, and the leftmost turbine is placed at the center of the domain. Such configurations are widely used in wind farm simulation studies. 11,52 Here the diameter of the turbine is set to 1/64 of the reference computational domain. The distance between turbines is set to five times the diameter of the turbines. 56 Constant inflow is imposed on the left boundary and is given by

u = (1, 0)^T  on the inflow boundary.     (2)

For the top, right, and bottom boundaries, which are far away from the turbines, the flow is considered to be free stream, and the zero-stress outflow boundary condition

ν ∂u/∂n − p n = 0  on ∂Ω_N     (3)

is prescribed. Here n is the unit normal vector of the boundary pointing outwards, and ∂u/∂n is the Gâteaux derivative at ∂Ω_N in the direction of n. This boundary condition states that the flow can move freely across the boundary when resolving the Navier-Stokes equation (1).
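The quantities above fix the viscosity in the flow equation (1). A minimal sketch, in which the reference length L_r = 2 (the side of the reference domain) is our own illustrative assumption:

```python
# Reynolds number: Re = u_r * L_r / nu.
u_r = 1.0    # reference velocity (set to 1 in the text)
L_r = 2.0    # reference length; assumed here to be the domain side length
Re = 2000.0  # target Reynolds number used in the experiments

nu = u_r * L_r / Re  # kinematic viscosity that realizes the target Re
print(nu)
```

With these (assumed) reference quantities, reaching Re = 2000 requires a kinematic viscosity of 10^{-3}.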
Associated with the prescribed boundary conditions (2)-(3), the resolved velocity without an outer source for the Navier-Stokes equation (1) is shown in Figure 1.

Wind turbines
Wind turbines can be modeled as outer sources that interact with the flow. Two widely used methods for wind turbine modeling are the actuator disk method (ADM) 57 and the actuator line method. 58 In this article, we use the ADM for the wind turbines, where the thrust force is given by

f = −(1/2) ρ C_T A û_d².     (4)

Here û_d is the average axial flow velocity at the turbine disk, C_T is the disk-based thrust coefficient, ρ is the density of the flow, and A is the area swept by the turbine blades.

Objective function
Wind farm control aims to maximize the total power captured by all the wind turbines in the wind farm, which can be represented as

P = Σ_{j=1}^{N_t} (1/2) ρ C_{T_j} A û_j³,

where N_t is the number of turbines in the wind farm and û_j is the uniform disk-averaged axial flow speed of the jth turbine.
To summarize, the in-domain Navier-Stokes control problem can be formulated as the following optimization problem:

max_{C_T} P(C_T, u)  subject to the flow equation (1) with source term f(C_T, u) and 0 ≤ C_{T_j} ≤ 1.     (5)

Here C_{T_j} varies from 0 to 1 based on individual pitch control and yaw control of turbine j, and f(C_T, u) is a nonlinear function that takes the value (4) at the positions where the turbines are placed and 0 elsewhere.
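As a concrete illustration of the actuator disk quantities, the sketch below evaluates a disk-based thrust force of the form f = −(1/2)ρ C_T A û_d² and a total power objective Σ_j (1/2)ρ C_{T_j} A û_j³. The constants (ρ = 1, A = 1/64) and the exact scaling of the formulas are illustrative assumptions, not the paper's calibrated model:

```python
import numpy as np

rho = 1.0     # flow density (nondimensional; assumed)
A = 1.0 / 64  # swept area, taken here as the turbine diameter fraction (assumed)

def thrust(C_T, u_d):
    # Disk-based thrust force acting against the flow (assumed form of Eq. (4)).
    return -0.5 * rho * C_T * A * u_d**2

def total_power(C_T, u_d):
    # Total power objective summed over all turbines (assumed form).
    C_T = np.asarray(C_T, dtype=float)
    u_d = np.asarray(u_d, dtype=float)
    return float(np.sum(0.5 * rho * C_T * A * u_d**3))

print(thrust(1.0, 1.0), total_power([1.0, 1.0], [1.0, 1.0]))
```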

REDUCED NONLINEAR PROGRAMMING
In the previous section, we formulated the in-domain control problem (5) using a wind farm control example. The size of the control variable, N_t, which is the number of turbines, is much smaller than the size of the state variables (the number of velocity and pressure grid unknowns). Therefore, we use the sequential approach, based on the implicit function theorem, to solve a reduced optimization problem.

Reduced optimization problem
Denote the equality constraints in (5) for the flow equation by h(C_T, y) = 0 with y = (u, p), and the objective function by g(C_T, y); then the optimization problem (5) can be written as

max_{C_T} g(C_T, y)  subject to h(C_T, y) = 0,  0 ≤ C_{T_j} ≤ 1.     (6)

The equality constraint in (5) implies that the velocity and pressure are a function of f; for the existence of this function, cf. Reference 7 for a proof. Since f is an explicit function of C_T, y is a function of C_T; denote this function by y = s(C_T). The function s(C_T) is not computed explicitly but is obtained implicitly by solving the flow equation (1) for a given C_T.
By using the implicit function theorem, we can rewrite the optimization problem in the reduced form

max_{C_T} g(C_T, s(C_T))  subject to 0 ≤ C_{T_j} ≤ 1.     (7)

Newton-type methods, which are second-order methods, are well suited to solving this type of nonlinear programming (NLP) problem. An alternative approach for this type of problem is (reduced) sequential quadratic programming ((R)SQP). 59 For the reduced optimization problem (7), these two types of methods are quite similar, and we refer to Reference 7 for a detailed discussion.
In this section, we apply Newton's method to solve the reduced NLP problem (7). The reason for choosing Newton's method is that the Hessian matrix for this problem is of small size and can be computed explicitly. Moreover, we can reuse the information from the last Newton step of solving the nonlinear flow equation to compute the gradient and the Hessian matrix, which makes Newton's method computationally competitive for this optimization problem. This will be explained in the following part. Newton's method for this problem is described by Algorithm 1.

Algorithm 1. Reduced Newton's algorithm for (6)

1: Input: initial guess C_T^(0), maximal number of optimization steps it_max, stop tolerance ε_0, k = 0
2: while ‖g_k‖ > ε_0 and k ≤ it_max do  ⊳ optimization loop
3:   solve h(C_T^(k), y) = 0 to compute y^(k)  ⊳ cf. Section 3.2
4:   compute the gradient g_k at (C_T^(k), y^(k))  ⊳ cf. Section 3.3
5:   compute the Hessian matrix H_k at (C_T^(k), y^(k))  ⊳ cf. Section 3.3
6:   compute the update ΔC_T^(k)
7:   C_T^(k+1) = C_T^(k) + ΔC_T^(k)
8:   check inequality constraints by projection:
9:   if C_{T_j} > 1 then
10:     C_{T_j} = 1  ⊳ project on boundary
11:   else if C_{T_j} < 0 then
12:     C_{T_j} = 0  ⊳ project on boundary
13:   end if
14:   k ← k + 1
15: end while
16: Output: optimal control variable C_T and corresponding solution u

In Algorithm 1, we denote the optimization step as the outer iteration; at each outer iteration, we need to solve a Navier-Stokes equation with a nonlinear right-hand side. This step is denoted as the inner iteration in line 3 of the algorithm. From the description of Algorithm 1, it is clear that the most time-consuming parts of this optimization problem are the solution of the nonlinear flow equation and the computation of the gradient and Hessian matrix. Therefore, efficient numerical methods need to be developed.
We emphasize that Algorithm 1 is advantageous when the dimension of the control is much smaller than the dimension of the system states. For such cases, the work of solving the forward sensitivity equations to compute the gradient and the Hessian matrix in lines 4 and 5 of Algorithm 1 is modest, since the system matrices of the linear systems in lines 4 and 5 are the same as the system matrix of the last nonlinear step of solving the Navier-Stokes equation in line 3. Once we have an efficient preconditioning technique to solve the generalized saddle-point system resulting from the Navier-Stokes equation with nonlinear right-hand side in line 3, we can also compute the gradient and Hessian matrix cheaply in lines 4 and 5 of Algorithm 1. For the case where the dimension of the control variables is comparable to the dimension of the system states, we refer to References 7,60 for the adjoint equation approach.
We would like to point out that for time-dependent in-domain control problems, our preconditioning technique also applies when the widely used adjoint equation approach is employed to compute the gradient and the Hessian matrix. For the adjoint equation approach, we need to solve the Navier-Stokes equation forward in time and the adjoint equation backward in time. At each time step, to compute the solution of the forward and the adjoint PDE, we need to solve a linear equation with a generalized saddle-point structure. The MSSS preconditioning technique fits well within such a framework.

Computations of the flow equation
At each outer iteration of Algorithm 1, we need to solve a nonlinear flow equation in line 3 of Algorithm 1. We write this equation explicitly in decomposed form as

−ν ∇²u + (u · ∇)u + ∇p = f(C_T, u),  ∇ · u = 0,  in Ω.     (8)

Equation (8) is a nonlinear equation where the nonlinearity is caused by both the velocity convection operator and the nonlinear right-hand side. To solve the nonlinear equation (8), we apply Newton's method. After discretizing the nonlinear partial differential equation (8) using a mixed finite element method, we need to solve, at each Newton iteration of step 3 in Algorithm 1, a linear system of the following type,

[ A + J_f   B^T ] [ Δu ]     [ b_u ]
[   B        0  ] [ Δp ]  =  [ b_p ].     (9)

Here A = νK + N + J_n(·), where K is the discrete (vector) diffusion matrix, N is the convection matrix, and J_n(·) denotes the Jacobian matrices from the nonlinear velocity convection term; J_f(·) represents the Jacobian matrices from the nonlinear right-hand side. Since only a small part of the domain is controlled, f_x and f_y are zero almost everywhere in the domain except in the vicinity of the turbines. This in turn gives singular Jacobian matrices J_f(·). Comparing system (9) with the standard linearized Navier-Stokes equation obtained by Newton's method,

[ A   B^T ] [ Δu ]     [ b_u ]
[ B    0  ] [ Δp ]  =  [ b_p ],     (10)

we see that the linearized flow equation (9) is a perturbed linearized Navier-Stokes equation with a singular perturbation in the matrix blocks that correspond to the velocity. Rewriting Equation (10) in the compact form

𝒜 x = b,  with 𝒜 = [ A  B^T ; B  0 ],     (11)

Equation (9) is given by the perturbed form

(𝒜 + 𝒥) x = b,  with 𝒥 = [ J_f  0 ; 0  0 ].     (12)

The linear system (12) is large, sparse, and highly indefinite. It belongs to the class of generalized saddle-point systems. 61 Efficient preconditioning techniques are needed to solve such systems with Krylov methods.
The standard approaches for preconditioning (12) are given by the following block preconditioners,

P_1 = [ Â   0 ; 0   Ŝ ],   P_2 = [ Â   B^T ; 0   Ŝ ],

where Â approximates the (1,1) block A + J_f and Ŝ approximates the Schur complement. It is shown in Reference 61 that, with exact blocks, the spectrum preconditioned by P_1 has three distinct eigenvalues, while with P_2 there is one eigenvalue with minimal polynomial of degree 2. Therefore, Krylov methods such as GMRES or IDR(s) compute the solution in at most three steps using P_1 and in at most two steps using P_2. The key in applying the block preconditioners P_1 or P_2 lies in an efficient approximation Ŝ ≈ S of the Schur complement such that the preconditioned Schur complement is bounded by a constant that is independent of the mesh size and the Reynolds number. In the past decades, efforts toward an efficient approximation of the Schur complement of the Navier-Stokes equation, that is, Ŝ ≈ S_ns = −BA^{−1}B^T, have been made, resulting in the SIMPLE method, 62 the augmented Lagrangian preconditioner, 63 and the pressure convection-diffusion (PCD) preconditioner, 64 among others. However, the aforementioned preconditioners do not perform well for the perturbed linear system (12) because of the presence of the perturbation J_f. The matrix J_f comes from the discretization and linearization of the nonlinear source term f given by (4) and is singular since f acts only on part of the computational domain. In general, the singular matrix J_f does not have a low-rank factorized form, which makes the Schur complement even more difficult to approximate. Numerical experiments in Section 5 illustrate the performance of the block preconditioner P_2 for (12), where we approximate the Schur complement. As we will see in Section 4, all the blocks of the matrix in (9) have an MSSS structure; it is therefore natural to permute the matrix with MSSS blocks into a global MSSS matrix. With this permutation, we can compute an approximate factorization. This factorization gives a global MSSS preconditioner for the system (9). This global preconditioner does not require approximating the Schur complement of the generalized saddle-point system (12).
Details of this preconditioning technique will be introduced in Section 4.
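The degree-2 minimal polynomial property of the block triangular preconditioner can be checked numerically on a small random saddle-point system: with the exact (1,1) block and the exact Schur complement, (P_2^{−1}𝒜 − I)² vanishes. The toy matrices below are random stand-ins, not the discretized flow matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)                    # SPD stand-in for the velocity block
B = rng.standard_normal((m, n))                # full-rank stand-in for the divergence block

K = np.block([[A, B.T], [B, np.zeros((m, m))]])      # saddle-point matrix
S = -B @ np.linalg.inv(A) @ B.T                      # exact Schur complement
P2 = np.block([[A, B.T], [np.zeros((m, n)), S]])     # block upper-triangular preconditioner

T = np.linalg.solve(P2, K)                           # preconditioned matrix P2^{-1} K
E = T - np.eye(n + m)
print(np.linalg.norm(E @ E))  # numerically zero: minimal polynomial degree 2
```

All eigenvalues of the preconditioned matrix equal 1, so a minimal-residual Krylov method terminates in at most two iterations, which is the property exploited by P_2.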

Computations of partial derivatives
Denote the reduced gradient of the cost function in (7) by ∇g. The gradient can be computed using the chain rule,

dg/dC_{T_j} = ∂g/∂C_{T_j} + (∂g/∂u_x)(∂u_x/∂C_{T_j}) + (∂g/∂u_y)(∂u_y/∂C_{T_j}),  j = 1, 2, …, N_t.

Since the cost function g is an analytic function of C_T, u_x, and u_y, the partial derivatives ∂g/∂C_{T_j}, ∂g/∂u_x, and ∂g/∂u_y can be computed analytically. Assume that u_x, u_y, and p are twice differentiable with respect to C_{T_j} (j = 1, 2, …, N_t), and that the first- and second-order derivatives have continuous second-order derivatives in Ω for i, j = 1, 2, …, N_t. Then we can compute the first-order derivatives by solving

−ν ∇²(∂u/∂C_{T_j}) + (∂u/∂C_{T_j} · ∇)u + (u · ∇)(∂u/∂C_{T_j}) + ∇(∂p/∂C_{T_j}) = ∂f/∂C_{T_j} + (∂f/∂u)(∂u/∂C_{T_j}),  ∇ · (∂u/∂C_{T_j}) = 0.     (13)

Here the partial differential equation (13) is obtained by differentiating the flow equation (8) with respect to C_{T_j} (j = 1, 2, …, N_t) and making use of the linearity of the diffusion operator and the divergence operator.
Note that Equation (13) is a linear partial differential equation; it is often called the forward sensitivity equation of (8). 65 Homogeneous Dirichlet boundary conditions are imposed on the constant inflow boundary, and natural boundary conditions are prescribed on the outflow boundary for the sensitivity equation (13). We use a mixed finite element method to discretize (13), reusing the same notation for the discrete analogs of the derivatives of u_x, u_y, and p with respect to C_{T_j}, to obtain a linear system

𝒜 φ_j = b_j,     (14)

where φ_j stacks the discrete first-order derivatives of the velocity and pressure with respect to C_{T_j}, the convection matrix of the linearized system is N_l, and the associated Jacobian matrices linearize the nonlinear right-hand side f with respect to C_{T_j}. It is easy to verify that the matrix 𝒜 in (14) is exactly the system matrix from the last Newton step of the linearized flow equation (9). The right-hand-side vector b_j is obtained by discretizing the known variables in (13). Therefore, the preconditioner computed for the last Newton step of the flow solve can be reused, and the computational work is much reduced.
By stacking φ_j and b_j (j = 1, 2, …, N_t) as Φ = [φ_1, φ_2, …, φ_{N_t}] and 𝔅 = [b_1, b_2, …, b_{N_t}], we obtain the following linear equations with multiple left-hand sides and multiple right-hand sides,

𝒜 Φ = 𝔅.     (15)

Here the number of unknown columns, N_t, is much smaller than the problem size. Block Krylov subspace methods, such as block IDR(s), 66,67 are well suited to solving for all the unknowns simultaneously for this type of problem.
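The reuse of a single factorization across all N_t sensitivity solves can be sketched with a direct solver as a stand-in for the preconditioned block Krylov method used in the article; the matrix below is a random placeholder for the last Newton system matrix 𝒜:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
n, n_t = 200, 4  # problem size and number of turbines (illustrative values)

A = rng.standard_normal((n, n)) + n * np.eye(n)  # stand-in for the matrix in (14)
B = rng.standard_normal((n, n_t))                # one right-hand side per control C_T_j

factors = lu_factor(A)      # factor once, mirroring the reuse of the preconditioner
Phi = lu_solve(factors, B)  # all N_t sensitivity solves share the factorization
```

The expensive step (the factorization) is performed once; each additional right-hand side only costs a pair of triangular solves.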
Next, we can use approaches similar to the above gradient computations to compute the Hessian matrix H, whose entries are given by

H_{jk} = d²g / (dC_{T_j} dC_{T_k}),  j, k = 1, 2, …, N_t.

Differentiating the expression for the reduced gradient once more, and noting that g is an analytic function of C_T, u_x, and u_y and that we have already computed the first-order derivatives, the remaining unknowns are the second-order derivatives of the states, which satisfy Equations (17) and (18). Equations (17) and (18) are again linear partial differential equations; they are the same equation as the sensitivity equation (13) but with different right-hand sides. Boundary conditions for (17)-(18) are set by prescribing homogeneous Dirichlet boundary conditions on the constant inflow boundary and natural boundary conditions on the outflow boundary. By using a mixed finite element method to discretize Equations (17)-(18), we obtain

𝒜′ φ_{jk} = b_{jk},     (19)

where we reuse the second-order derivatives of u_x, u_y, and p with respect to C_{T_j} and C_{T_k} to represent their discrete analogs, respectively, and φ_{jk} stacks these second-order derivatives as in (14). The Jacobian matrices J_f′(·) are obtained by computing the second-order derivatives of f_x, f_y with respect to C_T and using the rules of partial differentiation. Since Equations (17)-(18) are the same as the sensitivity equation (13) up to the right-hand side, the linear system (19) has the same system matrix as the linear system (14), that is, 𝒜′ = 𝒜. Moreover, 𝒜 is exactly the system matrix from the last Newton step of solving the nonlinear flow equation (8). The preconditioners computed for the last Newton step of solving the Navier-Stokes equation can therefore be reused here as well.
The right-hand-side vector b_{jk} is obtained by discretizing the known variables in (17)-(18). By stacking the unknowns as Φ′ = [φ_{11}, φ_{12}, …, φ_{N_t N_t}] and 𝔅′ = [b_{11}, b_{12}, …, b_{N_t N_t}], we get the following linear system with multiple left-hand and right-hand sides,

𝒜 Φ′ = 𝔅′.     (20)

Here the number of unknown columns, N_t², is also much smaller than the problem size, and block Krylov methods are again well suited to this type of problem.
Note that to compute the Hessian matrix, we need to compute the second-order derivatives by solving the linear equation (20), where the unknowns form a matrix of size N_x × N_t². Here N_x is the number of spatial unknowns and N_t is the total number of variables to be optimized. This can be computed efficiently using block Krylov subspace methods when N_t ≪ N_x. For the other cases, quasi-Newton methods such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm 59,60 can be applied, which do not compute the true Hessian matrix but only an approximation.

MULTILEVEL SEQUENTIALLY SEMISEPARABLE PRECONDITIONERS
The multilevel sequentially semiseparable (MSSS) preconditioning technique was first studied for PDE-constrained optimization problems in Reference 27 and for some benchmark problems of computational fluid dynamics, 28 and later extended to computational geoscience problems. 68 The global MSSS preconditioner computes an approximate factorization of the global (generalized) saddle-point matrix up to a prescribed accuracy in linear computational complexity using MSSS matrix computations. To start, we first introduce sequentially semiseparable (SSS) matrices; the block matrix definition of such matrices is given in Definition 1.
Definition 1 (31). Let A be a p × p block matrix with SSS structure such that A can be written in the following block-partitioned form:

A_{ij} = U_i W_{i+1} ⋯ W_{j−1} V_j^T,  if i < j;
A_{ij} = D_i,  if i = j;     (21)
A_{ij} = P_i R_{i−1} ⋯ R_{j+1} Q_j^T,  if i > j.

The matrices U_i, W_i, V_i, D_i, P_i, R_i, Q_i are matrices whose sizes are compatible for matrix-matrix products when their sizes are not mentioned. They are called the generators of an SSS matrix.
Take a 4 × 4 SSS matrix as an example; its generator representation is given by

A = [ D_1             U_1V_2^T      U_1W_2V_3^T   U_1W_2W_3V_4^T ]
    [ P_2Q_1^T        D_2           U_2V_3^T      U_2W_3V_4^T    ]
    [ P_3R_2Q_1^T     P_3Q_2^T      D_3           U_3V_4^T       ]
    [ P_4R_3R_2Q_1^T  P_4R_3Q_2^T   P_4Q_3^T      D_4            ].

Basic operations such as addition, multiplication, and inversion are closed under the SSS matrix structure and can be performed in linear computational complexity as long as the off-diagonal rank, defined by the maximum size of the {W_i} and {R_i}, is much smaller than the matrix size. The SSS matrix structure can be derived from 1D interconnected systems. 69 By considering the grid points as interconnected systems, an SSS structure can be inferred from the discretization of 1D PDEs. Multilevel sequentially semiseparable matrices generalize SSS matrices to the multidimensional case. Similar to Definition 1 for SSS matrices, the generator representation for MSSS matrices, specifically for k-level SSS matrices, is given in Definition 2.
Definition 2 (27). The matrix A is said to be a k-level SSS matrix if it has a form like (21) and all its generators are (k − 1)-level SSS matrices. The one-level SSS matrix is the SSS matrix that satisfies Definition 1.
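To make the one-level (SSS) generator representation of Definition 1 concrete, the sketch below assembles a dense matrix from randomly chosen generators; the block size and ranks are arbitrary illustrative choices:

```python
import numpy as np

def sss_to_dense(D, U, W, V, P_, R, Q):
    """Assemble a dense matrix from SSS generators (cf. Definition 1).

    Block (i, j) equals D_i if i == j, U_i W_{i+1}...W_{j-1} V_j^T if i < j,
    and P_i R_{i-1}...R_{j+1} Q_j^T if i > j (0-based indices here)."""
    p = len(D)
    rows = []
    for i in range(p):
        row = []
        for j in range(p):
            if i == j:
                blk = D[i]
            elif i < j:
                blk = U[i]
                for k in range(i + 1, j):
                    blk = blk @ W[k]
                blk = blk @ V[j].T
            else:
                blk = P_[i]
                for k in range(i - 1, j, -1):
                    blk = blk @ R[k]
                blk = blk @ Q[j].T
            row.append(blk)
        rows.append(row)
    return np.block(rows)

# random generators for a 4 x 4 block SSS matrix with 2 x 2 blocks and rank 2
rng = np.random.default_rng(2)
p, b, r = 4, 2, 2
D = [rng.standard_normal((b, b)) for _ in range(p)]
U, V, P_, Q = ([rng.standard_normal((b, r)) for _ in range(p)] for _ in range(4))
W, R = ([rng.standard_normal((r, r)) for _ in range(p)] for _ in range(2))
A_dense = sss_to_dense(D, U, W, V, P_, R, Q)
print(A_dense.shape)
```

The storage is O(p) generator blocks instead of a full dense matrix, which is what makes linear-complexity arithmetic possible when the off-diagonal rank r is small.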
Just like the SSS structure for discretized 1D PDEs, we obtain an MSSS structure for higher-dimensional PDEs when we lump the grid points of one dimension lower. Take 2D PDEs as an example: first we lump the grid points in one dimension and consider the lumped grid points as a single grid point; we then obtain a 1D interconnected system again. This in turn gives a two-level MSSS matrix for the discretized 2D PDEs. Similarly, we obtain a three-level MSSS matrix for 3D PDEs, with the number of blocks equal to the number of grid points in the corresponding dimension.
Take the 2D Navier-Stokes equation on a square domain as an example. We apply a mixed finite element method, such as the Q_1 − P_0 discretization on the uniform grid shown in Figure 2, to discretize the Navier-Stokes equation. Here, the circles represent the grid points for the velocity unknowns, while the stars are the grid points for the pressure unknowns. We use the matrix K in (9) as an example to show the two-level MSSS structure of the matrix blocks in (9). The matrix K has a block tridiagonal form, which is a specific MSSS structure (W_i = 0, R_i = 0), and can therefore be represented by a two-level MSSS matrix. This is shown in Figure 3. The MSSS structure can also be obtained for higher-order FEM discretizations; we refer to Reference 68 for more details.
Following the relations between interconnected systems and MSSS matrices, it can be verified that all matrix blocks in (9) have a block tridiagonal structure for a uniform mesh discretization. Therefore, we can apply the MSSS permutation operations introduced in Reference 27 to permute the block system (9) with MSSS blocks into a global MSSS matrix Ā of block tridiagonal form, with nonzero blocks A_{i,i−1}, A_{i,i}, and A_{i,i+1}. Here all the matrix blocks A_{i,j} are again MSSS matrices and are one level lower than the matrix Ā. For 2D problems, Ā is a two-level MSSS matrix and all A_{i,j} are one-level MSSS, that is, SSS matrices, while for 3D problems Ā is a three-level MSSS matrix and all A_{i,j} are two-level MSSS matrices. For details of the MSSS permutations, we refer to lemma 3.5 in Reference 27. After the MSSS permutation, we can compute a block LDU factorization Ā = LDU, where L is block unit lower bidiagonal with subdiagonal blocks A_{i,i−1}S_{i−1}^{−1}, D = diag(S_1, S_2, …, S_N), and U is block unit upper bidiagonal with superdiagonal blocks S_{i−1}^{−1}A_{i−1,i}.     (22)

Here S_i is the Schur complement at the ith factorization step and satisfies the recursion

S_1 = A_{1,1},  S_i = A_{i,i} − A_{i,i−1} S_{i−1}^{−1} A_{i−1,i},  i = 2, …, N.

Note that the Schur complement S_i is a dense matrix and is one dimension lower than the original problem. Under the MSSS framework, we apply MSSS matrix computations to approximate the Schur complements S_i, which yields an MSSS preconditioner. Related work applying rank-structured matrix computations to the Schur complement approximation in block triangular factorizations is studied in References 38,70. Compared with block preconditioners for the generalized saddle-point system (9), which need an efficient approximation of the Schur complement S = −B(A + J_f)^{−1}B^T, the permutation operation is a highlight of MSSS preconditioners: it enables us to compute a global factorization of the system matrix without approximating S.
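The block LDU factorization via the Schur complement recursion can be sketched with dense blocks; in the actual MSSS preconditioner each S_i would be kept in SSS form and truncated to low off-diagonal rank, which is omitted here for clarity:

```python
import numpy as np

def schur_recursion(Adiag, Alow, Aup):
    # S_1 = A_11, S_i = A_ii - A_{i,i-1} S_{i-1}^{-1} A_{i-1,i}
    S = [Adiag[0]]
    for i in range(1, len(Adiag)):
        S.append(Adiag[i] - Alow[i - 1] @ np.linalg.solve(S[i - 1], Aup[i - 1]))
    return S

def ldu_solve(S, Alow, Aup, b):
    # Solve A x = b with A = L D U built from the Schur complements.
    n = len(S)
    y = [bi.copy() for bi in b]
    for i in range(1, n):                  # forward sweep: L y = b
        y[i] -= Alow[i - 1] @ np.linalg.solve(S[i - 1], y[i - 1])
    x = [np.linalg.solve(S[i], y[i]) for i in range(n)]  # diagonal step: D w = y
    for i in range(n - 2, -1, -1):         # backward sweep: U x = w
        x[i] -= np.linalg.solve(S[i], Aup[i] @ x[i + 1])
    return x

# small block tridiagonal test problem with a dominant diagonal
rng = np.random.default_rng(3)
nb, bs = 4, 3
Adiag = [rng.standard_normal((bs, bs)) + 6.0 * np.eye(bs) for _ in range(nb)]
Alow = [rng.standard_normal((bs, bs)) for _ in range(nb - 1)]
Aup = [rng.standard_normal((bs, bs)) for _ in range(nb - 1)]

Adense = np.zeros((nb * bs, nb * bs))
for i in range(nb):
    Adense[i*bs:(i+1)*bs, i*bs:(i+1)*bs] = Adiag[i]
for i in range(nb - 1):
    Adense[(i+1)*bs:(i+2)*bs, i*bs:(i+1)*bs] = Alow[i]
    Adense[i*bs:(i+1)*bs, (i+1)*bs:(i+2)*bs] = Aup[i]

b = [rng.standard_normal(bs) for _ in range(nb)]
S = schur_recursion(Adiag, Alow, Aup)
x = ldu_solve(S, Alow, Aup, b)
```

With exact Schur complements this factorization is exact; the MSSS preconditioner replaces each dense solve with a rank-truncated SSS solve, trading exactness for linear complexity.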
The block LDU factorization can be computed in linear computational complexity, that is, O(r³N_s), where r is the off-diagonal rank of the Schur complements S_i, bounded by a small constant, and N_s is the problem/matrix size. This is achieved by performing model order reduction operations on the MSSS matrices. In this article, we focus on 2D problems, where the Schur complement is 1D and can be approximated by one-level MSSS matrix computations. For the application of MSSS preconditioners to 3D problems, we refer to References 27,68. Moreover, we have the following theorem, which bounds the MSSS preconditioned spectrum for 2D problems.

Theorem 1. For a nonsingular two-level MSSS matrix Ā, we can compute an approximate block factorization Ã = L̃D̃Ũ, with L̃, D̃, Ũ of the form in (22), that satisfies ||I − Ã^{-1}Ā||_2 ≤ ε∕σ_0,
where σ_0 is the smallest singular value of Ā, I denotes the identity matrix, and ε satisfies a bound proportional to pτ. Here p is the number of blocks in the one-level SSS blocks of Ã, which is smaller than the number of the Schur complements, that is, N, and τ is a freely chosen parameter used to compute the MSSS preconditioner.
Proof. For the proof of this theorem, we refer to lemma 3.2, theorem 3.6, and theorem 5.3 of Reference 71. ▪ Theorem 1 states that the MSSS preconditioner can be computed up to arbitrary accuracy. This in turn makes the MSSS preconditioned spectrum cluster around (1, 0), and the radius of the circle that contains the preconditioned spectrum can be made arbitrarily small. This results in a robust preconditioner and makes the convergence of the MSSS preconditioned Krylov solver independent of the mesh size and the Reynolds number for the generalized saddle-point system (12).
In practice, ε is usually of the same order as τ, which shows that the bound given by Theorem 1 is pessimistic. Choosing τ between 10^−2 and 10^−4 usually gives satisfactory performance for MSSS preconditioned Krylov solvers. This in turn yields a small r that is bounded by a constant independent of N_s, with r ≪ N_s, which means that the MSSS preconditioner can be computed in linear computational complexity. Numerical results in Section 5 verify this statement.
To analyze the storage costs of MSSS preconditioners, we take an N × N grid as an example. Under the MSSS framework, we approximate the off-diagonal blocks of the dense Schur complements with low-rank patterns. To simplify the analysis, we assume that all p diagonal blocks of the SSS matrix are of uniform size and that the off-diagonal rank is bounded by r ≪ N. The storage cost for each Schur complement is then of order N^2∕p + O(Nr). Theoretically, we set the number of blocks p = N for smaller computational complexity and storage costs, while in practice the sizes of the diagonal blocks are often set to a small number, say 8, to make use of level 3 BLAS implementations, that is, p = N∕8 ∼ O(N). This in turn gives O(N) storage costs for each Schur complement and O(N^2) storage costs for all the Schur complements of the MSSS preconditioner, which is linear with respect to the problem size N^2. This is advantageous over storing all the full dense S_i, which results in O(N^3) storage costs and O(N^4) computational complexity, that is, quadratic with respect to the problem size, compared with the linear computational complexity of MSSS preconditioners. For the storage costs of MSSS preconditioners for 3D problems, we refer to section 4.4 of Reference 68.
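The storage count above can be checked with a small script. The constant in front of the O(Nr) generator term, taken here as 4, is our own rough choice and only affects the constant, not the scaling.

```python
def sss_storage(N, p, r):
    """Approximate number of stored entries for an N x N SSS matrix with
    p diagonal blocks of uniform size N/p and off-diagonal rank r:
    the dense diagonal blocks cost p * (N/p)^2 = N^2/p entries, and the
    low-rank generators of the lower and upper parts cost O(N*r)."""
    return N**2 // p + 4 * N * r

N, r = 1024, 10
sparse_cost = sss_storage(N, N // 8, r)   # block size 8, i.e. p = N/8
dense_cost = N * N                        # storing one S_i as a full matrix
```

With p = N/8 the count is 8N + 4Nr, that is, O(N) per Schur complement; doubling N exactly doubles the cost, in contrast to the quadratic growth of the dense storage.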

NUMERICAL EXPERIMENTS
In the previous sections, we formulated an in-domain control problem and showed that it can be solved by the reduced Newton method described in Algorithm 1. The dominant computational cost is solving a nonlinear flow equation (8) by Newton's method and two linear equations (14)–(19) with generalized saddle-point structure to compute the gradient and the Hessian matrix. At each optimization step, we solve the nonlinear flow equation (8) with inner iterations, and a preconditioned Krylov solver is applied at each inner iteration.
In Section 3, we showed that the linearized flow equation (9) is a perturbed linearized Navier-Stokes equation (10), with a singular perturbation on the (1, 1) block of the linearized Navier-Stokes system matrix. Standard preconditioning techniques for the Navier-Stokes equation, which depend heavily on an efficient approximation of the Schur complement, fail to give satisfactory performance for this problem due to this perturbation, as illustrated by the numerical results later in this section. Here we evaluate the performance of the MSSS preconditioning techniques for the in-domain control problem and compare them with the pressure convection-diffusion (PCD) preconditioner 72 that is implemented in IFISS. All the numerical experiments are performed in MATLAB 2015a on a desktop with an Intel Core i5 CPU at 3.10 GHz and 16 GB of memory running the Debian GNU/Linux 8.0 system.

Preconditioning techniques
In this part, we report the performance of the MSSS preconditioner and the PCD preconditioner for the second Newton iteration in solving the Navier-Stokes equation with external source terms at the first optimization step of Algorithm 1.
We use the IDR(s) method 73 to solve the preconditioned system. The preconditioned IDR(s) solver is stopped if the 2-norm of the residual at step k, denoted by ||r_k||_2, satisfies ||r_k||_2 ≤ 10^−4 ||r_0||_2.
To compute an MSSS preconditioner for a linear system of the generalized saddle-point type (9), we first convert all the matrix blocks to the MSSS format, which is illustrated by Figure 3. The cost of this step is negligible, since no computations are performed and only memory allocations are needed. Second, we permute the matrix with MSSS blocks into a global MSSS matrix. In this step, all operations concatenate generators of MSSS matrices and no computations are needed. The final step is to compute an approximate LDU factorization of the global matrix using MSSS matrix computations. This is the most computationally intensive part of the MSSS preconditioner; its computational complexity is linear with respect to the problem size, as analyzed in Section 4 and also illustrated by the numerical results in this section.
We also compare the performance of the MSSS preconditioner with that of the PCD preconditioner P_p for the linear system (12), where S_p = L_p A_p^{-1} M_p approximates the Schur complement B(A + J_f)^{-1} B^T. Here, A_p and L_p are the convection-diffusion operator and the Laplace operator in the finite dimensional solution space of the pressure with prescribed boundary conditions, and M_p is the pressure mass matrix. For this PCD preconditioner, both A + J_f and S_p are approximated by the algebraic multigrid (AMG) method that is implemented in IFISS.
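The action of the corresponding block upper triangular preconditioner on a residual (f, g) can be sketched as follows, with dense solves standing in for the AMG approximations used in IFISS; all function and variable names here are illustrative.

```python
import numpy as np

def apply_pcd(F, B, Mp, Ap, Lp, f, g):
    """Apply the inverse of a block upper triangular PCD preconditioner
    [[F, B^T], [0, -S_p]] with S_p = L_p A_p^{-1} M_p, so that
    S_p^{-1} = M_p^{-1} A_p L_p^{-1}. Here F stands for A + J_f, and the
    dense solves are placeholders for the AMG approximations."""
    # Pressure update: solve -S_p p = g via S_p^{-1} = M_p^{-1} A_p L_p^{-1}.
    p = -np.linalg.solve(Mp, Ap @ np.linalg.solve(Lp, g))
    # Velocity update: solve F u = f - B^T p.
    u = np.linalg.solve(F, f - B.T @ p)
    return u, p
```

Only two (approximate) solves with F and one approximate Schur solve are needed per application, which is what makes the block triangular form attractive when S_p is a good approximation.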
We set the Reynolds number Re = 2000 as discussed in the previous section and report the performance of both preconditioners in Tables 1 and 2. Here the column "precon." represents the time for the MSSS permutations and the block LDU factorization using MSSS matrix computations, or the time for the setup of the AMG method, and the column "IDR(4)" denotes the time for the preconditioned IDR(4) solver to compute the solution up to the prescribed accuracy. Both times are measured in seconds. Since we focus on 2D problems, the MSSS matrices are two-level MSSS matrices, and to make use of level 3 BLAS implementations, we set the size of the SSS blocks to 8 for all test cases. Table 1 shows that the time to compute the MSSS preconditioner scales linearly with the problem size. The number of iterations remains virtually constant under mesh refinement, and the time of the preconditioned IDR(4) solver also scales linearly with the problem size. For the PCD preconditioner, the preconditioned IDR(4) solver fails to converge to the solution of prescribed accuracy within 400 iterations for relatively coarse meshes. For a very fine mesh, the preconditioned IDR(4) solver computes the solution up to the prescribed accuracy within 300 iterations. This is because the entries of the perturbation by the matrix J_f in (23), which is introduced by the nonlinear right-hand side, are of O(h^2). As h → 0, this perturbation by the Jacobian matrix becomes small compared with A in the (1, 1) block of (23). Therefore, S_p approximates the perturbed Schur complement better on fine meshes than on coarse meshes.
In contrast to the computational results for the MSSS preconditioner, Table 2 shows that the time for the setup of AMG does not scale linearly with the problem size. The reason may be that the AMG implementation in IFISS is not optimized, or that the AMG algorithm in IFISS does not have linear computational complexity. To compute the MSSS preconditioner, we set τ = 10^−5 for the 512 × 512 grid and τ = 10^−4 for the remaining grids. Here τ is the upper bound of the discarded singular values in the model order reduction that is performed to compute the approximate factorization. The maximum off-diagonal ranks of the lower and upper parts for this setup are plotted in Figure 4. For details of this approximate factorization, we refer to Reference 71. The maximum off-diagonal rank in Figure 4 is bounded by a small constant around 10 for all the computations of the MSSS preconditioners, which is much smaller than the size of the systems to be solved. This in turn gives the linear computational complexity of the MSSS preconditioning techniques, as illustrated by Table 1.

Optimization algorithm
We test Algorithm 1 for the optimization problem (6) using a 256 × 256 grid. At each outer iteration, we need to solve a series of linear equations of size 197,634 × 197,634 to compute the solution of the nonlinear flow equation (8) by Newton's method. We also need to solve two linear equations (14) and (19) of the same size to compute the gradient and the Hessian matrix. We use the wind farm configuration introduced in Section 2.1. With these problem settings, the rightmost turbine simply operates in the maximum power tracking mode, that is, C_T3 = 1. Therefore, we only need to optimize C_T1 and C_T2 for the first two turbines. Without optimization, the turbines are operated in the mode that corresponds to the maximal power extraction for each single turbine, that is, C_T1 = C_T2 = 1. We start with the initial guess C^(0) = 1 for this optimization problem. The (scaled) total power extracted by the wind farm at each optimization step is given in Figure 5(a).
Results in Figure 5(a) show that the total power is increased by around 5.5% when the optimal control scheme is applied. To show the performance of Algorithm 1, we also solve the optimization problem with the initial guess C^(0) = 0, although this corresponds to an impractical operating status. The scaled total power is given in Figure 5(b). For these two cases with different initial guesses, the corresponding optimized variables C_T1 and C_T2 at each optimization step are given in Figure 6. Figure 6(a,b) shows that with different initial guesses, the optimized variables (C_T1, C_T2) converge to the same point (0.729, 0.862), which corresponds to a local optimum of the optimization problem.
Note that there is an overshoot after a few optimization steps for both cases, as illustrated by Figure 5, which makes the optimization converge to a local optimum. This is primarily due to the nonconvexity and high nonlinearity of this optimization problem; the convexity of this optimization problem is still an open question. Another factor that contributes to this behavior is the sensitivity of the optimization problem. Here we measure the sensitivity of the velocity in the flow direction with respect to C_Tj by the partial derivative ∂u_x∕∂C_Tj. We plot the magnitudes of ∂u_x∕∂C_T1 and ∂u_x∕∂C_T2 at the local optimum of C_T1 and C_T2 in Figure 7. Figure 7 shows that there is a large gradient in the vicinity of the places where the turbines are located. This tells us that the velocity in the vicinity of the turbines is very sensitive to changes of C_Tj, which makes this optimization problem very sensitive. Another reason may be the robustness and efficiency of the Newton's method in Algorithm 1. To overcome such an overshoot, instead of the fixed step size of Newton's method, a line search method such as the Armijo line search algorithm 60 could be applied to choose the step size adaptively.
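A backtracking Armijo line search of the kind suggested above can be sketched as follows, written for minimization and assuming only that the objective f and its gradient g can be evaluated at trial control values; the function name and parameters are ours, not from Algorithm 1.

```python
import numpy as np

def armijo_step(f, g, x, d, alpha0=1.0, c=1e-4, rho=0.5, max_iter=30):
    """Backtracking (Armijo) line search: starting from the full step
    alpha0, shrink the step by the factor rho until the sufficient
    decrease condition f(x + a*d) <= f(x) + c*a*g(x)'d holds, where d
    is a descent direction (e.g., the Newton direction)."""
    fx, gTd = f(x), g(x) @ d
    a = alpha0
    for _ in range(max_iter):
        if f(x + a * d) <= fx + c * a * gTd:
            return a
        a *= rho
    return a
```

Replacing the fixed Newton step by such an adaptively chosen a would damp the overshoot at the cost of a few extra objective evaluations per optimization step.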

CONCLUSIONS AND REMARKS
In this article, we studied preconditioning for the in-domain Navier-Stokes control problem using multilevel sequentially semiseparable (MSSS) matrix computations. For the in-domain Navier-Stokes control problem, the control input only acts on part of the domain, which makes the problem even more difficult to solve. The optimization problem was solved by a reduced Newton's method, for which the most time-consuming parts are solving a nonlinear flow equation and computing the gradient and the Hessian matrix. We showed that the gradient and the Hessian matrix can be computed by solving a linear equation that is exactly the same as in the last nonlinear iteration for the Navier-Stokes equation. Therefore, the preconditioner computed for the solution of the flow equation can be reused, which reduces the computational cost dramatically. We evaluated the performance of the MSSS preconditioning technique using IFISS. Numerical results show the superiority of the MSSS preconditioning technique over the standard preconditioning technique.
Since we focused on preconditioning the in-domain Navier-Stokes control problem, we used a laminar flow model in IFISS to study the performance of the MSSS preconditioning technique. The next step in extending this research should be to apply a turbulent flow model for real-world wind farm control applications, for which recent developments in optimal control of the unsteady Reynolds-averaged Navier-Stokes (RANS) equations 74 are a good starting point.
The MSSS structure can be inferred from a topologically uniform mesh discretization of a regular domain. When a general domain can be parameterized by a topologically uniform mesh, as in isogeometric analysis, applying MSSS is straightforward. Extending MSSS to unstructured meshes is also possible: since MSSS is typically used as a preconditioner rather than as a direct solver, one can first compute the solution of the original problem on a structured mesh and then interpolate it onto the unstructured mesh. Following this procedure, the convergence of Krylov solvers for problems with unstructured mesh discretizations can be improved.