Chordal and factor-width decompositions for scalable semidefinite and polynomial optimization

Chordal and factor-width decomposition methods for semidefinite programming and polynomial optimization have recently enabled the analysis and control of large-scale linear systems and medium-scale nonlinear systems. Chordal decomposition exploits the sparsity of semidefinite matrices in a semidefinite program (SDP), in order to formulate an equivalent SDP with smaller semidefinite constraints that can be solved more efficiently. Factor-width decompositions, instead, relax or strengthen SDPs with dense semidefinite matrices into more tractable problems, trading feasibility or optimality for lower computational complexity. This article reviews recent advances in large-scale semidefinite and polynomial optimization enabled by these two types of decomposition, highlighting connections and differences between them. We also demonstrate that chordal and factor-width decompositions allow for significant computational savings on a range of classical problems from control theory, and on more recent problems from machine learning. Finally, we outline possible directions for future research that have the potential to facilitate the efficient optimization-based study of increasingly complex large-scale dynamical systems.

A widespread view since the 1990s is that, once a control problem is reformulated as an SDP or relaxed into one, then the problem is effectively solved (Boyd et al., 1994; Parrilo & Lall, 2003). In today's world of large-scale, complex systems, however, this is no longer true, and the formulation of SDPs that can be solved in practice requires further thought. This is because, even though SDPs can theoretically be solved using algorithms with polynomial-time complexity (Nemirovski, 2006; Nesterov, 2003; Nesterov & Nemirovski, 1994; Vandenberghe & Boyd, 1996; Ye, 2011), the very-large-scale SDPs encountered in real-life applications require prohibitively large computational resources in practice. One particular bottleneck is the complexity of handling large semidefinite constraints; for instance, each iteration of classical interior-point algorithms requires O(n^3 m + n^2 m^2 + m^3) time and O(n^2 + m^2) memory (Nesterov, 2003, Section 4.3.3), where n is the size of the semidefinite constraint and m is the number of equality constraints. The majority of established general-purpose SDP solvers currently available, therefore, cannot handle large problems (e.g., with n larger than a few hundred and m larger than a few thousand) on a regular computer. Consequently, the application of SDP-based frameworks for analysis and control is currently limited to medium-scale linear systems and small-scale nonlinear ones.
Overcoming these scalability issues is a problem that has received much attention in recent years (Ahmadi et al., 2017b;De Klerk, 2010;Majumdar et al., 2020;Vandenberghe et al., 2015), and significant progress has been made through a number of different approaches. Most of them are related by a simple, yet powerful, underlying idea: decompose a large positive semidefinite matrix X as a sum of structured ones, for which it is easier to impose positivity.
One type of structured decomposition considers sums of low-rank matrices (Burer & Choi, 2006; Burer & Monteiro, 2003; Burer et al., 2002). Specifically, one writes X = v_1 v_1^T + · · · + v_t v_t^T for vectors v_1, . . . , v_t, whose number t is chosen a priori to be no smaller than an upper bound on the rank of an optimal solution of the original SDP (Pataki, 1998). However, while low-rank decomposition can bring considerable performance gains on large SDPs, it transforms a convex problem into a nonconvex one. Solution algorithms for the latter cannot be guaranteed to converge to the global minimum unless the original SDP is sufficiently "smooth" and t is large enough (Boumal et al., 2020; Waldspurger & Waters, 2020).

Figure 1.1: Illustration of chordal decomposition, where ⪰ denotes positive semidefiniteness and X ∈ S^n_+(E, ?) is a positive semidefinite completion constraint (see Section 2.2 for a precise definition). (a) A chordal graph with five vertices, six edges, and two maximal cliques (complete connected subgraphs), C1 = {1, 2, 3} and C2 = {3, 4, 5}; (b) chordal decomposition of a semidefinite constraint on a sparse matrix X into smaller positive semidefinite constraints on matrices Y1, Y2 with nonzero entries indexed by the cliques C1 and C2; (c) chordal decomposition of a positive semidefinite completion constraint on a sparse matrix X into smaller positive semidefinite constraints on its principal submatrices X1 and X2 indexed by the cliques C1 and C2.
A second type of structured decomposition, which we focus on in this paper, considers sums of sparse matrices. In this case, one writes X = Y_1 + · · · + Y_t for positive semidefinite matrices Y_1, . . . , Y_t that are nonzero only on a certain (and, ideally, small) principal submatrix. The choice of these principal submatrices is crucial in determining the particular type of matrix decomposition, as well as its properties. Two common selection strategies distinguish whether the original matrix X is dense or sparse.
If X is sparse, the principal submatrices are usually indexed by the maximal cliques of the sparsity graph of X; these notions will be defined precisely in Section 2, but are illustrated in Figure 1.1. When the sparsity graph is chordal, meaning that all cycles of length larger than three have an edge between nonconsecutive vertices, the existence of a clique-based decomposition is guaranteed (Agler et al., 1988;Griewank & Toint, 1984;Kakimura, 2010). One can therefore replace the optimization of the large matrix X with the optimization of the matrices Y 1 , . . . , Y t without any loss of generality. Together with a dual result on the existence of positive semidefinite matrix completions (Grone et al., 1984), this chordal decomposition strategy enables one to significantly reduce the computational complexity of SDPs involving sparse positive semidefinite matrices (Fukuda et al., 2001;Kim et al., 2011;Nakata et al., 2003;Vandenberghe et al., 2015).
When X is dense, instead, each matrix Y_i in the decomposition X = Y_1 + · · · + Y_t is chosen to be nonzero only on one of the t = (n choose k) possible k × k principal submatrices of X, where the parameter k ≥ 2 is specified a priori. This type of decomposition leads to factor-width-k inner approximations of the positive semidefinite cone (Boman et al., 2005), which are conservative but improve as k is increased. When k ≪ n, optimizing over the matrices Y_1, . . . , Y_t, rather than over the original dense matrix X, leads to SDPs with small positive semidefinite cones, which can often be handled efficiently. In the extreme case k = 2, one obtains a second-order cone program, for which scalable algorithms exist (Alizadeh & Goldfarb, 2003).
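As a minimal illustration of the combinatorics only (not of the optimization itself), the short Python sketch below enumerates the index sets of the (n choose k) principal submatrices that host the terms Y_i; the helper name `principal_submatrix_indices` is hypothetical and introduced just for this example.

```python
from itertools import combinations
from math import comb

def principal_submatrix_indices(n, k):
    """Index sets of all k x k principal submatrices of an n x n matrix.
    Each set hosts one term Y_i of the factor-width-k decomposition."""
    return list(combinations(range(n), k))

n, k = 6, 2
subsets = principal_submatrix_indices(n, k)
t = len(subsets)
print(t)  # t = C(6, 2) = 15 small blocks instead of one 6 x 6 cone
assert t == comb(n, k)
```

The rapid growth of t = (n choose k) with k is the reason the approach is typically used only for small k, such as the factor-width-2 (second-order cone) case.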
This paper offers a comprehensive review of chordal and factor-width-k decomposition methods, as well as of their application to large-scale semidefinite programming and polynomial optimization. Our goal is to introduce practitioners in control theory to the latest advances in these fields, which over the last decade or so have increased the scale of systems for which optimization-based frameworks for analysis and control can be implemented at a reasonable cost. Examples of problems that can now be handled efficiently include the analysis and synthesis of large-scale linear networked systems (Andersen et al., 2014b;Mason & Papachristodoulou, 2014;Zheng et al., 2018c,d), the stability analysis and the approximation of regions of attraction for sparse nonlinear systems (Ahmadi & Majumdar, 2019;Schlosser & Korda, 2020;Tacchi et al., 2019a;Zheng et al., 2019a), optimal power flow in power grids (Andersen et al., 2014a;Jabr, 2011;Molzahn et al., 2013), and numerous problems in machine learning (Batten et al., 2021;Chen et al., 2020b;Dahl et al., 2008;Kim et al., 2009;Latorre et al., 2020;Newton & Papachristodoulou, 2021;Zhang et al., 2018). We hope that knowledge of the advanced optimization techniques discussed here can assist control theorists in developing efficient modelling frameworks that can be applied much more widely and, crucially, to increasingly complex large-scale systems.

Outline
After introducing relevant graph-theoretic notions in Section 2, we discuss chordal decomposition methods for general SDPs in Section 3. Section 4 looks at decomposition methods for sparse polynomial optimization problems, which arise when relaxing analysis and control problems for nonlinear systems. Factor-width-k decompositions for dense matrices are discussed in Section 5. Section 6 presents examples of how matrix decomposition can be applied to some classical control problems and to some recent problems in machine learning. Section 7 draws conclusions and outlines possible directions for future research.

Basic notation
Mathematical symbols are defined as necessary in each of the following sections, but we summarize common notation here. The m-dimensional Euclidean space, the vector space of n × n real symmetric matrices, and the cone of n × n positive semidefinite symmetric matrices are denoted, respectively, by R^m, S^n, and S^n_+. Angled brackets are used to denote the inner product in any of these spaces; in particular, ⟨x, y⟩ = x^T y when x, y ∈ R^m and ⟨X, Y⟩ = trace(XY) when X, Y ∈ S^n. We often write X ⪰ 0 instead of X ∈ S^n_+ when the matrix size is clear from the context or is unimportant, and write X ≻ 0 if X is strictly positive definite.

Chordal graphs and matrix decomposition
This section reviews chordal graphs and their applications to sparse matrix decomposition. Matrix decomposition is central to many sparsity-exploiting techniques for semidefinite and polynomial optimization. Detailed introductions to chordal graphs can be found in the surveys by Blair & Peyton (1993) and Rose (1970), and in the monographs by Vandenberghe et al. (2015) and Golumbic (2004). We first introduce some graph-theoretic notions in Section 2.1, and then give an overview of classical matrix decomposition and completion results in Section 2.2. Extensions to sparse block-partitioned matrices are discussed in Section 2.3.

Chordal graphs
A graph G(V, E) is defined by a set of vertices V = {1, 2, . . . , n} and a set of edges E ⊆ V × V. A graph G is undirected if (v i , v j ) ∈ E implies that (v j , v i ) ∈ E. A path in G(V, E) is a sequence of edges that connect a sequence of distinct vertices. A graph is connected if there is a path between any two vertices, and complete if any two vertices are connected by an edge, i.e., E = V × V. The subgraph induced by a subset of vertices W ⊂ V is the undirected graph with vertices W and edges E ∩ (W × W). A subset of vertices C ⊆ V is called a clique if the subgraph induced by C is complete. If C is not contained in any other clique, it is a maximal clique. The number of vertices in C is denoted by |C|.
A cycle of length k ≥ 3 in a graph G is a set of pairwise distinct vertices {v_1, v_2, . . . , v_k} ⊂ V such that (v_k, v_1) ∈ E and (v_i, v_{i+1}) ∈ E for i = 1, . . . , k − 1. A chord in a cycle is an edge connecting two nonconsecutive vertices. A graph G is chordal if every cycle of length k ≥ 4 has at least one chord. Examples of chordal graphs are given in Figure 2.1. Observe also that many common types of graphs are chordal, including chains, acyclic undirected graphs (i.e., graphs with no cycles, such as trees), undirected graphs with cycles of length no greater than three, and complete graphs.
Chordal graphs have a number of properties that make them easy to handle computationally. For example, a connected chordal graph has at most n − 1 maximal cliques, and they can be identified in linear time with respect to the number of vertices and edges using, for instance, Algorithm 2 in Appendix C. In addition, any induced subgraph of a chordal graph is chordal, because cycles in the subgraph are also cycles in the original graph; this fact is used in several induction proofs based on chordality in Blair & Peyton (1993, Section 2). Finally, chordal graphs admit a so-called perfect elimination ordering of the vertices, which is central to the zero fill-in property of Cholesky factorizations for sparse matrices. These two properties are reviewed in Appendix A.

(From the caption of Figure 2.1: panel (c) shows a "block-arrow" graph. The names "banded" and "block-arrow" are motivated by the fact that, as explained in Section 2.2.1, these graphs describe the sparsity patterns of the matrices in Figure 2.3.)

Given the rich structure implied by chordality, it is very often convenient to extend a nonchordal graph G(V, E) into a chordal graph Ĝ(V, Ê) with larger edge set Ê ⊃ E, which is called a chordal extension of G. Usually, a graph admits many different chordal extensions, including the trivial one with edge set Ê = V × V obtained by completion, and the one obtained by completing only the graph's connected components. Finding a minimal chordal extension, meaning that the smallest possible number of additional edges has been added, is an NP-complete problem (Yannakakis, 1981). However, approximately minimal chordal extensions can often be constructed in practice using heuristic strategies such as the maximum cardinality search (Berry et al., 2004) and the symbolic Cholesky factorization with approximately minimum degree ordering (Fukuda et al., 2001; Vandenberghe et al., 2015). For example, the nonchordal graph in Figure 2.2(b) can be extended to a chordal graph by adding edge (2, 4), edge (1, 3), or both. The first two extensions are minimal, while the latter is the trivial extension by completion. The minimal chordal extension obtained by adding edge (2, 4) has two maximal cliques, C_1 = {1, 2, 4} and C_2 = {2, 3, 4}.
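The chordality test, clique enumeration, and heuristic chordal extension described above can be experimented with directly; the sketch below assumes the networkx library (any graph library with chordality routines would serve equally well) and uses the four-vertex cycle with added chord (2, 4) as its running example.

```python
import networkx as nx

# The 4-cycle 1-2-3-4-1 is the smallest nonchordal graph: its only
# cycle has length four and no chord.
G = nx.cycle_graph([1, 2, 3, 4])
assert not nx.is_chordal(G)

# Adding the chord (2, 4) gives a minimal chordal extension with two
# maximal cliques, {1, 2, 4} and {2, 3, 4}.
G_hat = G.copy()
G_hat.add_edge(2, 4)
assert nx.is_chordal(G_hat)
cliques = set(nx.chordal_graph_cliques(G_hat))

# Heuristic chordal extension of an arbitrary graph (here a 5-cycle).
H, _ = nx.complete_to_chordal_graph(nx.cycle_graph(5))
assert nx.is_chordal(H)
```

Note that `complete_to_chordal_graph` returns a heuristic extension, which is not guaranteed to be minimal, mirroring the NP-completeness caveat above.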

Sparse matrix decomposition
This subsection reviews two fundamental results on the decomposition of sparse positive semidefinite matrices whose sparsity can be described using chordal graphs.

Sparse symmetric matrices
Fix any positive integer n and set V = {1, . . . , n}. Given an undirected graph G(V, E), we say that a symmetric matrix X ∈ S^n has sparsity graph G (alternatively, sparsity pattern E) if X_ij = X_ji = 0 when (i, j) ∉ E. We denote the space of sparse symmetric matrices by

S^n(E, 0) := {X ∈ S^n : X_ij = X_ji = 0 if (i, j) ∉ E}.

For example, the graph 1 in Figure 2.2(b) describes the sparsity pattern of the matrix

    [ X_11  X_12  0     X_14 ]
    [ X_12  X_22  X_23  0    ]
    [ 0     X_23  X_33  X_34 ]     (2.1)
    [ X_14  0     X_34  X_44 ]

where each entry X_ij may be nonzero or zero. Similarly, the symbolic matrices in Figure 2.3 have sparsity patterns described by the graphs in Figure 2.1.
Given X ∈ S^n(E, 0), the diagonal elements X_ii and the off-diagonal elements X_ij with (i, j) ∈ E may be nonzero or zero. Thus, if X ∈ S^n(E, 0) and Ê ⊃ E is an extension of the edge set, then we also have X ∈ S^n(Ê, 0). In this paper, we are especially interested in chordal extensions of sparsity patterns. For simplicity, we will say that a matrix X has a chordal sparsity pattern if its corresponding sparsity graph G(V, E) is chordal. Of course, this can always be achieved via chordal extension.
In what follows, it will be convenient to refer to particular principal submatrices of a sparse matrix, indexed by the maximal cliques of its sparsity graph. Given a clique C_k of G(V, E), let C_k(i) denote its i-th vertex when the clique's vertices are sorted in ascending order, and define the matrix E_{C_k} ∈ R^{|C_k|×n} with entries

    (E_{C_k})_{ij} = 1 if C_k(i) = j, and 0 otherwise.     (2.2)

Given X ∈ S^n, the operation E_{C_k} X E_{C_k}^T ∈ S^{|C_k|} extracts the principal submatrix of X indexed by C_k. Conversely, given Y ∈ S^{|C_k|}, the operation E_{C_k}^T Y E_{C_k} "inflates" Y into a sparse n × n symmetric matrix that has Y as its principal submatrix indexed by C_k, and is zero otherwise. For example, the chordal extension of the graph in Figure 2.2(b) obtained by adding edge (2, 4) has a maximal clique C_1 = {1, 2, 4}, and the corresponding matrix E_{C_1} is

    E_{C_1} = [ 1  0  0  0 ]
              [ 0  1  0  0 ]
              [ 0  0  0  1 ].

For the sparse matrix X ∈ S^4 in (2.1) and any matrix Y ∈ S^3, we have

    E_{C_1} X E_{C_1}^T = [ X_11  X_12  X_14 ]
                          [ X_12  X_22  0    ]
                          [ X_14  0     X_44 ],

    E_{C_1}^T Y E_{C_1} = [ Y_11  Y_12  0  Y_13 ]
                          [ Y_12  Y_22  0  Y_23 ]
                          [ 0     0     0  0    ]
                          [ Y_13  Y_23  0  Y_33 ].

1 Throughout, we assume that each vertex has a self-loop, unless otherwise noted. We omit the self-loops when plotting a graph.
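The extraction and inflation operations can be sketched in a few lines of NumPy; the helper name `clique_selector` is hypothetical, and indices are 0-based (so the clique {1, 2, 4} above becomes {0, 1, 3}).

```python
import numpy as np

def clique_selector(clique, n):
    """Build E_C as in (2.2): one row per clique vertex (0-based),
    selecting that coordinate."""
    E = np.zeros((len(clique), n))
    for i, j in enumerate(sorted(clique)):
        E[i, j] = 1.0
    return E

n = 4
E1 = clique_selector([0, 1, 3], n)  # clique {1, 2, 4} in 1-based labels

# A numerical matrix with the 4-cycle sparsity pattern of (2.1).
X = np.array([[4., 1., 0., 2.],
              [1., 3., 1., 0.],
              [0., 1., 5., 1.],
              [2., 0., 1., 6.]])

sub = E1 @ X @ E1.T       # extracts the principal submatrix indexed by C1
Y = np.ones((3, 3))
inflated = E1.T @ Y @ E1  # inflates Y into a sparse 4 x 4 matrix
```

The third row and column of `inflated` are identically zero, since vertex 3 does not belong to the clique.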

Cone of sparse positive semidefinite matrices
Denote the set of positive semidefinite matrices with sparsity pattern E by

S^n_+(E, 0) := S^n(E, 0) ∩ S^n_+.

This set is a convex cone because it is the intersection of a subspace and a convex cone. If G(V, E) is a chordal graph, S^n_+(E, 0) can be represented using smaller but coupled convex cones, as stated in the following result (Agler et al. 1988, Theorem 2.3; Griewank & Toint 1984, Theorem 4; Kakimura 2010, Theorem 1).

Theorem 2.1 (Chordal decomposition). Let G(V, E) be a chordal graph with maximal cliques C_1, . . . , C_t. Then, Z ∈ S^n_+(E, 0) if and only if there exist matrices Z_k ∈ S^{|C_k|}_+ for k = 1, . . . , t such that

    Z = Σ_{k=1}^t E_{C_k}^T Z_k E_{C_k}.     (2.3)
The "if" part of Theorem 2.1 is immediate, since a sum of positive semidefinite matrices is positive semidefinite. The "only if" part, instead, can be proven using the zero fill-in property of the sparse Cholesky factorization of Z ∈ S^n_+(E, 0) (Vandenberghe et al., 2015, Section 9.2); see Appendix A and Appendix B for details. A similar elementary proof given by Kakimura (2010), based on simple linear algebra and perfect elimination orderings for chordal graphs, reveals that one can impose a rank condition in the decomposition (2.3): there exist Z_k with rank(Z) = Σ_{k=1}^t rank(Z_k) such that (2.3) holds.
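The Cholesky-based construction behind the "only if" part can be sketched numerically. The fragment below, a sketch under the assumption that the natural ordering is a perfect elimination ordering (true for the chain pattern used here), assigns each rank-one term of the Cholesky factorization to a clique containing its support.

```python
import numpy as np

# Chain-sparse PSD matrix: sparsity graph 1-2-3 with cliques
# C1 = {0, 1} and C2 = {1, 2} (0-based labels).
Z = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])
cliques = [{0, 1}, {1, 2}]

# Zero fill-in: each column l_j of the lower Cholesky factor has its
# support inside some maximal clique, so l_j l_j^T is assigned there.
L = np.linalg.cholesky(Z)
Zk = [np.zeros_like(Z) for _ in cliques]
for j in range(Z.shape[0]):
    col = L[:, j]
    support = set(np.flatnonzero(np.abs(col) > 1e-12))
    k = next(i for i, C in enumerate(cliques) if support <= C)
    Zk[k] += np.outer(col, col)

# Each Zk[k] is PSD, supported on clique k, and they sum to Z, as in (2.3).
```

For a nonchordal pattern, the Cholesky factor generically suffers fill-in and the clique assignment step would fail, consistent with Remark 2.1.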
Remark 2.1. The chordality assumption in Theorem 2.1 is necessary. For every nonchordal pattern E, while particular matrices in S^n_+(E, 0) admit the decomposition (2.3), there always exist matrices in S^n_+(E, 0) that do not; see Vandenberghe et al. (2015, p. 342) for an explicit example. In addition, the decomposition (2.3) generally requires all maximal cliques C_1, . . . , C_t, even when a subset of maximal cliques already covers the sparsity pattern E (that is, E = ∪_{k∈I} (C_k × C_k) with I ⊂ {1, . . . , t}). An example of this is given in Appendix C.
Example 2.2. Given a variable x ∈ R^2, consider the 3 × 3 linear matrix inequality (LMI) (2.5), which has the same chordal sparsity pattern as the matrix in (2.4). Consequently, Theorem 2.1 implies that (2.5) holds if and only if there exist 2 × 2 positive semidefinite matrices, indexed by the two maximal cliques of the sparsity graph, whose inflated sum matches the LMI entry by entry. After eliminating the variables a, b, c, e and f using this matching condition, we conclude that (2.5) holds if and only if there exists d such that the two coupled LMIs in (2.6) are feasible. Figure 2.4 shows two-dimensional projections of the three-dimensional feasible set of the two LMIs in (2.6). As expected, the projection onto the (x_1, x_2) plane coincides with the feasible set of the LMI (2.5), which is contained inside the thick black line in Figure 2.4(a). This confirms that the LMIs in (2.6) are equivalent to the LMI (2.5). Therefore, we have decomposed a 3 × 3 LMI into two coupled LMIs of size 2 × 2.

Cone of positive-semidefinite-completable matrices
A concept related to the matrix decomposition above is that of positive semidefinite matrix completion. Given a matrix X ∈ S^n, let

    P_{S^n(E,0)}(X) := the matrix obtained from X by setting X_ij = 0 for all (i, j) ∉ E     (2.7)

be its projection onto the space of sparse matrices S^n(E, 0) with respect to the Frobenius matrix norm. We define the cone S^n_+(E, ?) := P_{S^n(E,0)}(S^n_+). Using (2.7), it is not hard to see that a sparse matrix X is in S^n_+(E, ?) if and only if it has a positive semidefinite completion, meaning that some (or all) of the zero entries X_ij with (i, j) ∉ E can be replaced with nonzeros to obtain a positive semidefinite matrix X̂. We call X̂ a completion of X and refer to S^n_+(E, ?) as the cone of positive-semidefinite-completable matrices.

Figure 2.5: Summary of the duality between S^n_+(E, 0) and S^n_+(E, ?), and of the duality between Theorem 2.1 and Theorem 2.2, for a chordal graph G(V, E) with maximal cliques C1, . . . , Ct.
Remark 2.2 (Nonuniqueness of the positive semidefinite completion). The positive semidefinite completion of a matrix X ∈ S^n_+(E, ?) with sparsity pattern E is generally not unique. For a chordal sparsity pattern E, two widely used and efficient strategies to compute a completion X̂ are the maximum determinant completion (Vandenberghe et al., 2015, Chapter 10.2), which maximizes det X̂, and the minimum rank completion (see Dancis 1992; Jiang 2017; Sun 2015, Chapter 3.3), which minimizes rank(X̂). In particular, there exists a positive semidefinite completion X̂ whose rank agrees with the maximum rank of the principal submatrices E_{C_i} X E_{C_i}^T (Dancis, 1992, Theorem 1.5), i.e.,

    rank(X̂) = max_{i=1,...,t} rank(E_{C_i} X E_{C_i}^T).     (2.8)

For any undirected graph G(V, E), the cones S^n_+(E, ?) and S^n_+(E, 0) are dual to each other with respect to the trace inner product ⟨X, Z⟩ = trace(XZ) in the space S^n(E, 0) (Vandenberghe et al., 2015, Chapter 10). To see this, observe that every Z ∈ S^n(E, 0) satisfies ⟨Y, Z⟩ = ⟨P_{S^n(E,0)}(Y), Z⟩ for all Y ∈ S^n, so ⟨X, Z⟩ ≥ 0 for all X ∈ S^n_+(E, ?) if and only if ⟨Y, Z⟩ ≥ 0 for all Y ∈ S^n_+. For a chordal matrix sparsity pattern, Theorem 2.1 on the decomposition of the cone S^n_+(E, 0) can be dualized to obtain the following characterization of S^n_+(E, ?), first proved by Grone et al. (1984, Theorem 7).

Theorem 2.2 (Chordal completion). Let G(V, E) be a chordal graph with maximal cliques C_1, . . . , C_t. Then, X ∈ S^n_+(E, ?) if and only if

    E_{C_k} X E_{C_k}^T ⪰ 0,   k = 1, . . . , t.
The "only if" part of Theorem 2.2 is immediate, since any principal submatrix of a positive semidefinite matrix is positive semidefinite. The "if" part, instead, relies on the properties of chordal graphs and, as mentioned above, can be proven by combining the duality between S^n_+(E, 0) and S^n_+(E, ?) with Theorem 2.1 (Vandenberghe et al., 2015, p. 357). Precisely,

    X ∈ S^n_+(E, ?)
    ⟺ ⟨X, Z⟩ ≥ 0 for all Z ∈ S^n_+(E, 0)
    ⟺ ⟨X, Σ_{k=1}^t E_{C_k}^T Z_k E_{C_k}⟩ ≥ 0 for all Z_k ∈ S^{|C_k|}_+
    ⟺ Σ_{k=1}^t ⟨E_{C_k} X E_{C_k}^T, Z_k⟩ ≥ 0 for all Z_k ∈ S^{|C_k|}_+
    ⟺ E_{C_k} X E_{C_k}^T ⪰ 0 for all k = 1, . . . , t.

The first equivalence expresses the duality between S^n_+(E, 0) and S^n_+(E, ?), the second one follows from Theorem 2.1, and the third one follows from the cyclic property of the trace operator: trace(MN) = trace(NM) for any matrices M, N of compatible dimensions. Figure 2.5 illustrates how the duality between S^n_+(E, 0) and S^n_+(E, ?) is mirrored in the duality between Theorem 2.1 and Theorem 2.2 for chordal graphs.
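The duality between the two cones can be probed numerically. The following sketch, for the 3 × 3 chain pattern (a hypothetical but representative choice), samples random elements of S^n_+(E, 0) via the clique decomposition and random elements of S^n_+(E, ?) by projecting dense PSD matrices, and records all inner products.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
cliques = [[0, 1], [1, 2]]  # chain sparsity pattern, 0-based labels

def project_chain(M):
    """Projection onto S^3(E, 0) for the chain pattern:
    zero out the corner entries (1, 3) and (3, 1)."""
    P = M.copy()
    P[0, 2] = P[2, 0] = 0.0
    return P

inners = []
for _ in range(100):
    # Random Z in S^n_+(E, 0), built from the decomposition (2.3).
    Z = np.zeros((n, n))
    for C in cliques:
        B = rng.standard_normal((len(C), len(C)))
        Z[np.ix_(C, C)] += B @ B.T
    # Random X in S^n_+(E, ?), obtained by projecting a dense PSD matrix.
    A = rng.standard_normal((n, n))
    X = project_chain(A @ A.T)
    inners.append(np.trace(X @ Z))

# Duality predicts <X, Z> >= 0 for every such pair.
```

Every sampled inner product is indeed nonnegative, consistent with the first equivalence in the proof chain above.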
Example 2.3. Consider the symmetric matrix

    X = [ 2  1    0 ]
        [ 1  0.5  1 ]
        [ 0  1    2 ],

whose sparsity pattern is the (by now usual) 3-node chordal chain graph with maximal cliques C_1 = {1, 2} and C_2 = {2, 3}. It is easy to check that, while X is not positive semidefinite, the principal submatrices indexed by the cliques C_1 and C_2 are. Then, Theorem 2.2 guarantees that X ∈ S^n_+(E, ?), meaning that the zero entries may be replaced by nonzeros to obtain a positive semidefinite matrix X̂. One possible positive semidefinite completion is

    X̂ = [ 2  1    2 ]
        [ 1  0.5  1 ]
        [ 2  1    2 ].

In fact, this is the minimum-rank completion, whose rank, rank(X̂) = 1, coincides with the maximum rank of the individual principal submatrices of X (cf. Remark 2.2).
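The claims in this example are easy to verify numerically, as the short NumPy check below shows.

```python
import numpy as np

X = np.array([[2., 1., 0.],
              [1., 0.5, 1.],
              [0., 1., 2.]])

# The clique submatrices indexed by C1 = {1, 2} and C2 = {2, 3} are PSD...
eig_C1 = np.linalg.eigvalsh(X[:2, :2])
eig_C2 = np.linalg.eigvalsh(X[1:, 1:])
# ...but X itself is not positive semidefinite.
eig_X = np.linalg.eigvalsh(X)

# Replacing the corner zeros with 2 yields a rank-one PSD completion.
X_hat = np.array([[2., 1., 2.],
                  [1., 0.5, 1.],
                  [2., 1., 2.]])
eig_hat = np.linalg.eigvalsh(X_hat)
rank_hat = np.linalg.matrix_rank(X_hat)
```

Both clique submatrices are rank one here, so the minimum-rank completion has rank one as well, matching (the rank formula in) Remark 2.2.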
Example 2.4. Consider the problem of finding a variable x ∈ R^3 such that the matrix

    X(x) = [ 1 − x_1    x_1 + x_2  ?         ]
           [ x_1 + x_2  x_2        x_2 + x_3 ]     (2.10)
           [ ?          x_2 + x_3  2x_3 + 1  ]

admits a positive semidefinite completion, where ? denotes the unspecified corner entries. This is equivalent to finding x ∈ R^3 as well as a corresponding scalar y ∈ R such that

    [ 1 − x_1    x_1 + x_2  y         ]
    [ x_1 + x_2  x_2        x_2 + x_3 ] ⪰ 0.     (2.11)
    [ y          x_2 + x_3  2x_3 + 1  ]

Since the sparsity graph of X(x) is chordal, Theorem 2.2 implies that (2.10) is equivalent to the two LMIs

    [ 1 − x_1    x_1 + x_2 ]           [ x_2        x_2 + x_3 ]
    [ x_1 + x_2  x_2       ] ⪰ 0,      [ x_2 + x_3  2x_3 + 1  ] ⪰ 0.     (2.12)

Feasible vectors x for the first of these two LMIs can be found by imposing 1 − x_1 + x_2 ≥ 0 and (1 − x_1)x_2 − (x_1 + x_2)^2 ≥ 0, while feasible x for the second LMI are found by requiring x_2 + 2x_3 + 1 ≥ 0 and x_2(2x_3 + 1) − (x_2 + x_3)^2 ≥ 0. The feasible sets obtained in each case are illustrated by the red and green regions in Figure 2.6, respectively. The blue region in the figure, instead, represents the three-dimensional set of feasible x for (2.11). As expected from Theorem 2.2, this is exactly the intersection of the feasible regions for the two LMIs in (2.12). Similar to Example 2.2, one can therefore replace the original 3 × 3 completion constraint, which is equivalent to the LMI (2.11), with the two 2 × 2 LMIs in (2.12) without any loss of generality.

Figure 2.6: Region of R^3 where the matrix X(x) in (2.10) admits a positive semidefinite completion (blue shading). This region coincides with the intersection of the region of R^3 where the first LMI in (2.12) is feasible (red shading; the region extends to infinity in the x3 direction) and the cylindrical region of R^3 where the second LMI in (2.12) is feasible (green shading; the region extends to infinity in the x1 direction). Thick red and green lines highlight the cross section of these two regions.
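The scalar feasibility conditions stated in this example determine the two 2 × 2 clique submatrices (up to the irrelevant sign of their off-diagonal entries), so the equivalence can be spot-checked numerically. The sketch below, a plain-NumPy reconstruction under that assumption, verifies both clique LMIs at a sample point and then locates a completion entry y by a simple grid scan.

```python
import numpy as np

def blocks(x1, x2, x3):
    """2 x 2 clique submatrices consistent with the stated trace and
    determinant feasibility conditions."""
    B1 = np.array([[1 - x1, x1 + x2], [x1 + x2, x2]])
    B2 = np.array([[x2, x2 + x3], [x2 + x3, 2 * x3 + 1]])
    return B1, B2

x = (0.0, 1.0, 0.0)  # a sample point satisfying both 2 x 2 LMIs
B1, B2 = blocks(*x)

def completion_mineig(y, x1, x2, x3):
    """Smallest eigenvalue of the completed 3 x 3 matrix with corner y."""
    M = np.array([[1 - x1, x1 + x2, y],
                  [x1 + x2, x2, x2 + x3],
                  [y, x2 + x3, 2 * x3 + 1]])
    return np.linalg.eigvalsh(M).min()

# Theorem 2.2 predicts a PSD completion exists; scan y to find one.
best = max(completion_mineig(y, *x) for y in np.linspace(-2, 2, 401))
```

At this sample point the scan finds a value of y making the completed 3 × 3 matrix positive semidefinite, exactly as the theorem predicts.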

Block-partitioned matrices
Theorems 2.1 and 2.2 can be extended to block-partitioned matrices characterized by block-sparsity. Such matrices arise, for example, when modeling network systems (cf. Section 6.1), where each block in the partition corresponds to an individual subsystem and sparsity in the network connectivity translates into block-sparsity. Block-partitioned matrices are also useful in extending the factor-width decompositions that will be discussed in Section 5.

Sparse block matrices
Given a positive integer n, any finite set of positive integers α = {α_1, α_2, . . . , α_p} is called a partition of n if α_1 + · · · + α_p = n. The set of all possible partitions of n can be equipped with a (partial) order relation based on refinement: roughly speaking, a partition α is finer than a partition β if β can be obtained by merging consecutive entries of α.
Given any integer n and any partition α = {α_1, . . . , α_p} of n, a matrix M ∈ R^{n×n} can be written in the block form

    M = [ M_11  · · ·  M_1p ]
        [  ⋮     ⋱      ⋮  ]      M_ij ∈ R^{α_i × α_j}.
        [ M_p1  · · ·  M_pp ],

For the finest partition α = {1, . . . , 1}, the block M_ij reduces to the entry (i, j) of M. As shown below and in Section 5.3, however, the freedom to consider a nontrivial partition offers considerable flexibility when devising decomposition strategies for a large matrix M. In particular, by refining or coarsening a partition one can in principle split a matrix into blocks of optimal size for the computational resources at one's disposal.

The block sparsity pattern of an n × n matrix M whose blocks are defined by a partition α = {α_1, . . . , α_p} of n can be described using a graph G(V, E) with V = {1, 2, . . . , p} and an edge set E such that M_ij = 0 if (i, j) ∉ E, where M_ij is the (i, j)-th block of M and 0 denotes a zero block of appropriate size. We call α a chordal partition if the corresponding block sparsity graph G(V, E) is chordal. The linear space of sparse symmetric block matrices with a prescribed block sparsity pattern E is then given by

    S^n_α(E, 0) := {X ∈ S^n : X_ij = X_ji^T = 0 if (i, j) ∉ E},

and the block-sparse positive semidefinite cone and the block-sparse positive-semidefinite-completable cone are simply

    S^n_{α,+}(E, 0) := S^n_α(E, 0) ∩ S^n_+,
    S^n_{α,+}(E, ?) := P_{S^n_α(E,0)}(S^n_+).

An example is a 9 × 9 symbolic matrix for which the partitions α_1 = {2, 1, 1, 2, 2, 1} and α_2 = {3, 3, 3} are both chordal (the corresponding block sparsity graphs are illustrated in Figure 2.7).

Remark 2.3. For a given chordal partition, in the example above but also in general, completing all blocks of M that are not identically zero results in a chordal extension of the sparsity pattern of M. For instance, the partitions α_1 and α_2 yield two different chordal extensions of the 9 × 9 matrix above, each obtained by this block-completion process. Finding a chordal partition for a matrix, therefore, gives a way of performing a particular chordal extension of its sparsity pattern. The opposite, however, is not true: not all chordal extensions are obtained via a block-completion operation. One example for the 9 × 9 matrix above is the chordal extension obtained by a symbolic Cholesky factorization with approximately minimum degree ordering.

Chordal decomposition of sparse block matrices
As anticipated above, decomposition results similar to Theorems 2.1 and 2.2 hold for S^n_{α,+}(E, ?) and S^n_{α,+}(E, 0) when α is a chordal partition of n. Given a clique C_k of the chordal block sparsity graph G(V, E) subordinate to the chordal partition α, we define the block matrix E_{C_k,α} ∈ R^{s(α,k)×n}, where s(α, k) = Σ_{i∈C_k} α_i, blockwise as

    (E_{C_k,α})_{ij} = I_{α_j} if C_k(i) = j, and 0 otherwise.     (2.14)

Here, I_{α_j} is an identity matrix of dimension α_j. When α = {1, . . . , 1} is the trivial partition, E_{C_k,α} reduces to the matrix E_{C_k} in (2.2). Similar to the case studied in Section 2.2, the operation E_{C_k,α} X E_{C_k,α}^T ∈ S^{s(α,k)} extracts the principal block-submatrix of X whose blocks are indexed by C_k, while E_{C_k,α}^T Y E_{C_k,α} "inflates" an s(α, k) × s(α, k) matrix into a sparse n × n block matrix.
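A block version of the clique selector is easy to construct in NumPy; the helper name `block_selector` is hypothetical, and block labels are 0-based.

```python
import numpy as np

def block_selector(clique, alpha):
    """Build E_{C_k, alpha} as in (2.14): select all rows belonging to
    the blocks indexed by the clique (0-based block labels)."""
    n = sum(alpha)
    offsets = np.cumsum([0] + list(alpha))
    rows = [r for j in sorted(clique)
            for r in range(offsets[j], offsets[j + 1])]
    return np.eye(n)[rows, :]

alpha = [2, 1, 1]   # partition of n = 4 into three blocks
C = [0, 2]          # clique containing the first and third blocks
E = block_selector(C, alpha)  # shape s(alpha, k) x n = 3 x 4

X = np.arange(16, dtype=float).reshape(4, 4)
sub = E @ X @ E.T   # principal block-submatrix: rows/cols {0, 1, 3}
```

For `alpha = [1, 1, 1, 1]` the same function reproduces the entrywise selector E_C of (2.2), mirroring the trivial-partition remark above.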
We are now ready to extend Theorems 2.1 and 2.2 to the case of sparse block matrices.
Theorem 2.3 (Chordal block-decomposition). Let G({1, . . . , p}, E) be a chordal graph with maximal cliques C_1, C_2, . . . , C_t, and let α = {α_1, . . . , α_p} be a partition of n. Then, Z ∈ S^n_{α,+}(E, 0) if and only if there exist matrices Z_k ∈ S^{s(α,k)}_+ for k = 1, . . . , t such that

    Z = Σ_{k=1}^t E_{C_k,α}^T Z_k E_{C_k,α}.     (2.15)

Theorem 2.4 (Chordal block-completion). Let G({1, . . . , p}, E) be a chordal graph with maximal cliques C_1, C_2, . . . , C_t, and let α = {α_1, . . . , α_p} be a partition of n. Then, X ∈ S^n_{α,+}(E, ?) if and only if

    E_{C_k,α} X E_{C_k,α}^T ⪰ 0,   k = 1, . . . , t.     (2.16)

The proofs of Theorems 2.3 and 2.4 rely on the fact that the block sparsity graph of X ∈ S^n_α(E, 0) induces a chordal extension of the standard sparsity graph of X (cf. Remark 2.3), whose maximal cliques are obtained by expanding the blocks indexed by C_1, . . . , C_t. The standard chordal decomposition and completion results of Theorems 2.1 and 2.2 can then be applied to this chordal extension of X, and the correspondence between the cliques of the two graphs implies the two results above. Interested readers are referred to Zheng (2019, Chapter 2.4) for details.
Example 2.5. Consider two 9 × 9 matrices X and Y with the same nonchordal sparsity pattern as the symbolic matrix considered in Remark 2.3, where X is positive semidefinite and Y admits a positive semidefinite completion (e.g., replace all zero entries of Y with ones to obtain Ŷ = 11^T ⪰ 0). The partitions α_1 = {2, 1, 1, 2, 2, 1} and α_2 = {3, 3, 3} are both chordal, so while Theorem 2.1 cannot be directly applied to decompose X, Theorem 2.3 guarantees the existence of a decomposition subordinate to either partition. These decompositions coincide with the classical chordal decompositions applied to the chordal extensions of X induced by the chordal partitions α_1 and α_2. Thus, one can choose whether to decompose X as a sum of four matrices with 5 × 5 nonzero principal submatrices, or as a sum of two matrices with 6 × 6 nonzero principal submatrices. Similarly, one can apply Theorem 2.4 to verify that the matrix Y admits a positive semidefinite completion by checking the positive semidefiniteness of either four 5 × 5 principal submatrices, or two 6 × 6 ones.

Sparse semidefinite optimization
The matrix decomposition and completion results in Theorems 2.1 and 2.2 can be used to reduce the complexity of algorithms for sparse semidefinite optimization. A semidefinite program (SDP) in standard primal form reads

    min_X  ⟨C, X⟩
    subject to  ⟨A_i, X⟩ = b_i,  i = 1, . . . , m,     (3.1)
                X ∈ S^n_+,

where C, A_1, . . . , A_m ∈ S^n and b ∈ R^m are the problem data.
The dual problem to (3.1) is also an SDP,

    max_{y,Z}  ⟨b, y⟩
    subject to  Z + Σ_{i=1}^m y_i A_i = C,     (3.2)
                Z ∈ S^n_+.

In this section, we describe decomposition techniques for SDPs that exploit the joint sparsity pattern of the coefficient matrices C, A_1, . . . , A_m, called the aggregate sparsity pattern. For simplicity, we assume that the matrices A_1, . . . , A_m are linearly independent and that there exist X ≻ 0, y ∈ R^m and Z ≻ 0 satisfying the equality constraints in (3.1) and (3.2). This ensures that the primal and dual optimal values are finite, equal, and attained. SDPs that are infeasible or have unbounded objective can be tackled using homogeneous self-dual embeddings (O'Donoghue et al., 2016; Ye, 2011; Ye et al., 1994) or by analyzing the divergence of the iterates produced by solution algorithms (Banjac et al., 2019; Liu et al., 2017). Sparsity can be exploited within these frameworks, too, and we refer the interested reader to Zheng et al. (2020, Section 5) and Garstka et al. (2019) for details.

Aggregate sparsity
The pair of SDPs (3.1)-(3.2) is said to have aggregate sparsity graph G(V, E) if

    C, A_1, . . . , A_m ∈ S^n(E, 0).     (3.3)

Of course, if E′ ⊇ E is an extension of E, then G(V, E′) is also a suitable aggregate sparsity graph. The minimal aggregate sparsity graph, therefore, is simply the union of the individual sparsity graphs of C, A_1, . . . , A_m. Throughout this section, however, we consider a chordal extension of the minimal aggregate sparsity graph. We therefore assume from now on that the aggregate sparsity pattern E is chordal and has t maximal cliques C_1, . . . , C_t. It must be noted that an SDP may have a fully connected aggregate sparsity graph even if all coefficient matrices C and A_1, . . . , A_m are very sparse; see Zheng et al. (2018b) for explicit examples. The decomposition methods described below cannot be applied to such problems. However, there are broad classes of SDPs for which the sparsity of the SDP data matrices can be expected to translate into very sparse aggregate sparsity graphs.
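Computing the minimal aggregate sparsity pattern is a simple union of matrix supports, as the following sketch shows (the helper name `aggregate_pattern` and the toy data matrices are illustrative, not from any particular SDP).

```python
import numpy as np

def aggregate_pattern(mats, tol=0.0):
    """Union of the supports of the data matrices: the minimal aggregate
    sparsity pattern E of an SDP with data C, A_1, ..., A_m."""
    S = np.zeros_like(mats[0], dtype=bool)
    for M in mats:
        S |= np.abs(M) > tol
    np.fill_diagonal(S, True)  # vertices carry self-loops by convention
    return S

C = np.diag([1., 2., 3.])
A1 = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
A2 = np.array([[0., 0., 0.], [0., 0., 1.], [0., 1., 0.]])

E = aggregate_pattern([C, A1, A2])
# Chain pattern: entries (1, 3) and (3, 1) stay zero, so the aggregate
# sparsity graph is chordal without any extension.
```

In practice one would feed the resulting Boolean pattern to a chordal-extension heuristic before applying the decomposition techniques of this section.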
One such family consists of SDPs arising from relaxations of graph optimization problems and control problems over networks, which typically inherit the structure of the underlying network or graph. Notable examples include SDP relaxations of combinatorial graph optimization problems, such as Max-Cut (Goemans & Williamson, 1995) and graph equipartition (Karisch & Rendl, 1998), eigenvalue optimization problems over graphs (Boyd et al., 2004), analysis of linear networked systems (Deroo et al., 2015; Mason & Papachristodoulou, 2014; Zheng et al., 2018c,d), sensor network localization (Nie, 2009; So & Ye, 2007), neural network verification in machine learning (Batten et al., 2021; Raghunathan et al., 2018), and the optimal power flow problem in electricity networks (Andersen et al., 2014a; Bai et al., 2008; Jabr, 2011). We will briefly discuss some of these applications in Section 6.
Another source of SDPs with aggregate sparsity is the reformulation of intractable constraints (either convex or nonconvex) as tractable LMIs using auxiliary variables (Ben-Tal & Nemirovski, 2001; Vandenberghe et al., 2015). For example, consider an uncountable family of "uncertain" convex quadratic constraints on a variable x ∈ R^q, to be imposed for all matrices A ∈ R^{p×q}, vectors b ∈ R^q, and scalars c ∈ R obtained by perturbing nominal reference values A_0, b_0, and c_0 with fixed perturbations {A_i, b_i, c_i}. Andersen et al. (2010b) showed that this family of constraints is equivalent to a sparse LMI involving an auxiliary scalar variable t ∈ R and known linear functions G : R^q → R^{r×p}, h : R^q → R^r, and f : R^q → R, whose exact form is not important here. When r ≫ p, the matrix in this LMI has a "block-arrow" aggregate sparsity pattern analogous to that shown in Figure 2.3(c) (that figure is recovered exactly when p = 1, r = 6, and q is arbitrary). This particular type of sparsity pattern is commonly encountered in robust optimization (Andersen et al., 2010b; Ben-Tal & Nemirovski, 1998; Goldfarb & Iyengar, 2003).
Remark 3.1 (Promoting aggregate sparsity). Sometimes, it is possible to reformulate SDPs with no aggregate sparsity as equivalent SDPs with very sparse aggregate sparsity graphs through a carefully chosen transformation of variables (Fukuda et al., 2001, Section 6; Vandenberghe et al., 2015, Chapter 14.1). For instance, the SDP relaxation of the graph equipartition problem studied by Fukuda et al. (2001, Section 6) has sparse data matrices C, A_1, ..., A_{m−1}, but the m-th constraint ⟨11^T, X⟩ = 0 destroys the problem's aggregate sparsity because the matrix A_m = 11^T is dense. However, since X ⪰ 0, the constraint ⟨11^T, X⟩ = 1^T X 1 = 0 forces X1 = 0, so any such X can be expressed as

X = V X̃ V^T, X̃ ∈ S^{n−1}_+,

where the columns of V ∈ R^{n×(n−1)} form a sparse basis for the orthogonal complement of the all-ones vector. Thus, the original SDP can be reformulated as an SDP in the variable X̃ with data matrices V^T C V and V^T A_i V. Since V is a sparse basis matrix and the original data matrices are sparse, this new SDP is characterized by aggregate sparsity (Fukuda et al., 2001, Section 6). Sparsity-promoting modeling strategies that generalize this example are discussed by Vandenberghe et al. (2015, Chapter 14.1).
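The mechanism behind this reformulation is easy to check numerically. The sketch below (numpy; the choice of V as a first-order difference basis is one convenient sparse basis for the orthogonal complement of the all-ones vector, used here purely for illustration) verifies that any matrix of the form V Y V^T with Y PSD automatically satisfies the dense constraint:

```python
import numpy as np

n = 5
# Sparse basis V for the orthogonal complement of the all-ones vector:
# column k has +1 in row k and -1 in row k+1, so 1^T V = 0.
V = np.zeros((n, n - 1))
for k in range(n - 1):
    V[k, k] = 1.0
    V[k + 1, k] = -1.0

rng = np.random.default_rng(0)
M = rng.standard_normal((n - 1, n - 1))
Y = M @ M.T                     # an arbitrary PSD matrix
X = V @ Y @ V.T                 # candidate for the reformulated problem

ones = np.ones(n)
print(abs(ones @ X @ ones) < 1e-12)          # True: <11^T, X> = 0 by construction
print(np.linalg.eigvalsh(X).min() > -1e-10)  # True: X is PSD
```

Each column of V has only two nonzero entries, so the transformed data matrices V^T C V and V^T A_i V remain sparse whenever C and A_i are.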

Nonsymmetric formulation
The aggregate sparsity of the primal-dual pair of SDPs (3.1)-(3.2) can be exploited by reformulating them into a nonsymmetric pair of optimization problems, proposed by Fukuda et al. (2001) and later discussed extensively by Andersen et al. (2010a), Kim et al. (2011), and Sun et al. (2014). Consider first the dual-standard-form SDP (3.2). Any feasible matrix Z must be at least as sparse as the aggregate sparsity pattern of the SDP. We can therefore restrict Z to the subspace S^n(E, 0), where E is the edge set of the aggregate sparsity graph, and rewrite (3.2) as

max_y b^T y subject to Σ_{i=1}^m y_i A_i + Z = C, Z ∈ S^n_+(E, 0). (3.5)

The primal-standard-form SDP (3.1), instead, typically has a dense optimal matrix X. However, the value of the cost function and the equality constraints depend only on the entries X_ij with (i, j) ∈ E, while the remaining ones simply guarantee that X is positive semidefinite. We can therefore pose (3.1) as an optimization problem over the cone S^n_+(E, ?) of sparse matrices that admit a positive semidefinite completion,

min_X ⟨C, X⟩ subject to ⟨A_i, X⟩ = b_i, i = 1, ..., m, X ∈ S^n_+(E, ?). (3.6)

[Table 1 about here: a summary of first-order algorithms for sparse SDPs, listing for each method the class of problems handled and the splitting scheme used, including gradient projection by Sun & Vandenberghe (2015) for matrix nearness problems, Spingarn's method by Kalbat & Lavaei (2015), ADMM by Madani et al. (2017a), and the ADMM-based solvers CDCS and COSMO (Garstka et al., 2019). Notes: 1. Requires an explicit trace constraint on X; 2. special SDPs from the optimal power flow (OPF) problem; 3. special SDPs from the matrix nearness problem; 4. special SDPs with decoupled affine constraints; 5. general SDPs with inequality constraints; 6. a dual SDP (3.2) with a quadratic objective function.]

Problems (3.6) and (3.5) are a primal-dual pair of linear conic programs because the cones S^n_+(E, ?) and S^n_+(E, 0) are dual to each other (see Section 2.2 and Figure 2.5). Even though the sparse matrix cones S^n_+(E, ?) and S^n_+(E, 0) are not self-dual (Andersen, 2011; Andersen et al., 2010a), so this sparse formulation is nonsymmetric, one can solve (3.6), (3.5), or both problems simultaneously using a variety of first-order or interior-point algorithms. The next two subsections discuss some of them.
Remark 3.2. A special type of aggregate sparsity arises when the data matrices C, A_1, ..., A_m are block-diagonal. In this case, any feasible matrix X for (3.6) is automatically positive semidefinite and, consequently, can be restricted to S^n_+(E, 0). Therefore, the nonsymmetric formulation described above becomes symmetric. In particular, problems (3.5) and (3.6) are simply SDPs with a Cartesian product S^{n_1}_+ × S^{n_2}_+ × ··· × S^{n_l}_+ of semidefinite cones, where n_i is the size of the i-th diagonal block and l is the number of blocks.

First-order algorithms
First-order optimization algorithms rely only on gradient information and have iterations with low computational complexity, which can often be implemented in a distributed manner (Beck, 2017; Boyd et al., 2011). For these reasons, the last decade has witnessed the development of a range of first-order methods to solve large-scale SDPs, many of which are listed in Table 1. Some of these methods (O'Donoghue et al., 2016; Wen et al., 2010; Yurtsever et al., 2021; Zhao et al., 2010) focus on generic SDPs and do not exploit aggregate sparsity. Others, instead, tackle the sparsity-exploiting nonsymmetric formulations (3.5)-(3.6) using so-called domain-space or range-space conversion frameworks, which replace the matrix cones S^n_+(E, ?) and S^n_+(E, 0) with smaller positive semidefinite cones using the chordal decomposition and completion results in Theorems 2.1 and 2.2 (see, e.g., Dall'Anese et al., 2013; Garstka et al., 2019; Kalbat & Lavaei, 2015; Lam et al., 2012; Lu et al., 2007; Madani et al., 2017a; Sun et al., 2014; Sun & Vandenberghe, 2015; Zheng et al., 2020). Many of these works combine this strategy with additional separability assumptions for the equality constraints, which are satisfied in optimal power flow problems (Dall'Anese et al., 2013; Kalbat & Lavaei, 2015; Lam et al., 2012) and matrix nearness problems (Sun et al., 2014) but not in general. To the best of our knowledge, the only first-order methods that can currently handle general SDPs with aggregate sparsity (including infeasible or unbounded ones) are those developed by Zheng et al. (2020) and Garstka et al. (2019).

Domain- and range-space conversion
Consider problem (3.6). When the aggregate sparsity graph is chordal and has maximal cliques C_1, ..., C_t, Theorem 2.2 allows one to replace the constraint X ∈ S^n_+(E, ?) with the t smaller constraints

E_{C_k} X E_{C_k}^T ∈ S^{|C_k|}_+, k = 1, ..., t.

These constraints are coupled in general because the matrices E_{C_p} X E_{C_p}^T and E_{C_q} X E_{C_q}^T depend on the same entries of X if the cliques C_p and C_q overlap. The works referenced above differ primarily in how these couplings are handled and, as discussed in Remarks 3.3 and 3.5 below, the choice of strategy can have a considerable impact on the overall complexity of the iterations in a first-order method.
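In code, the clique projections E_{C_k} X E_{C_k}^T are just principal submatrices X[C_k, C_k]. The small numpy sketch below, for a 3-node path pattern (a toy example, not taken from the text), checks the decomposed constraints:

```python
import numpy as np

def E_mat(clique, n):
    """0/1 selection matrix: E @ X @ E.T is the principal submatrix X[clique, clique]."""
    E = np.zeros((len(clique), n))
    for row, i in enumerate(clique):
        E[row, i] = 1.0
    return E

# Chordal path pattern on 3 nodes with maximal cliques {0, 1} and {1, 2};
# the (0, 2) entry of X lies outside the pattern and is free for the completion.
n, cliques = 3, [[0, 1], [1, 2]]
X = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.9],
              [0.0, 0.9, 1.0]])

for clique in cliques:
    E = E_mat(clique, n)
    print(clique, np.linalg.eigvalsh(E @ X @ E.T).min() >= -1e-12)  # both True
```

Note that X itself is not positive semidefinite if the free (0, 2) entry is fixed to zero, yet both clique blocks are PSD, so a positive semidefinite completion exists; here, setting X[0, 2] = X[2, 0] = 0.81 gives one.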
A range-space decomposition of the dual SDP (3.2) can be formulated in a very similar way. When the aggregate sparsity pattern E is chordal, Theorem 2.1 implies that the constraint Z ∈ S^n_+(E, 0) is equivalent to

Z = Σ_{k=1}^t E_{C_k}^T Z_k E_{C_k}, Z_k ∈ S^{|C_k|}_+, k = 1, ..., t. (3.10)

Observe that, as before, the first of these conditions couples the positive semidefinite matrices Z_p and Z_q if the cliques C_p and C_q of the aggregate sparsity graph overlap.
To decouple them, Zheng et al. (2020) introduced slack variables V_1, ..., V_t and reformulated (3.10) as

Z = Σ_{k=1}^t E_{C_k}^T V_k E_{C_k}, Z_k = V_k, Z_k ∈ S^{|C_k|}_+, k = 1, ..., t. (3.11)

Using this to eliminate Z from (3.5) yields the range-space decomposition

max b^T y subject to Σ_{i=1}^m y_i A_i + Σ_{k=1}^t E_{C_k}^T V_k E_{C_k} = C, Z_k = V_k, Z_k ∈ S^{|C_k|}_+, k = 1, ..., t. (3.12)

While the domain- and range-space decompositions (3.9) and (3.12) have been derived independently, it is not difficult to verify that they are a primal-dual pair of SDPs. The duality between the original SDPs (3.1) and (3.2) is thus inherited by the decomposed SDPs (3.9) and (3.12) by virtue of the duality between Theorem 2.2 and Theorem 2.1. This elegant picture is illustrated in Figure 3.1.
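The range-space direction of this equivalence can also be sanity-checked numerically: any sum of lifted PSD clique blocks is automatically in S^n_+(E, 0). A toy numpy sketch (the path graph and random blocks are illustrative choices):

```python
import numpy as np

def E_mat(clique, n):
    """0/1 selection matrix for a clique: E.T @ Zk @ E lifts a clique block."""
    E = np.zeros((len(clique), n))
    for row, i in enumerate(clique):
        E[row, i] = 1.0
    return E

n, cliques = 4, [[0, 1], [1, 2], [2, 3]]  # maximal cliques of a chordal path graph
rng = np.random.default_rng(1)

Z = np.zeros((n, n))
for clique in cliques:
    M = rng.standard_normal((len(clique), len(clique)))
    Zk = M @ M.T                          # an arbitrary PSD clique block Z_k
    E = E_mat(clique, n)
    Z += E.T @ Zk @ E                     # lift and sum over the cliques

print(np.linalg.eigvalsh(Z).min() >= -1e-12)  # True: Z is PSD
print(Z[0, 2] == 0.0 and Z[0, 3] == 0.0)      # True: Z respects the pattern E
```

Theorem 2.1 asserts the nontrivial converse: for a chordal pattern, every matrix in S^n_+(E, 0) admits such a clique-wise decomposition.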
Remark 3.3. The introduction of the variables X_k and V_k leads to redundancies in the affine constraints of (3.9) and (3.12), but is essential to obtain a decomposition framework that is suitable for the development of fast first-order SDP solvers. For example, as explained in Section 3.3.2 below, applying the alternating direction method of multipliers (ADMM) to (3.9) leads to an algorithm whose iterations have closed-form update rules that can be implemented efficiently. The same is usually not true if the redundant constraints in (3.9) are used to eliminate the matrix X: the iterations of the first-order method proposed by Sun et al. (2014), for instance, require the solution of a further SDP with a quadratic objective function, which limits its scalability. However, the matrix X may be eliminated from (3.9) without compromising efficiency if the original primal SDP (3.1) has separable affine constraints. This observation was exploited to solve sparse SDPs arising from optimal power flow problems (Dall'Anese et al., 2013; Kalbat & Lavaei, 2015) and matrix nearness problems (Sun & Vandenberghe, 2015). Similar observations apply to the seemingly redundant matrices V_k in the range-space decomposed SDP (3.12).

ADMM for decomposed SDPs
The alternating direction method of multipliers (ADMM) is a first-order operator-splitting method developed in the mid-1970s (Gabay & Mercier, 1976; Glowinski & Marroco, 1975) to solve general optimization problems in the form

min_{X∈X, Y∈Y} f(X) + g(Y) subject to A(X) + B(Y) = C, (3.13)

where f and g are proper convex (but not necessarily smooth) functions on finite-dimensional normed vector spaces X and Y, A and B are given linear operators from X and Y into a finite-dimensional normed vector space Z, and C ∈ Z is given. Given a penalty parameter ρ > 0 and a dual variable Z ∈ Z that acts as a Lagrange multiplier for the equality constraint, ADMM finds a saddle point of the (scaled) augmented Lagrangian

L_ρ(X, Y, Z) := f(X) + g(Y) + (ρ/2) ‖A(X) + B(Y) − C + Z‖²

by updating the primal variables X, Y and the scaled dual variable Z according to the following rules:

X^{(q+1)} = argmin_X L_ρ(X, Y^{(q)}, Z^{(q)}), (3.14a)
Y^{(q+1)} = argmin_Y L_ρ(X^{(q+1)}, Y, Z^{(q)}), (3.14b)
Z^{(q+1)} = Z^{(q)} + A(X^{(q+1)}) + B(Y^{(q+1)}) − C. (3.14c)

The superscript (q) indicates that a variable is fixed to its value at the q-th iteration. Under mild technical conditions (Boyd et al., 2011, Section 3.2), the method converges to an ε-approximate solution of (3.13) using at most O(1/ε) iterations. Given its slow convergence rate, ADMM is suitable only when (3.14a) and (3.14b) have closed-form expressions and/or can be solved efficiently. Below, we show that this is true when the method is applied to the decomposed SDPs (3.9) and (3.12).
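To make the update rules (3.14) concrete, here is a toy scaled-form ADMM in numpy for a problem whose two subproblems, like those arising from the decomposed SDPs, have closed-form solutions: a quadratic prox and a cone projection. The specific objective and the nonnegative-orthant cone are made-up illustrative choices, not from the text:

```python
import numpy as np

# Scaled-form ADMM for: minimize 0.5*||x - a||^2 + indicator(y >= 0)
# subject to x - y = 0.
a = np.array([1.0, -2.0, 3.0])
rho = 1.0
x = np.zeros(3); y = np.zeros(3); z = np.zeros(3)  # z is the scaled multiplier

for _ in range(200):
    x = (a + rho * (y - z)) / (1.0 + rho)   # step (3.14a): quadratic prox
    y = np.maximum(x + z, 0.0)              # step (3.14b): projection onto the cone
    z = z + x - y                           # step (3.14c): multiplier update

print(np.allclose(y, np.maximum(a, 0.0), atol=1e-6))  # True: solution is max(a, 0)
```

In the SDP setting, the quadratic prox becomes the equality-constrained quadratic program of Section 3.3.2 and the cone projection becomes a set of small positive semidefinite projections.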
Domain-space decomposition. Consider the domain-space decomposition (3.9). Let χ_K denote the characteristic function of a set K, i.e.,

χ_K(x) = 0 if x ∈ K, and χ_K(x) = +∞ otherwise.

Upon letting X := {X} and Y := {X_1, ..., X_t}, this problem may be written in the standard form (3.13) over the spaces X = S^n and Y = Z = S^{|C_1|} × ··· × S^{|C_t|}, and can therefore be solved using ADMM. Introducing a penalty parameter ρ > 0 and a dual variable Z := {Λ_1, ..., Λ_t}, where each Λ_k ∈ S^{|C_k|} acts as a Lagrange multiplier for the corresponding constraint X_k = E_{C_k} X E_{C_k}^T, it is not difficult to check that the ADMM step (3.14a) reduces to an equality-constrained quadratic program. Step (3.14b), instead, reduces to t independent positive semidefinite projections, one for each variable X_k, k = 1, ..., t. These steps, together with the multiplier update (3.14c), have efficient closed-form solutions and can be implemented efficiently (Zheng et al., 2020, Section 4.1). In particular, the t independent projections onto the cones S^{|C_k|}_+ can be computed through eigenvalue decompositions with a complexity of O(|C_k|³) floating-point operations. This is not expensive when all cliques C_1, ..., C_t of the aggregate sparsity graph are small, which is true in many applications. In contrast, the first-order algorithms for generic SDPs developed by O'Donoghue et al. (2016) and Wen et al. (2010) require a projection onto the semidefinite cone S^n_+ at each iteration, which becomes a bottleneck when n ≫ 1. It is therefore clear that exploiting sparsity via chordal decomposition can bring significant computational savings in ADMM algorithms.
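The basic computational kernel of step (3.14b) is the Frobenius-norm projection onto a (small) positive semidefinite cone, computed by clipping negative eigenvalues. A minimal numpy sketch:

```python
import numpy as np

def project_psd(A):
    """Frobenius-norm projection onto the PSD cone: eigendecompose a symmetric
    matrix and clip negative eigenvalues (O(k^3) for a k x k block)."""
    w, U = np.linalg.eigh(A)
    return (U * np.maximum(w, 0.0)) @ U.T

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])     # eigenvalues 3 and -1
P = project_psd(A)
print(np.allclose(P, [[1.5, 1.5], [1.5, 1.5]]))  # True: the negative mode is removed
```

Since the cost scales cubically with the block size, projecting t clique blocks of size |C_k| is far cheaper than one projection of size n whenever the cliques are small.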
Range-space decomposition. The range-space decomposition (3.12) of the dual-standard-form SDP (3.2) can be solved using an ADMM algorithm very similar to that presented above. Grouping the variables as X := {y, V_1, ..., V_t} and Y := {Z_1, ..., Z_t}, the problem can be written in the general form (3.13) over the spaces X = R^m × S^{|C_1|} × ··· × S^{|C_t|} and Y = Z = S^{|C_1|} × ··· × S^{|C_t|}. Given a penalty parameter ρ > 0 and a dual variable Z := {Λ_1, ..., Λ_t}, where each Λ_k acts as a Lagrange multiplier for the corresponding constraint Z_k − V_k = 0, one can easily verify that the ADMM step (3.14a) reduces to an equality-constrained quadratic program in the variables y, V_1, ..., V_t. Step (3.14b), instead, reduces to t independent positive semidefinite projections, and the dual variables Λ_1, ..., Λ_t are updated through step (3.14c). Again, these iterations admit inexpensive closed-form expressions. Moreover, it is not difficult to see that the ADMM iterations for the range-space decomposition (3.12) and for the domain-space decomposition (3.9) have similar leading-order complexity. In fact, Zheng et al. (2020, Section 4.3) showed that the ADMM algorithms for the primal and dual decomposed SDPs are scaled versions of each other. This shows that the duality picture of Figures 2.5 and 3.1 is reflected also at the algorithmic level.
Remark 3.4. For any fixed penalty ρ > 0, the primal and dual ADMM algorithms outlined above converge to a solution of (3.9) and (3.12), respectively, provided that strict primal-dual feasibility conditions are satisfied (Boyd et al., 2011, Section 3.2). An efficient ADMM algorithm that can handle primal or dual infeasible problems was developed by Zheng et al. (2020, Section 5), who considered the homogeneous self-dual embedding (O'Donoghue et al., 2016; Ye et al., 1994) of the domain-space decomposition (3.9) and the range-space decomposition (3.12).
Remark 3.5. As anticipated in Remark 3.3, considering the variables X_k and the constraints X_k = E_{C_k} X E_{C_k}^T without eliminating any redundant variables is essential to obtain efficient ADMM iterations. This is because the conic constraints separate completely from the affine ones in (3.9) when applying the splitting strategy of ADMM, making it easy to update each X_k via a simple projection onto a positive semidefinite cone. Similarly, the redundant variables V_k and the constraints Z_k = V_k in (3.12) are essential to decouple the conic constraints from the affine ones, which enables one to handle positive semidefinite constraints via simple projections.

Interior-point algorithms
Interior-point algorithms for convex optimization problems with equality and inequality constraints employ Newton's method to solve a sequence of modified equality-constrained problems, obtained by replacing any inequality constraints with barrier functions in the objective (Nesterov, 2003;Ye, 2011). These barrier functions approximate the characteristic function of the set defined by the original inequality constraints and ensure that the optimal solution of each modified problem is strictly feasible for the original problem, meaning that it is an interior point of the original feasible set.
Since Newton's method relies on second-order (Hessian) information, interior-point algorithms do not share the slow convergence of first-order methods. Instead, they converge to an ε-approximate solution using at most O(log(1/ε)) Newton iterations (Nesterov, 2003; Ye, 2011). In practice, convergence often occurs within tens of iterations. Therefore, interior-point methods are typically preferred when solving (3.1)-(3.2) to high accuracy. The general-purpose SDP solvers SeDuMi (Sturm, 1999), SDPT3 (Tütüncü et al., 2003), SDPA (Yamashita et al., 2012), and MOSEK (Mosek, 2015) are all based on primal-dual interior-point methods, and they can very reliably solve small and medium-sized SDPs (e.g., when n is less than a few hundred and m is less than a few thousand in (3.1)-(3.2)) on regular computers. However, they become impractical for large SDPs because the CPU time and memory requirements of each interior-point iteration grow as O(n³m + n²m² + m³) and O(n² + m²), respectively (Nesterov, 2003, Section 4.3.3).
Chordal graph techniques can be exploited to improve the efficiency of interior-point methods when solving large-scale SDPs with chordal aggregate sparsity (Andersen, 2011; De Klerk, 2010; Fukuda et al., 2001). This section reviews two general approaches for doing so. The first one, similar to the conversion methods in Section 3.3.1, reformulates problems (3.5) and (3.6) as SDPs with small positive semidefinite cones, which are often easier to solve with general-purpose interior-point solvers (Fukuda et al., 2001; Kim et al., 2011; Nakata et al., 2003; Zhang & Lavaei, 2020b). The second approach, instead, directly solves (3.6)-(3.5) using an interior-point method for nonsymmetric conic optimization (Andersen et al., 2010a; Coey et al., 2020; Nesterov, 2012; Skajaa & Ye, 2015). For other ways to exploit chordal sparsity in the computation of interior-point search directions, we refer the reader to the works by Benson et al. (2000), Pakazad et al. (2017b), and Fukuda et al. (2001, Section 5).

Conversion methods
Starting from the domain-space decomposed SDP (3.9), Fukuda et al. (2001) and Kim et al. (2011) suggested eliminating the global matrix X and rewriting the SDP (3.6) only in terms of the variables X_k ∈ S^{|C_k|}_+, k = 1, ..., t. To rewrite the cost function and the first set of equality constraints, one must choose matrices C_k and A_{ik} that satisfy

Σ_{k=1}^t E_{C_k}^T C_k E_{C_k} = C, Σ_{k=1}^t E_{C_k}^T A_{ik} E_{C_k} = A_i, i = 1, ..., m.

These affine relations do not usually determine C_k and A_{ik} uniquely, and some choices may be more convenient than others from the point of view of computations (Sun et al., 2014, Section 3.1; Zhang & Lavaei, 2020b). The second set of constraints in (3.9), instead, can be enforced via consistency constraints on the entries of X_1, ..., X_t that correspond to the same elements of X. Such consistency constraints require that, for every pair of overlapping cliques C_j and C_k, the submatrices of X_j and X_k indexed by the overlap C_j ∩ C_k coincide. (3.18) The primal SDP (3.6) can therefore be rewritten as

min Σ_{k=1}^t ⟨C_k, X_k⟩ subject to Σ_{k=1}^t ⟨A_{ik}, X_k⟩ = b_i, i = 1, ..., m, the consistency constraints (3.18), and X_k ∈ S^{|C_k|}_+, k = 1, ..., t. (3.19)

This conversion process, first proposed by Fukuda et al. (2001), is known as the domain-space decomposition (Kim et al., 2011). The reformulated problem (3.19) has more variables and constraints than the original SDP (3.1), but the large matrix constraint X ∈ S^n_+ is replaced by t smaller ones, X_k ∈ S^{|C_k|}_+, k = 1, ..., t. In certain cases, the decomposed problem (3.19) is easier to solve than the original SDP (3.1) using general-purpose interior-point solvers; see Nakata et al. (2003) and Fujisawa et al. (2009) for numerical examples. Three other variants of this conversion method, including range-space decompositions, have been studied by Kim et al. (2011).
The main drawback of these conversion methods is that, sometimes, the additional consistency constraints (3.18) significantly increase the size of the Schur complement system that needs to be solved at each interior-point iteration. This can offset the benefits of the clique-based matrix decomposition. As shown recently by Zhang & Lavaei (2020b), this issue can be mitigated using a dualization technique.
Remark 3.6 (Removing redundant constraints). Since the maximal cliques in a chordal graph satisfy the running intersection property (Blair & Peyton, 1993;Fukuda et al., 2001) (see also Appendix C), it is in fact sufficient to enforce the consistency between pairs C j , C k that correspond to the parent-child pairs in a clique tree. Redundant constraints in (3.18) can therefore be removed using the running intersection property. Interested readers are referred to Kim et al. (2011) and Vandenberghe et al. (2015) for details.
Remark 3.7 (Dropping or fixing consistency constraints). In some applications, the SDP (3.1) comes from a semidefinite relaxation of a nonconvex optimization problem. Dropping some consistency constraints in (3.18) leads to a valid weaker relaxation with a lower computational complexity. This idea was successfully applied to semidefinite relaxations for optimal power flow problems (Andersen et al., 2014a) and neural network verification (Batten et al., 2021). Other times, one can enforce some of the consistency conditions a priori and look for feasible (but suboptimal) points for an SDP at a low computational cost. This idea was used in Zheng et al. (2018d) to develop a scalable approach for solving distributed control problems.

Nonsymmetric interior-point algorithms
Chordal graph techniques can also be exploited to speed up interior-point methods for the nonsymmetric pair of sparse SDPs (3.6)-(3.5) without appealing to the matrix decomposition and conversion frameworks described above. Since the cones S n + (E, ?) and S n + (E, 0) are not self-dual, such sparsity-exploiting methods cannot enjoy a complete primal-dual symmetry (Andersen et al., 2010a). Instead, one must resort to purely primal, purely dual, or nonsymmetric primal-dual path-following methods (Andersen et al., 2010a;Burer, 2003;Coey et al., 2020;Nesterov, 2012;Skajaa & Ye, 2015).
To construct nonsymmetric interior-point methods, Dahl et al. (2008) and Andersen et al. (2010a) introduced barrier functions φ : S^n(E, 0) → R and φ* : S^n(E, 0) → R for the cones S^n_+(E, 0) and S^n_+(E, ?), defined as

φ(Z) = −log det Z, (3.20a)
φ*(X) = sup { −⟨X, Z⟩ − φ(Z) : Z ∈ int S^n_+(E, 0) }. (3.20b)

Note that φ (resp. φ*) is finite only on the interior of S^n_+(E, 0) (resp. S^n_+(E, ?)) and tends to +∞ as Z (resp. X) approaches the boundary of this cone. Observe also that φ* is simply the Legendre transform of φ evaluated at −X.
Thanks to the properties of the barrier functions, a minimizing sequence {X_μ}_{μ>0} for (3.6) can be computed by solving the regularized primal problem

min_X ⟨C, X⟩ + μ φ*(X) subject to ⟨A_i, X⟩ = b_i, i = 1, ..., m, (3.21)

and letting μ → 0. Similarly, a minimizing sequence {y_μ, Z_μ}_{μ>0} for (3.5) is found upon solving the regularized dual problem

max_{y,Z} b^T y − μ φ(Z) subject to Σ_{i=1}^m y_i A_i + Z = C, (3.22)

for μ → 0. Solutions of the regularized problems for fixed finite μ are usually found using Newton's method, leading to so-called primal-scaling and dual-scaling interior-point methods. Other methods can also be used; for instance, Jiang & Vandenberghe (2021) recently suggested solving (3.21) with a Bregman first-order method, where the complexity of evaluating the Bregman proximal operator can be reduced using a sparse Cholesky factorization.
When Newton's method is applied to (3.21), the KKT optimality conditions are

⟨A_i, X⟩ = b_i, i = 1, ..., m, (3.23a)
Σ_{i=1}^m y_i A_i + Z = C, (3.23b)
Z = −μ ∇φ*(X), (3.23c)

where y ∈ R^m is a Lagrange multiplier for the equality constraints in (3.21) and Z is an auxiliary variable arising from the definition of φ* via the Legendre transform. The solutions X_μ ∈ S^n_+(E, ?) as μ is varied define the so-called central path for (3.6). Similarly, the KKT optimality conditions for (3.22) are

Σ_{i=1}^m y_i A_i + Z = C, (3.24a)
⟨A_i, X⟩ = b_i, i = 1, ..., m, (3.24b)
X = −μ ∇φ(Z), (3.24c)

where X is a Lagrange multiplier for the equality constraint in (3.22). The solutions {y_μ, Z_μ} ∈ R^m × S^n_+(E, 0) as μ is varied define the central path for (3.5). It is possible to show that (3.23) and (3.24) are equivalent (Andersen, 2011, Chapter 3), so the set of points {X_μ, y_μ, Z_μ}_{μ>0} in S^n_+(E, ?) × R^m × S^n_+(E, 0) defines a primal-dual central path. The rest of this section briefly outlines how the chordality of the sparsity pattern E can be exploited in the context of dual-scaling interior-point methods. Similar ideas can be used to formulate primal-scaling methods, and we refer interested readers to the work by Andersen et al. (2010a, Section 4.2) for details.
Dual-scaling interior-point methods. Search directions in a dual-scaling interior-point method are obtained by linearizing (3.24) around the current interior iterate X ∈ int(S^n_+(E, ?)), y ∈ R^m, and Z ∈ int(S^n_+(E, 0)). Replacing X, y, and Z with X + ΔX, y + Δy, and Z + ΔZ in (3.24), linearizing (3.24c), and eliminating ΔZ yields the Newton system (3.25) in the unknowns ΔX and Δy. Further elimination of ΔX leads to the Schur complement equation

H Δy = g, (3.26)

where g ∈ R^m is a vector and H is an m × m positive definite matrix, both depending only on the current (known) iterates X, y, and Z. Explicit expressions for these quantities are given by Andersen et al. (2010a, Section 4.3).
Finding the dual-scaling search direction ∆X, ∆y, ∆Z requires solving the Newton equation (3.25) or the Schur complement equation (3.26). To do this using a direct method, one must first calculate the Hessian and inverse Hessian of the barrier function φ(X) in (3.20a), and then form and factorize the matrix H. This is the most computationally expensive part of any interior-point method. It is in this computation that one can exploit the chordality of the sparsity pattern E (Andersen et al., 2010a).
Fast calculations involving the barrier functions. The value, gradient, Hessian, and inverse Hessian of the dual barrier φ(Z) in (3.20a) can be computed efficiently if the sparsity pattern E of Z is chordal. Similar fast algorithms exist for the primal barrier φ * (X) in (3.20b), but we do not review them here and refer interested readers to Andersen et al. (2010a, Section 3.2) for details.
The key ingredient of these efficient algorithms is a sparse Cholesky factorization with zero fill-in (Blair & Peyton, 1993; Rose, 1970; Vandenberghe et al., 2015): as reviewed in Appendix A, for any positive definite matrix Z ∈ int(S^n_+(E, 0)) with chordal sparsity pattern there exist a permutation matrix P and a lower triangular matrix L such that

Z = P^T L L^T P. (3.27)

This factorization can be computed efficiently by following a recursion on a clique tree (Vandenberghe et al., 2015, Chapter 9.3). Now, to evaluate φ(Z) it suffices to substitute Z = P^T L L^T P into (3.20a) and observe that

φ(Z) = −log det(P^T L L^T P) = −2 Σ_{i=1}^n log L_{ii},

because determinants distribute over products and permutation matrices have unit determinant. Thus, φ(Z) can be evaluated efficiently once the Cholesky factorization (3.27) has been computed.
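The identity between the barrier value and the diagonal of the Cholesky factor is easy to check numerically. The sketch below uses a dense factorization with P = I for illustration only (for a chordal pattern, the factor would additionally have zero fill-in):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
Z = M @ M.T + 5.0 * np.eye(5)          # a positive definite matrix

L = np.linalg.cholesky(Z)              # Z = L L^T (taking P = I)
phi_chol = -2.0 * np.sum(np.log(np.diag(L)))
phi_direct = -np.log(np.linalg.det(Z))
print(np.isclose(phi_chol, phi_direct))  # True: phi(Z) = -2 * sum(log L_ii)
```

Working with log L_ii rather than det Z is also numerically safer, since the determinant itself can overflow or underflow for large n.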
The gradient of φ(Z), instead, is given by the negative projected inverse

∇φ(Z) = −P_{S^n(E,0)}(Z^{−1}).

Despite the fact that Z^{−1} is in general dense, this projection onto S^n(E, 0) can be computed from the sparse Cholesky factorization (3.27) without computing any other entries of Z^{−1} (Vandenberghe et al., 2015, Chapter 9.5).
The Hessian of φ at Z applied to a matrix Y ∈ S^n(E, 0) is computed as

∇²φ(Z)[Y] = P_{S^n(E,0)}(Z^{−1} Y Z^{−1}).

Again, this quantity can be evaluated knowing only the sparse Cholesky factorization of Z and its projected inverse P_{S^n(E,0)}(Z^{−1}), without explicitly computing the inverse Z^{−1} or the matrix product Z^{−1} Y Z^{−1} (Andersen et al., 2010a, 2013). Finally, thanks to the chordal structure, solving the linear system ∇²φ(Z)[Y] = R for Y has the same cost as evaluating the Hessian ∇²φ(Z)[Y]; see Andersen et al. (2010a, Section 3.2) and Andersen et al. (2013).
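The Hessian formula can be sanity-checked by finite-differencing the gradient ∇φ(Z) = −P_{S^n(E,0)}(Z^{−1}). The numpy sketch below does so on a small path-graph pattern; it is a toy example that computes the projected inverse naively from the dense inverse, rather than via the sparse factorization the text describes:

```python
import numpy as np

# Pattern E of a 3-node path: entries (0, 2) and (2, 0) lie outside E.
mask = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 1]], dtype=bool)

def proj_E(M):
    """Projection onto S^n(E, 0): zero the entries outside the pattern."""
    return np.where(mask, M, 0.0)

def grad_phi(Z):
    return -proj_E(np.linalg.inv(Z))   # gradient of phi(Z) = -log det Z

Z = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
Y = np.array([[0.1, 0.2, 0.0],
              [0.2, 0.0, 0.1],
              [0.0, 0.1, 0.3]])        # a direction in S^n(E, 0)

Zi = np.linalg.inv(Z)
hess = proj_E(Zi @ Y @ Zi)             # the Hessian formula applied to Y

t = 1e-6
fd = (grad_phi(Z + t * Y) - grad_phi(Z)) / t
print(np.allclose(fd, hess, atol=1e-5))  # True: matches the finite difference
```

The point of the sparse algorithms reviewed in the text is precisely to obtain the same quantities without ever forming the dense inverse used in this check.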

Algorithm implementations
We conclude this section by providing a list of numerical packages that implement some of the approaches reviewed above. This list is not exhaustive; the goal here is to give the interested reader a starting point for numerical experiments. First-order solvers based on augmented Lagrangian methods and ADMM for generic SDPs include SDPNAL/SDPNAL+ (Zhao et al., 2010) and SCS (O'Donoghue et al., 2019). CDCS (Zheng et al., 2016) and COSMO (Garstka et al., 2019) are two open-source first-order solvers that exploit chordal sparsity in SDPs. The MATLAB package CDCS implements the algorithms described in Section 3.3.2 and has interfaces with the optimization toolboxes YALMIP (Löfberg, 2004) and SOSTOOLS (Prajna et al., 2002). The Julia package COSMO solves SDPs with quadratic objective functions.
The conversion methods in Section 3.4.1 are implemented in SparseCoLO (Fujisawa et al., 2009) and CHOMPACK. We note that CHOMPACK also provides useful implementations of many other chordal matrix computations, including maximum-determinant positive definite completion and minimum-rank positive semidefinite completion. Another MATLAB package, Dual-CTC (Zhang & Lavaei, 2020a), implements a dualized clique tree conversion (Zhang & Lavaei, 2020b). The reformulated SDPs after conversion can be solved using general-purpose interior-point solvers, such as SeDuMi (Sturm, 1999), SDPT3 (Tütüncü et al., 2003), SDPA (Yamashita et al., 2012), and MOSEK (Mosek, 2015). SMCP is a nonsymmetric interior-point solver that provides a Python implementation of the algorithms in Section 3.4.2. Finally, SDPA-C (Fujisawa et al., 2004) is a primal-dual interior-point solver that exploits chordal sparsity using the maximum-determinant positive definite completion.

Sparse polynomial optimization
We have seen in Section 3 that the chordal decomposition of large semidefinite matrices allows for significant efficiency gains in the solution of sparse SDPs. The same ideas can often be leveraged to replace SDP relaxations of intractable optimization problems, which generally have no inherent sparsity or other computationally advantageous structure, with SDPs that do.
This section describes how sparsity (primarily chordal, but also nonchordal) can be exploited in the context of sum-of-squares (SOS) relaxation techniques for polynomial optimization. As mentioned in the introduction, SOS methods are at the heart of many recent tractable frameworks for the analysis and optimal control of nonlinear systems with polynomial dynamics; see, e.g., Ahmadi & Gunluk (2018). Our goal is not to offer an exhaustive review of all sparsity-exploiting methods that have been proposed in this field, but rather to introduce the key ideas underpinning most of these methods from a general perspective, in the hope that this can guide further developments. For this reason, we concentrate mainly on two basic problems. The first, discussed in Section 4.2, is to prove that an n-variate polynomial of even degree 2d is a sum of squares and, therefore, globally nonnegative. In this case, we seek to exploit the structure of polynomials that depend only on a small subset of all possible degree-2d monomials, a property often referred to as term sparsity. The second problem, discussed in Section 4.3, is to check whether a sparse and symmetric n-variate polynomial matrix P(x) is SOS, and therefore positive semidefinite for all x ∈ R^n. In this case, our goal is to leverage the structural sparsity of P, meaning that many of its entries are zero.
Although we focus only on global nonnegativity, all of the sparsity-exploiting techniques discussed in this section can be extended to prove polynomial (matrix) nonnegativity locally on basic semialgebraic sets. Such extensions, which have been studied extensively in order to build hierarchies of sparse SDP relaxations for polynomial optimization problems (Lasserre, 2006; Waki et al., 2006, 2008; Wang et al., 2021a,b, 2020a), require some careful technical adjustments, but the underlying strategy is the same as for the global nonnegativity setting. We outline some of these adjustments in Sections 4.2.5 and 4.3.2, and refer readers to the excellent literature on this topic for full details.

Background
Let R[x]_{n,d} be the (n+d choose d)-dimensional space of polynomials with independent variables x = (x_1, ..., x_n) and degree no larger than d. The n-variate monomial with exponent β = (β_1, ..., β_n) ∈ N^n and degree |β| = β_1 + ··· + β_n is denoted by x^β = x_1^{β_1} x_2^{β_2} ··· x_n^{β_n}. Given a finite set of exponents B ⊂ N^n, we write x^B = (x^β)_{β∈B} for the (column) vector of monomials with exponents in B. The cardinality of B is denoted by |B|. We also define the Minkowski sum B + B = {β + γ : β, γ ∈ B}. If N^n_d = {β ∈ N^n : |β| ≤ d} is the set of all n-variate exponents of degree d or less, the vector x^{N^n_d} is a basis for R[x]_{n,d}, and any polynomial f ∈ R[x]_{n,d} can be written as f(x) = Σ_{β∈N^n_d} f_β x^β for some coefficients f_β ∈ R. The set of exponents with nonzero coefficients,

supp(f) := {β ∈ N^n_d : f_β ≠ 0},

is called the support of f. Its convex hull is called the Newton polytope of f and is denoted by New(f).
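In code, a polynomial is conveniently stored as a dictionary mapping exponent tuples to coefficients, so the support is just the set of keys with nonzero values. A minimal sketch (the polynomial is a made-up example):

```python
# f(x1, x2) = 1 + 3*x1^2*x2 + 2*x2^4, stored as {exponent tuple: coefficient};
# a zero-coefficient term is included to show that it is excluded from supp(f).
f = {(0, 0): 1.0, (2, 1): 3.0, (0, 4): 2.0, (1, 1): 0.0}

supp = {beta for beta, coeff in f.items() if coeff != 0.0}
print(sorted(supp))  # [(0, 0), (0, 4), (2, 1)]
```

The Newton polytope New(f) is then the convex hull of these exponent tuples viewed as points in R^n.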

SOS polynomials and SDPs
The set of n-variate degree-2d SOS polynomials, denoted by Σ_{n,2d}, is a proper cone in R[x]_{n,2d} (Blekherman et al., 2012, Theorem 3.26). Given an exponent set A ⊆ N^n_{2d}, we define the subcone of SOS polynomials supported on A as

Σ[A] := {f ∈ Σ_{n,2d} : supp(f) ⊆ A}. (4.4)

It is well known (see, e.g., Parrilo, 2003, 2013) that a polynomial f ∈ R[x]_{n,2d} is SOS if and only if there exist a set of exponents B ⊆ N^n_d and a positive semidefinite matrix Q ∈ S^{|B|}_+ such that

f(x) = (x^B)^T Q x^B. (4.5)

In particular, if f is SOS, this so-called Gram matrix representation (4.5) is guaranteed to exist with (Reznick, 1978)

B = N^n ∩ (1/2) New(f). (4.6)

The exponent set obtained with this Newton polytope reduction can be simplified further using more general facial reduction techniques (Permenter & Parrilo, 2014a,b; Waki & Muramatsu, 2010). These techniques analyze the support of f in order to remove redundant elements from B, and construct a smaller exponent set for which (4.5) is guaranteed to hold as long as f is SOS. It is clear that SOS polynomials are nonnegative globally. The converse is true only for univariate polynomials (n = 1, d arbitrary), quadratic polynomials (d = 1, n arbitrary), and bivariate quartics (n = 2, d = 2) (Hilbert, 1888). In general, therefore, being SOS is only a sufficient condition for global nonnegativity, and there are well-known examples of nonnegative polynomials that are not SOS, such as the Motzkin polynomial (Motzkin, 1967). However, while verifying polynomial nonnegativity is an NP-hard problem (Murty & Kabadi, 1987), checking whether a polynomial f is SOS can be done in polynomial time by solving an SDP. Specifically, for each exponent α ∈ B + B, let A_α ∈ S^{|B|} be the symmetric binary matrix satisfying

(A_α)_{βγ} = 1 if β + γ = α, and (A_α)_{βγ} = 0 otherwise,

and observe that

x^B (x^B)^T = Σ_{α∈B+B} A_α x^α.

Then, condition (4.5) holds if and only if ⟨Q, A_α⟩ = f_α for all α ∈ B + B, and we conclude that

f is SOS ⟺ there exists Q ∈ S^{|B|}_+ such that ⟨Q, A_α⟩ = f_α for all α ∈ B + B. (4.9)

The condition on the right-hand side defines an SDP, so a positive semidefinite Gram matrix Q certifying that f is SOS can (in principle) be constructed in polynomial time.
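The affine constraints ⟨Q, A_α⟩ = f_α can be assembled and checked directly. The numpy sketch below does so for the univariate SOS polynomial f(x) = x⁴ + 2x² + 1 = (x² + 1)², whose Gram matrix is known in closed form, so no SDP solver is needed for this illustration:

```python
import numpy as np

# f(x) = x^4 + 2x^2 + 1 = (x^2 + 1)^2, with monomial basis x^B = (1, x, x^2).
B = [0, 1, 2]
f = {0: 1.0, 1: 0.0, 2: 2.0, 3: 0.0, 4: 1.0}

def A(alpha):
    """Binary matrix with (A_alpha)_{beta, gamma} = 1 iff beta + gamma = alpha."""
    return np.array([[1.0 if b + g == alpha else 0.0 for g in B] for b in B])

v = np.array([1.0, 0.0, 1.0])   # coefficients of x^2 + 1 in the basis x^B
Q = np.outer(v, v)              # rank-one PSD Gram matrix certifying the SOS property

# The affine constraints <Q, A_alpha> = f_alpha hold for every alpha in B + B.
print(all(np.isclose(np.sum(Q * A(alpha)), f[alpha]) for alpha in range(5)))  # True
```

In general, one searches for a PSD matrix Q satisfying these affine constraints with an SDP solver rather than writing it down by hand, and the rank of Q determines the number of squares in the decomposition.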

SOS polynomial matrices and SDPs
Let R[x]^{r×s}_{n,d} be the space of r × s matrices whose entries are n-variate polynomials of degree d. We say that a symmetric polynomial matrix P ∈ R[x]^{r×r}_{n,2d} is positive semidefinite (resp. definite) globally if P(x) ⪰ 0 (resp. P(x) ≻ 0) for all x ∈ R^n. We also say that P is positive semidefinite locally on a set K if the same conditions hold for x ∈ K, but not necessarily otherwise.
A symmetric polynomial matrix P ∈ R[x]^{r×r}_{n,2d} is called SOS if there exist an integer s and a polynomial matrix M ∈ R[x]^{s×r}_{n,d} such that

P(x) = M(x)^T M(x). (4.10)

The set of r × r SOS polynomial matrices with entries in R[x]_{n,2d} will be denoted by Σ^r_{n,2d}. All SOS polynomial matrices are clearly positive semidefinite globally, and the converse is true in the univariate case (n = 1); see Aylward et al. (2007) for a recent proof.
It is well known (see, e.g., Gatermann & Parrilo, 2004; Kojima, 2003; Parrilo, 2013) that a symmetric polynomial matrix P ∈ R[x]^{r×r}_{n,2d} is SOS if and only if it admits a Gram matrix representation in the form

P(x) = (x^B ⊗ I_r)^T Q (x^B ⊗ I_r) (4.11)

for some exponent set B ⊆ N^n_d and some positive semidefinite symmetric matrix Q ∈ S^{r|B|}_+. One may always take B = N^n_d, and smaller exponent sets can be constructed with the same reduction techniques used for SOS polynomials. As in the scalar case (r = 1), condition (4.11) defines a set of affine constraints on Q, so verifying that a polynomial matrix is SOS amounts to solving an SDP.

Sparse SOS decompositions
A major obstacle to constructing SOS certificates of global polynomial nonnegativity via semidefinite programming is that the matrix Q is both dense and very large. If f ∈ R[x]_{n,2d} has dense support supp(f) = N^n_{2d}, then one must take B = N^n_d, and Q is a dense matrix of size (n+d choose d) × (n+d choose d). Often, however, the support of f is small, i.e., |supp(f)| is much smaller than (n+2d choose 2d). This property, called term sparsity (Wang et al., 2019, 2021a,b, 2020a), can be exploited in various ways to reduce the computational complexity of the SDP in (4.9).
The facial reduction techniques mentioned above, which replace the full exponent set N n d with a (sometimes significantly) smaller subset, are arguably the simplest way to exploit term sparsity. However, as the next example demonstrates, they are often not sufficient.
Example 4.1. The quartic polynomial f in (4.12) contains only 485 out of the (50+4 choose 4) = 316 251 possible monomials, so f is term sparse. However, it is not hard to check that the Newton polytope New(f) consists of all points ξ ∈ R^50_+ with ‖ξ‖_1 = 4, so the Newton-reduced exponent set B = (½ New(f)) ∩ N^50_2 = N^50_2 \ N^50_1 contains all homogeneous exponents of degree 2. Therefore, Newton polytope reduction removes only (50+1 choose 1) = 51 of the possible (50+2 choose 2) = 1326 exponents in the full set N^50_2, and the SDP in (4.9) still involves a 1275 × 1275 Gram matrix Q.
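The binomial counts in this example are easy to verify; a quick arithmetic check with Python's math.comb (our sketch, not part of the original text):

```python
from math import comb

n, d = 50, 2                             # 50 variables, quartic: 2d = 4
print(comb(n + 2*d, 2*d))                # 316251 candidate monomials in N^50_4
print(comb(n + d, d))                    # 1326 exponents in the full set N^50_2
print(comb(n + 1, 1))                    # 51 exponents dropped by Newton reduction
print(comb(n + d, d) - comb(n + 1, 1))   # 1275: remaining Gram basis size
```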
Techniques to exploit term sparsity beyond what can be achieved with facial reduction methods alone are clearly desirable. Section 4.2.1 describes a general strategy to search for sparse SOS decompositions, which is based on the same matrix decomposition approach used to tackle large-scale sparse SDPs in Section 3. Sections 4.2.2 and 4.2.3 show that different types of sparse SOS decompositions proposed in the literature are particular cases of this general approach. Section 4.2.5 outlines how these methods can be extended to prove polynomial nonnegativity on basic semialgebraic sets, rather than globally. Throughout, B will denote a fixed set of candidate exponents for the SOS decomposition of a polynomial f, generated from N^n_d using facial reduction or any other exponent selection technique.

General approach
Let A be a small subset of N n 2d and f be a term-sparse polynomial supported on A. To reduce the cost of testing if f is SOS, a natural idea is to check whether f belongs to a subset of the sparse SOS cone Σ[A] that admits a semidefinite representation with low computational complexity. Such a subset can be constructed using a simple strategy: prescribe a sparsity graph G(B, E) for the Gram matrix Q and impose its positive semidefiniteness through matrix decomposition.
Precisely, let G(B, E) be a graph with maximal cliques C_1, ..., C_t and with edge set E ⊆ B × B satisfying

A ⊆ {β + γ : (β, γ) ∈ E or β = γ}. (4.13)

Consider the cone of sparse SOS polynomials whose Gram matrix Q has sparsity graph G and admits the clique-based positive semidefinite decomposition

Q = ∑_{k=1}^t E_{C_k}^T S_k E_{C_k}, with S_k ∈ S^{|C_k|}_+ for all k = 1, ..., t. (4.14)

We denote this cone by Σ[A; E]. Combining (4.14) with the affine constraints in (4.9) shows that

f ∈ Σ[A; E] ⟺ ∃ S_1, ..., S_t ⪰ 0 such that ⟨∑_{k=1}^t E_{C_k}^T S_k E_{C_k}, A_α⟩ = f_α for all α ∈ B + B. (4.16)

If the cliques of the prescribed sparsity graph are small, the right-hand side is an SDP with small semidefinite cones and can be solved more efficiently than (4.9).
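The inflation operation in (4.14) is purely combinatorial: E_{C_k} selects the entries of x^B indexed by the clique C_k, so E_{C_k}^T S_k E_{C_k} embeds the small block S_k into the big matrix Q. A minimal Python sketch (the zero-based clique indices and the numerical blocks are illustrative, not from the text):

```python
def inflate(S, clique, N):
    """Embed a |C| x |C| block S into an N x N matrix: E_C^T S E_C."""
    Q = [[0.0] * N for _ in range(N)]
    for a, i in enumerate(clique):
        for b, j in enumerate(clique):
            Q[i][j] += S[a][b]
    return Q

def mat_add(Q1, Q2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(Q1, Q2)]

# Sparsity graph on |B| = 4 nodes with maximal cliques {0, 1, 2} and {2, 3}
# (zero-based). S1 and S2 are positive definite, so the assembled Gram
# matrix Q is positive semidefinite and inherits the prescribed sparsity.
N = 4
S1 = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 1.0]]
S2 = [[1.0, 1.0], [1.0, 2.0]]
Q = mat_add(inflate(S1, [0, 1, 2], N), inflate(S2, [2, 3], N))
print(Q[0][3], Q[3][0])   # 0.0 0.0 -- entries outside the graph stay zero
print(Q[2][2])            # 2.0 -- clique overlap: contributions add up
```

Because each S_k is positive semidefinite and supported on a clique of the prescribed graph, the assembled Q is automatically positive semidefinite and vanishes outside the sparsity pattern.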
Remark 4.1 (Chordality of the sparsity graph). The Gram matrix decomposition (4.14) is motivated by the chordal decomposition result in Theorem 2.1. However, we do not assume here that the sparsity graph G(B, E) is chordal, so (4.14) is generally not equivalent to requiring Q ∈ S^{|B|}_+(E, 0). The lack of chordality makes searching for the maximal cliques C_1, ..., C_t an NP-hard problem (Tomita et al., 2006). Allowing for nonchordal graphs with small cliques that can be determined analytically, however, can be extremely useful when a chordal extension leads to unacceptably large cliques even if it is approximately minimal. Examples of this situation can be found in works by Nie & Demmel (2009) and Kočvara (2020).
Remark 4.2 (Sparse SOS decompositions). Given a sparsity graph G(B, E), the cone Σ[A; E] ⊂ Σ[A] contains special SOS polynomials that admit a sparse SOS decomposition, i.e., a decomposition into a sum of sparse SOS polynomials. Indeed, substituting (4.14) into (4.5) yields

f(x) = ∑_{k=1}^t (E_{C_k} x^B)^T S_k (E_{C_k} x^B) =: ∑_{k=1}^t σ_k(x). (4.17)

Each polynomial σ_k(x) is SOS because S_k is positive semidefinite, and is sparse because the operation E_{C_k} x^B extracts a subset of the full monomial vector x^B.
It is important to observe that the reduction in computational complexity granted by the clique-based decomposition (4.14) usually comes at the expense of conservatism. This is because sparsity in the support set A does not guarantee the existence of a sparse Gram matrix Q.

Example 4.2. Consider the polynomial

f = 1 − 2x_1x_2 + 3x_2^2 − 2x_1^2x_2 + 2x_1^2x_2^2 − 2x_2x_3 + 6x_3^2 + 18x_2^2x_3 − 54x_2x_3^2 + 142x_2^2x_3^2,

and set A = supp(f). Let B ⊂ N^3_2 be the exponent set such that x^B = (x_1, x_1x_2, x_2, x_3, x_2x_3), which is obtained via Newton polytope reduction. Consider also the (chordal) sparsity graph G(B, E) shown in Figure 4.1, which satisfies (4.13). We claim that f belongs to Σ[A] but not to Σ[A; E]. To see this, observe that any Gram matrix representation of f must take the form of a matrix parametrized by a scalar α ∈ R that can be chosen arbitrarily. Setting α = 1 makes the Gram matrix Q positive semidefinite, so f ∈ Σ[A]. However, f cannot be in Σ[A; E] because this would require α = 0, for which Q is not positive semidefinite.

Correlative sparsity
The general approach presented in Section 4.2.1 requires specifying the sparsity graph for the Gram matrix Q in (4.5). A natural strategy to do this, pioneered by Waki et al. (2006) and Lasserre (2006), is to consider the couplings between any two independent variables x_i and x_j in a polynomial f supported on A. Two variables x_i and x_j are considered coupled if a monomial in the vector x^A depends on both simultaneously, i.e., if there exists α ∈ A with α_i α_j > 0. These couplings can be described using the correlative sparsity (csp) graph of the support set A (or, alternatively, of the polynomial f), which has vertices {1, ..., n} and edge set

{(i, j) : α_i α_j > 0 for some α ∈ A}.

Correlatively sparse SOS decompositions are obtained upon imposing that the entry Q_{γ,β} of the Gram matrix in (4.5) vanishes if the monomial x^{β+γ} introduces couplings between variables that are not consistent with the csp graph of f. This amounts to requiring that Q has sparsity graph G_csp(B, E_csp) with edge set

E_csp = {(β, γ) ∈ B × B : nnz(β + γ) is contained in a clique of the csp graph}. (4.19)

One may consider G(B, E_csp) a "hypergraph" with |B| nodes, built from the csp graph of f (which has n nodes) to ensure that polynomials (x^B)^T Q x^B with Q ∈ S^{|B|}(E_csp, 0) inherit the correlative sparsity of the original support set A. Unsurprisingly, therefore, the properties of G(B, E_csp) can be inferred from those of the (usually much smaller) csp graph. In the following statement, which can be proved using arguments similar to those given by Zheng (2019, Section 2.4.3), nnz(β) denotes the indices of the nonzero entries of an exponent β.
Proposition 4.1. Suppose that the csp graph of the support set A has maximal cliques J_1, ..., J_t. Then, G(B, E_csp) has maximal cliques C_k = {β ∈ B : nnz(β) ⊆ J_k} for k = 1, ..., t. Moreover, if the csp graph of A is chordal, then so is G(B, E_csp).
Proposition 4.1 considerably simplifies the construction of the "inflation" matrices E_{C_k} in (4.14), because it suffices to find the maximal cliques of the csp graph of A without building the (much larger) graph G(B, E_csp). In addition, it is not difficult to check that the operation E_{C_k} x^B extracts monomials that depend only on variables indexed by J_k. Using (4.17), one concludes that exploiting correlative sparsity amounts to searching for a sparse SOS decomposition in the form

f(x) = ∑_{k=1}^t σ_k(x_{J_k}), (4.20)

where each SOS polynomial σ_k depends only on the variables indexed by J_k.

Example 4.3. The quartic polynomial f in (4.12) is correlatively sparse, and the csp graph of its support is chordal with maximal cliques J_i = {i, i+1, i+2} for i = 1, ..., n−2. It is clear that f admits a sparse SOS decomposition (4.20), and this can be searched for by solving the SDP in (4.16). Since, for each clique J_i, only six elements in B = N^n_2 \ N^n_1 can be multiplied together without introducing spurious couplings to different cliques, this SDP has semidefinite matrix variables S_1, ..., S_{n−2} ∈ S^6_+. Its computational complexity is clearly much lower than the corresponding dense formulation in Example 4.1, and a sparse SOS decomposition for f can be found in less than one second on a standard laptop.
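The clique construction in Proposition 4.1 can be checked combinatorially. The sketch below (our helper names, zero-based indices) builds the homogeneous exponent set B = N^n_2 \ N^n_1 for n = 5 and the chain-like csp cliques J_i = {i, i+1, i+2} from Example 4.3, and confirms that each clique C_k of G(B, E_csp) contains six exponents:

```python
from itertools import combinations_with_replacement

def homogeneous_exponents(n, deg):
    """All exponent vectors in N^n with total degree exactly deg."""
    out = []
    for combo in combinations_with_replacement(range(n), deg):
        e = [0] * n
        for i in combo:
            e[i] += 1
        out.append(tuple(e))
    return out

def nnz(beta):
    """Indices of the nonzero entries of an exponent vector."""
    return {i for i, b in enumerate(beta) if b > 0}

n = 5
B = homogeneous_exponents(n, 2)                       # B = N^5_2 \ N^5_1
csp_cliques = [{i, i + 1, i + 2} for i in range(n - 2)]   # chain csp graph
C = [[b for b in B if nnz(b) <= J] for J in csp_cliques]  # Proposition 4.1
print([len(c) for c in C])    # [6, 6, 6] -- six monomials per clique
```

Each clique involves three variables, whose homogeneous degree-2 monomials are x_i^2, x_j^2, x_k^2, x_ix_j, x_ix_k, x_jx_k: six in total, matching the S^6_+ variables in Example 4.3.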
Example 4.4. The csp graph of f is shown in Figure 4.2(a). The graph G(B, E_csp), where B = N^4_2 is the set of exponents obtained with Newton polytope reduction, is shown in Figure 4.2(b) and has cliques C_1, ..., C_4 containing 6 elements each, which are determined using Proposition 4.1. The sparsity pattern of the Gram matrix Q induced by G(B, E_csp) and the clique-based matrix decomposition in (4.14), also illustrated in the figure, replaces a 15 × 15 positive semidefinite constraint on Q with four semidefinite constraints on 6 × 6 matrices S_1, ..., S_4. According to (4.20), searching for these matrices is equivalent to looking for a sparse SOS decomposition f = σ_1(x_1, x_2) + σ_2(x_2, x_3) + σ_3(x_3, x_4) + σ_4(x_4, x_1). Such a decomposition is not guaranteed to exist even if f were SOS, but it does exist for this example, proving that f ∈ Σ[supp(f); E_csp].

TSSOS, chordal-TSSOS and related hierarchies
Fix an exponent set A ⊂ N^n_{2d} and a polynomial f with supp(f) ⊆ A. Correlative sparsity exploits only the sparse couplings between variables as encoded by the csp graph of A, but does not take into account any further structure of A. This is not efficient when |A| is much smaller than (n+2d choose 2d), so f is term-sparse, but the csp graph is fully connected or nearly so.
For this reason, Wang et al. (2019, 2021a) introduced the term-sparse-SOS (TSSOS) and the chordal-TSSOS decomposition hierarchies, which exploit term sparsity irrespective of whether f is correlatively sparse. These are two particular examples of a broader family of possible sparsity-exploiting SOS decomposition hierarchies, each of which is obtained upon imposing the clique-based Gram matrix decomposition (4.14) for a sequence {G(B, E_k)}_{k≥1} of increasingly connected sparsity graphs (E_k ⊆ E_{k+1}).
Irrespective of the particular hierarchy being considered (TSSOS, chordal-TSSOS, or another), the construction of such sparsity graphs begins with the observation that, in order to ensure (4.13), each edge set E_k should contain at least all edges (β, γ) with β + γ ∈ A. This guarantees that A ⊆ supp((x^B)^T Q x^B) for any Gram matrix Q defined via the clique-based decomposition (4.14), which is necessary for the feasibility of the SDP in (4.16). One should also not force diagonal entries Q_{ββ} of the Gram matrix to vanish, because this amounts to saying that the monomial x^β is redundant and β could be removed from the exponent set B. For these reasons, we define an initial support set B_0 and an initial edge set E_0 as

B_0 := A ∪ {2β : β ∈ B}, E_0 := {(β, γ) ∈ B × B : β + γ ∈ B_0}. (4.21)

Next, consider an extension operator E : B × B → B × B, which extends a given edge set E ⊂ B × B according to a given rule. The edge sets E_1 ⊆ E_2 ⊆ ··· ⊆ E_k ⊆ ··· and their corresponding support sets B_k are defined using the iterative rule

E_k := E({(β, γ) ∈ B × B : β + γ ∈ B_{k−1}}), B_k := {β + γ : (β, γ) ∈ E_k}. (4.22)

Different extension operators produce different types of sparse SOS decomposition hierarchies. In particular:

• If E is a block-completion operator that completes all connected components of the edge set {(β, γ) ∈ B × B : β + γ ∈ B_{k−1}}, one recovers the TSSOS hierarchy (Wang et al., 2019, 2021b). At each step of the hierarchy, Q has chordal sparsity (specifically, a block-diagonal structure) and (4.14) is equivalent to imposing Q ∈ S^{|B|}_+(E_k, 0).

• If E is an approximately minimal chordal extension operator that extends the edge set {(β, γ) ∈ B × B : β + γ ∈ B_{k−1}} such that G(B, E_k) is chordal, one recovers the chordal-TSSOS hierarchy (Wang et al., 2021a). At each step of the hierarchy, Q has chordal sparsity and (4.14) is equivalent to requiring Q ∈ S^{|B|}_+(E_k, 0).

In both cases, the edge extensions are performed on a graph with |B| nodes, and the maximal cliques of G(B, E_k) must be found at each iteration.
This is unlike the correlative sparsity strategy in Section 4.2.2, where the maximal cliques of G(B, E csp ) are built from those in the csp graph of A, which has only n nodes (cf. Proposition 4.1).
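The TSSOS block-completion step can be sketched in a few lines: find the connected components of the current edge set and turn each into a complete subgraph. The component search runs on a graph with |B| nodes, as noted above (the integer node labels below are toy stand-ins for exponents):

```python
def connected_components(nodes, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())

def block_complete(nodes, edges):
    """TSSOS-style extension: complete every connected component."""
    new_edges = set()
    for comp in connected_components(nodes, edges):
        comp = sorted(comp)
        for i, u in enumerate(comp):
            for v in comp[i + 1:]:
                new_edges.add((u, v))
    return new_edges

nodes = [0, 1, 2, 3, 4]
E0 = {(0, 1), (1, 2), (3, 4)}        # two connected components
E1 = block_complete(nodes, E0)
print(sorted(E1))   # [(0, 1), (0, 2), (1, 2), (3, 4)] -- {0,1,2} completed
```

Applying block_complete a second time returns the same edge set, mirroring the stabilization of the hierarchy after finitely many steps.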
It is also clear that the choice of extension operator determines the computational complexity of the resulting sparse SOS decomposition hierarchy, as well as the gap between Σ[A; E*] and Σ[A]. For example, the chordal-TSSOS hierarchy has a lower complexity than the TSSOS one in general, as its sparsity graphs have fewer edges (see Wang et al., 2021a,b for detailed complexity estimates). However, the TSSOS hierarchy has a higher representation power because Σ[A; E*] = Σ[A], which is generally not true for the chordal-TSSOS hierarchy.
Example 4.5. Consider a polynomial f that is term sparse but not correlatively sparse, since its csp graph is a complete graph with three nodes. The candidate exponent set to search for an SOS decomposition of f is B = N^3_2, as Newton polytope reduction removes no terms. For convenience, we order B such that x^B = (x_3^2, x_2^2, x_1^2, 1, x_2x_3, x_1, x_3, x_2, x_1x_2, x_1x_3). The TSSOS hierarchy yields the sparsity graphs shown in Figure 4.3, which stabilize at the second iteration (k = 2). The corresponding sparsity patterns of the Gram matrix Q are also shown in that figure. Observe how the connected components of the initial graph G(B, E_0) are completed at the first iteration to obtain the graph G(B, E_1). As discussed in Remark 4.4, the stabilized block-diagonal structure of Q coincides with the partition of x^B into the groups {x_3^2, x_2^2, x_1^2, 1, x_2x_3}, {x_1}, {x_3, x_2} and {x_1x_2, x_1x_3} implied by the sign symmetries of f, which is invariant under the transformations (x_1, x_2, x_3) → (−x_1, −x_2, −x_3) and (x_1, x_2, x_3) → (x_1, −x_2, −x_3) (the four groups of monomials are invariant under both, the first, the second, and none of these transformations). In this example, the SDP in (4.16) is feasible at all iterations of the TSSOS hierarchy because f admits a positive semidefinite Gram matrix representation that is consistent with the sparsity graphs in Figure 4.3. Thus, all steps of the TSSOS hierarchy are able to prove that f is SOS. Note that this can be guaranteed a priori only for the last step by virtue of Theorem 4.1.
For the same polynomial f , the chordal-TSSOS hierarchy stabilizes at the first iteration (k = 1) and yields the sparsity graph shown in Figure 4.4. The corresponding Gram matrix Q is sparser than those encountered in the TSSOS hierarchy, leading to smaller semidefinite constraints in (4.16). Again, this SDP is feasible in light of the Gram matrix decomposition given above, so the chordal-TSSOS hierarchy is able to prove that f is SOS. This, however, cannot be guaranteed a priori.
Remark 4.5. The explicit Gram matrix decomposition in Example 4.5 reveals that the smaller monomial basis x^B = (x_3^2, x_2^2, x_1^2, x_2x_3, 1) would suffice to construct an SOS decomposition of f. It remains to be seen whether this reduced basis can be identified using strategies that are more sophisticated than the Newton polytope reduction.
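Sign symmetries, which determine the stabilized TSSOS blocks by Theorem 4.1, can be computed by brute force for small n: a symmetry is a binary vector r with r·α even for every exponent α in the support, and basis monomials are grouped by their parity signatures. A sketch using a hypothetical even support (not the f of Example 4.5):

```python
from itertools import product

def sign_symmetries(support, n):
    """All r in {0,1}^n with r . alpha = 0 (mod 2) for every alpha in support."""
    return [r for r in product((0, 1), repeat=n)
            if all(sum(r[i] * a[i] for i in range(n)) % 2 == 0
                   for a in support)]

def symmetry_groups(B, syms):
    """Partition basis exponents by their parity signature under syms."""
    groups = {}
    for beta in B:
        sig = tuple(sum(r[i] * beta[i] for i in range(len(beta))) % 2
                    for r in syms)
        groups.setdefault(sig, []).append(beta)
    return list(groups.values())

# Hypothetical support in 2 variables: f has monomials 1, x1^2, x1^2 x2^2.
support = [(0, 0), (2, 0), (2, 2)]
syms = sign_symmetries(support, 2)    # all four sign flips preserve f
B = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
print(len(syms), [len(g) for g in symmetry_groups(B, syms)])   # 4 [3, 1, 1, 1]
```

The even monomials 1, x_1^2, x_2^2 share one signature and form a single block; each odd monomial ends up in its own block, mimicking the block-diagonal Gram structure recovered at stabilization.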

Correlatively term-sparse hierarchies
The sparse SOS decomposition hierarchies described in Section 4.2.3 can be combined with the correlative sparsity techniques outlined in Section 4.2.2 in a natural way. Let E_k be the edge set obtained using the iterations in (4.22) for a given extension operator, and let E_csp be the edge set in (4.19) constructed using correlative sparsity. Then, the sequence of sparsity graphs

{G(B, E_k ∩ E_csp)}_{k≥1} (4.24)

yields a hierarchy of "correlatively term-sparse" SOS decompositions, which exploit simultaneously term and correlative sparsity. Since E_k ⊆ E_{k+1} by construction, and since the sequence {E_k} stabilizes onto an edge set E* in a finite number of steps, the sparse SOS cones corresponding to this hierarchy satisfy

Σ[A; E_1 ∩ E_csp] ⊆ Σ[A; E_2 ∩ E_csp] ⊆ ··· ⊆ Σ[A; E* ∩ E_csp] ⊆ Σ[A],

and all inclusions are generally strict. Note also that one may remove from B all exponents that violate the correlative sparsity before constructing the edge sets E_k, because the intersection with E_csp eliminates all edges between such exponents (including self-loops). When the extension operator used to build E_k is the block-completion operation used in the TSSOS hierarchy, the sparsity graphs in (4.24) yield exactly the CS-TSSOS hierarchy introduced by Wang et al. (2020a). In this case, by Theorem 4.1, the stabilized sparsity graph G(B, E* ∩ E_csp) simply encodes sign symmetries and correlative sparsity. Since exploiting sign symmetries in SOS decompositions brings no conservatism, one immediately obtains the following corollary.
The CS-TSSOS sparsity graphs obtained with (4.24) and the corresponding sparsity patterns for the Gram matrix Q of f, illustrated in Figure 4.5, are chordal. The hierarchy stabilizes after three steps. At the first step, the clique-based decomposition (4.14) replaces the 22 × 22 semidefinite constraint on the Gram matrix Q with six semidefinite constraints of size 2, 2, 2, 10, 4 and 5. The first step of the TSSOS hierarchy, instead, leads to an SDP with five semidefinite constraints of size 2, 2, 2, 10, 7 (Wang et al., 2020a, Example 3.4). The second iteration of the CS-TSSOS hierarchy produces significant fill-in, and the size of the largest semidefinite constraint increases to 15. The third iteration brings only minimal additional fill-in. At this final stage, the connected components of the sparsity graph correspond to a partition of the monomials x^B according to the sign symmetry of f (the first four monomials in x^B are not invariant under the symmetry transformation, while the rest are), but the correlative sparsity prevents the completion of the largest connected component, i.e., of the bottom-right connected matrix block in Figure 4.5(c).

Figure 4.5: Sparsity graphs and corresponding matrix sparsity patterns for the first (top), second (middle) and third (bottom) iterations of the CS-TSSOS hierarchy in Example 4.6. After that, the hierarchy stabilizes. Graph vertices are labelled by monomials in x^B instead of the corresponding exponents in B to ease the visualization. Colors mark the maximal cliques; multicolor vertices and matrix entries belong to multiple cliques.
Numerical solution of the SDP (4.16) shows that all steps of the CS-TSSOS hierarchy are feasible, so an SOS decomposition of the polynomial f in (4.25) can be constructed at a lower computational cost than any other hierarchy discussed in this work. Note that feasibility cannot be guaranteed a priori at any step of the hierarchy, even the last (stabilized) one, due to the conservative nature of correlatively sparse SOS decomposition (see Remark 4.3).

Sparse SOS decompositions on semialgebraic sets
The sparsity-exploiting methods to construct SOS decompositions described so far prove global polynomial nonnegativity, but they can be extended to establish local nonnegativity on a basic semialgebraic set defined by m polynomial inequalities,

K := {x ∈ R^n : g_1(x) ≥ 0, ..., g_m(x) ≥ 0}. (4.26)

Set g_0(x) ≡ 1 for convenience. To verify that f ∈ R[x]_{n,d} (not necessarily of even degree) is nonnegative on K, it suffices to find an integer ω (known as the relaxation order) and exponent sets B_0, ..., B_m ⊆ N^n_ω such that

f(x) = ∑_{i=0}^m g_i(x) (x^{B_i})^T Q_i x^{B_i} for some positive semidefinite matrices Q_0, ..., Q_m. (4.27)

As before, these conditions define the feasible set of an SDP. Generally, one chooses ω such that 2ω ≥ max{deg(f), deg(g_1), ..., deg(g_m)} and then takes

B_i = N^n_{ω − ⌈deg(g_i)/2⌉}, i = 0, ..., m,

where ⌈a⌉ is the smallest integer greater than or equal to a. This ensures that each term in the sum in (4.27) is a polynomial of degree at most 2ω. One can allow for 2ω > deg(f) because cancellations may occur when summing all terms.
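The choice of relaxation order ω and of the exponent sets B_i is simple arithmetic. A sketch with hypothetical degrees (n = 3, deg f = 4, deg g_1 = 2, deg g_2 = 3; helper names are ours):

```python
from math import ceil, comb

def basis_degree(omega, deg_gi):
    """Degree cutoff for the exponent set B_i paired with g_i."""
    return omega - ceil(deg_gi / 2)

def basis_size(n, deg):
    """Number of exponents in N^n_deg, i.e. (n + deg choose deg)."""
    return comb(n + deg, deg)

# Hypothetical data: f of degree 4 on a set cut out by g1 (degree 2) and
# g2 (degree 3), in n = 3 variables; g0 = 1 has degree 0.
n, deg_f, degs_g = 3, 4, [2, 3]
omega = max(ceil(deg_f / 2), *(ceil(d / 2) for d in degs_g))   # omega = 2
sizes = [basis_size(n, basis_degree(omega, d)) for d in [0] + degs_g]
print(omega, sizes)    # 2 [10, 4, 1] -- |B_0|, |B_1|, |B_2|
```

Each term g_i(x)(x^{B_i})^T Q_i x^{B_i} then has degree at most 2ω = 4; the constraint with the highest-degree g_i is paired with the smallest monomial basis.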
Since each polynomial σ i (x) := (x Bi ) T Q i x Bi in (4.27) is SOS, this condition gives a weighted SOS decomposition of f , meaning a representation of f as a weighted sum of SOS polynomials where the weights are g 0 = 1 and the polynomials g 1 , . . . , g m appearing in the semialgebraic definition of K. Remarkably, this sufficient condition for local nonnegativity is also necessary if f is strictly positive on K and this is a compact set satisfying the so-called Archimedean condition.
If f and the polynomials g 1 , . . . , g m are term sparse, one can proceed as in Section 4.2.1 and attempt to reduce the computational complexity of the generic weighted SOS decomposition (4.27) by requiring the matrices Q 0 , . . . , Q m to be sparse and to admit a clique-based positive semidefinite matrix decomposition. The only new aspect is that one must consider how the sparse polynomial (x Bi ) T Q i x Bi interacts with the corresponding g i in order to determine the overall structure of the sum on the right-hand side of (4.27). This requires some care, especially if one hopes to recover sparse versions of Theorem 4.2.
To give an example of this general strategy, let us explain how to extend the correlative sparsity technique outlined in Section 4.2.2. In this case, one replaces the csp graph of f with the joint csp graph of the polynomials f, g_1, ..., g_m, which has vertices {1, ..., n} and an edge between vertices i and j if at least one of the following conditions holds: (a) the variables x_i and x_j are multiplied together in f; (b) at least one of g_1, ..., g_m depends on both x_i and x_j, even if these variables are not multiplied together.
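Conditions (a) and (b) translate directly into code: rule (a) adds an edge for every pair of variables multiplied together in a monomial of f, while rule (b) adds a clique over the variables of each g_i. A sketch with a hypothetical f and constraints (zero-based variable indices, helper names are ours):

```python
from itertools import combinations

def joint_csp_edges(support_f, vars_g):
    """Edges of the joint csp graph of f, g_1, ..., g_m.

    support_f: exponent tuples of f; vars_g: one set of variable
    indices per constraint polynomial g_i.
    """
    edges = set()
    # (a) x_i and x_j multiplied together in some monomial of f
    for alpha in support_f:
        nz = sorted(i for i, a in enumerate(alpha) if a > 0)
        edges.update(combinations(nz, 2))
    # (b) some g_i depends on both x_i and x_j
    for V in vars_g:
        edges.update(combinations(sorted(V), 2))
    return edges

# Hypothetical example with n = 4: f = x1^2 x2^2 + x3 x4, g1 depends on
# (x2, x3), g2 on x1 alone.
support_f = [(2, 2, 0, 0), (0, 0, 1, 1)]
vars_g = [{1, 2}, {0}]
print(sorted(joint_csp_edges(support_f, vars_g)))
# [(0, 1), (1, 2), (2, 3)] -- a chordal chain graph
```

In this toy case the joint csp graph is a chain, so it is chordal and its maximal cliques are the three edges; the cliques J_k then determine the sets N_i and the blocks of each Q_i as described next.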
The different treatment of f and g_1, ..., g_m reflects the asymmetric roles these polynomials play in (4.27). Then, one imposes that each matrix Q_i in (4.27) is the densest possible matrix such that the support of g_i(x)(x^{B_i})^T Q_i x^{B_i} is consistent with the joint csp graph. Precisely, let J_1, ..., J_t be the maximal cliques of the joint csp graph and, for each i = 0, ..., m, let var(g_i) ⊂ {1, ..., n} be the set of indices of the variables on which g_i depends, with the convention that var(g_0) = var(1) = ∅. By condition (b) above, there is at least one clique J_k such that var(g_i) ⊆ J_k, and we denote the set of clique indices k for which this holds by

N_i := {k ∈ {1, ..., t} : var(g_i) ⊆ J_k}.

One can check that the sparsity graph G_i(B_i, E_i) imposed on Q_i by this construction is chordal if so is the joint csp graph of f, g_1, ..., g_m. Moreover, it has maximal cliques C_{i,1}, ..., C_{i,|N_i|} with C_{i,k} := {β ∈ B_i : nnz(β) ⊆ J_k}. Consequently, the clique-based positive semidefinite decomposition of Q_i reads

Q_i = ∑_{k ∈ N_i} E_{C_{i,k}}^T S_{i,k} E_{C_{i,k}}, with S_{i,k} ∈ S^{|C_{i,k}|}_+. (4.30)

If the cliques C_{i,k} are small, imposing this clique-based decomposition for each i = 0, ..., m in (4.27) allows one to search for a weighted SOS decomposition of f by solving an SDP with low computational complexity. Moreover, arguing as in Remark 4.2, one concludes that this process yields the representation

f(x) = ∑_{i=0}^m ∑_{k ∈ N_i} g_i(x) σ_{i,k}(x_{J_k}), (4.31)

where each SOS polynomial σ_{i,k} depends only on variables indexed by a single clique of the joint csp graph. Crucially, the following sparse version of Theorem 4.2 guarantees that such a sparse weighted SOS decomposition exists if the joint csp graph is chordal and the semialgebraic definition of the set K in (4.26) includes inequalities of the form r_k^2 − ‖x_{J_k}‖^2 ≥ 0.

Theorem 4.3 (Grimm et al., 2007; Lasserre, 2006). Let f be a polynomial that is strictly positive on a basic semialgebraic set K = {x ∈ R^n : g_1(x) ≥ 0, ..., g_m(x) ≥ 0}, whose definition includes the inequalities r_k^2 − ‖x_{J_k}‖^2 ≥ 0 for some constants r_1, ..., r_t and all k = 1, ..., t. If the joint csp graph of f, g_1, ..., g_m is chordal, f has a sparse weighted SOS decomposition in the form (4.31).
Remark 4.6. The assumption that the semialgebraic definition of K includes the inequalities r_k^2 − ‖x_{J_k}‖_2^2 ≥ 0 can be weakened by requiring that a suitable |J_k|-dimensional restriction of K satisfies the Archimedean condition for each k = 1, ..., t. Moreover, the assumption is mild when K is compact because, in principle, the inequalities r_k^2 − ‖x_{J_k}‖_2^2 ≥ 0 can be added with values of r_k large enough not to change the set K. Proving that K remains unchanged for candidate r_k, however, may not be easy in practice.
The TSSOS, chordal-TSSOS and CS-TSSOS hierarchies can also be extended to produce weighted SOS decompositions on basic semialgebraic sets (Wang et al., 2021a,b, 2020a). Interested readers are referred to these works for the details. Here, we simply observe that, just like their global counterparts described in Sections 4.2.3 and 4.2.4, these extended hierarchies stabilize after a finite number of steps. Upon stabilization, moreover, the extended TSSOS and CS-TSSOS hierarchies recover the block-diagonal structure of the matrices Q_0, ..., Q_m implied by joint sign symmetries of the polynomials f, g_1, ..., g_m (see Wang et al., 2021b, Theorem 6.5 and Corollary 6.8; Wang et al., 2020a, Proposition 3.10). This observation can be combined with a symmetry-exploiting version of Theorem 4.2 (Riener et al., 2013, Theorem 3.5) and with Theorem 4.3 to conclude that the TSSOS and CS-TSSOS hierarchies are guaranteed to work for term-sparse polynomials that are strictly positive on compact sets whose semialgebraic definition satisfies suitable versions of the Archimedean condition.

Decomposition of sparse polynomial matrices
Having studied sparsity-exploiting techniques to reduce the complexity of searching for SOS representations of term-sparse polynomials, we now switch gears and review how chordal sparsity can be exploited when looking for SOS representations of sparse polynomial matrices. Section 4.3.1 presents results that partially extend the classical chordal decomposition theorem (Theorem 2.1) to SOS polynomial matrices with chordal sparsity. Decomposition results giving SOS certificates of matrix positivity on semialgebraic sets are briefly outlined in Section 4.3.2. All of these results are useful for static output controller design (Henrion & Lasserre, 2006), robust stability region analysis, and stability analysis of time-delay systems (Peet et al., 2009). Note that all results presented in this section consider the structural sparsity of polynomial matrices, not their term sparsity. In principle, one could exploit both structural and term sparsity by combining the results reviewed below with those of Section 4.2.

Global decomposition
Consider a symmetric n-variate polynomial matrix P ∈ R[x]^{r×r}_{n,2d} of degree 2d whose (structural) sparsity pattern is described by a chordal graph G({1, ..., r}, E), i.e.,

P_{ij}(x) ≡ 0 if i ≠ j and (i, j) ∉ E. (4.32)

Since checking whether P is positive semidefinite globally via the SOS certificates described in Section 4.1.2 is expensive when r is large, we seek to exploit the sparsity of P and replace one large matrix SOS constraint with multiple smaller ones. Let C_1, ..., C_t be the maximal cliques of G. If P is positive semidefinite globally, then applying Theorem 2.1 for each x ∈ R^n reveals that there exist x-dependent positive semidefinite matrices S_k : R^n → S^{|C_k|}_+ such that

P(x) = ∑_{k=1}^t E_{C_k}^T S_k(x) E_{C_k}. (4.33)

However, this decomposition is not immediately useful in practice because the matrices S_k need not be polynomial, so they cannot be searched for using SOS methods. As an example, consider a 3 × 3 polynomial matrix whose sparsity graph is a simple three-node chain graph with two maximal cliques, C_1 = {1, 2} and C_2 = {2, 3}. Such a matrix can be positive definite globally and yet admit no chordal decomposition (4.33) with polynomial S_1 and S_2. Using this example, and recalling that all positive semidefinite univariate polynomial matrices are also SOS, one can prove the following general statement.
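For a fixed x, the pointwise decomposition (4.33) is just Theorem 2.1 applied to a numeric matrix. The sketch below verifies one such splitting for a positive definite matrix with the three-node chain sparsity discussed above (the specific numbers are illustrative, with zero-based cliques {0, 1} and {1, 2}):

```python
def is_psd(M, tol=1e-9):
    """PSD check for a symmetric matrix via Gaussian elimination pivots."""
    A = [row[:] for row in M]
    n = len(A)
    for k in range(n):
        if A[k][k] < -tol:
            return False
        if A[k][k] <= tol:
            # a (near-)zero pivot needs a zero row for PSD
            if any(abs(A[k][j]) > tol for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            r = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= r * A[k][j]
    return True

# A PSD matrix with chain sparsity (zero (0,2) entry), cliques {0,1}, {1,2}.
A  = [[2.0, 1.0, 0.0],
      [1.0, 2.0, 1.0],
      [0.0, 1.0, 2.0]]
S1 = [[2.0, 1.0],
      [1.0, 1.0]]        # block supported on clique {0, 1}
S2 = [[1.0, 1.0],
      [1.0, 2.0]]        # block supported on clique {1, 2}
# Check A = E1^T S1 E1 + E2^T S2 E2: the middle diagonal entry splits as 1+1
recomposed = [[S1[0][0], S1[0][1], 0.0],
              [S1[1][0], S1[1][1] + S2[0][0], S2[0][1]],
              [0.0, S2[1][0], S2[1][1]]]
print(recomposed == A, is_psd(S1), is_psd(S2))   # True True True
```

The point of the counterexample in the text is that, while such numeric blocks exist for every fixed x, they cannot always be glued into polynomial (let alone SOS) functions of x.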
Proposition 4.3. Let G be a connected but not complete chordal graph with r ≥ 3 vertices and maximal cliques C_1, ..., C_t. For any positive integers n and d, there exists a positive definite SOS matrix P ∈ Σ^r_{n,2d} with sparsity graph G that does not admit a decomposition (4.33) with polynomial matrices S_1, ..., S_t.
On the other hand, the direct proof of Theorem 2.1 given by Kakimura (2010) can be combined with a diagonalization procedure for polynomial matrices due to Schmüdgen (2009) to show that (4.33) holds with SOS matrices S_1, ..., S_t for all positive semidefinite polynomial matrices, up to multiplication by an SOS polynomial.
Theorem 4.4. Let P ∈ R[x]^{r×r}_{n,2d} be positive semidefinite and let C_1, ..., C_t be the maximal cliques of its sparsity graph. There exist ν ∈ N, an SOS polynomial σ ∈ Σ_{n,2ν}, and SOS polynomial matrices S_k ∈ Σ^{|C_k|}_{n,2d+2ν} for k = 1, ..., t, such that

σ(x) P(x) = ∑_{k=1}^t E_{C_k}^T S_k(x) E_{C_k}. (4.34)

When the maximal cliques of the sparsity graph of P are small, this result enables one to construct an SOS certificate of global positive semidefiniteness using small matrix SOS constraints, which have a much lower computational complexity than simply requiring σP to be SOS. Implementation of the chordal SOS decomposition in Theorem 4.4 using SDPs requires the matrix P to be fixed, because the SOS weight σ must be determined alongside the SOS matrices S_1, ..., S_t. Often, however, P depends on a vector of parameters λ that must be optimized whilst ensuring that P is positive semidefinite. In these cases, condition (4.34) is not jointly convex in λ and σ, so the latter must be fixed a priori. This is generally restrictive because, when σ is fixed arbitrarily, Proposition 4.3 implies that the decomposition (4.34) may not exist. However, one can prove a sparse-matrix version of Reznick's Positivstellensatz (Reznick, 1995) to conclude that the weight σ(x) = ‖x‖_2^{2ν} is guaranteed to work at least when P is a homogeneous positive definite matrix.
Theorem 4.5. Let P ∈ R[x]^{r×r}_{n,2d} be homogeneous of degree 2d and positive definite on R^n \ {0}. Let C_1, ..., C_t be the maximal cliques of the sparsity graph of P. There exist ν ∈ N and SOS polynomial matrices S_k ∈ Σ^{|C_k|}_{n,2d+2ν} for k = 1, ..., t, such that

‖x‖_2^{2ν} P(x) = ∑_{k=1}^t E_{C_k}^T S_k(x) E_{C_k}.

Decomposition results such as this, where the SOS weight σ is fixed, are of considerable interest because they enable the construction of convergent hierarchies of sparsity-exploiting SOS relaxations for optimization problems with global polynomial matrix inequalities (see Henrion & Lasserre, 2006 and Peet et al., 2009). To illustrate the idea, let us consider the generic convex minimization problem

b* := min_λ b(λ) subject to P(x, λ) ⪰ 0 for all x ∈ R^n, (4.35)

where b is a convex cost function and the symmetric polynomial matrix P(x, λ) depends affinely on the parameters λ through matrices P_0, ..., P_ℓ ∈ R[x]^{r×r}_{n,2d} whose sparsity graph is chordal and has maximal cliques C_1, ..., C_t. Given any integer ν ≥ 0, a feasible vector λ and an upper bound on the optimal cost b* may be found by solving the SOS relaxation

b*_ν := min_λ b(λ) subject to ‖x‖_2^{2ν} P(x, λ) = ∑_{k=1}^t E_{C_k}^T S_k(x) E_{C_k}, S_k ∈ Σ^{|C_k|}_{n,2d+2ν}, (4.36)

which can be reformulated as a standard-form SDP. If the polynomial matrices P_0, ..., P_ℓ are homogeneous of even degree, and there exists λ_0 such that P(x, λ_0) is positive definite, then one can use Theorem 4.5 to prove that b*_ν → b* from above as ν → ∞; see the original works for more details and numerical examples. Under further technical assumptions, asymptotic convergence when P_0, ..., P_ℓ are not homogeneous is preserved by replacing the SOS multiplier ‖x‖_2^{2ν} with (1 + ‖x‖_2^2)^ν.

Decomposition on a semialgebraic set
We now turn our attention to sparse polynomial matrix inequalities on a semialgebraic set K defined as in (4.26) by m polynomial inequalities g_i(x) ≥ 0, i = 1, …, m. A sufficient condition for a symmetric polynomial matrix P ∈ R[x]^{r×r}_{n,d} to be positive semidefinite on K is that there exist an integer ν ∈ N and SOS matrices S_0, …, S_m ∈ Σ^r_{n,2ν} such that

P(x) = S_0(x) + Σ_{i=1}^m g_i(x) S_i(x).   (4.37)

A matrix version of Putinar's Positivstellensatz proved by Scherer & Hol (2006) states that condition (4.37) is also necessary when P is positive definite on K and this set satisfies the Archimedean condition. The weighted matrix SOS decomposition (4.37) can be searched for with semidefinite programming, but this is prohibitively expensive when P is large. If P has chordal structural sparsity, however, one can show that the SOS matrices S_i admit a clique-based decomposition. This yields the following sparse matrix version of Putinar's Positivstellensatz.
Theorem 4.6. Let K be a semialgebraic set defined as in (4.26) that satisfies the Archimedean condition. Suppose that the symmetric polynomial matrix P ∈ R[x]^{r×r}_{n,d} is positive definite on K and that its sparsity graph has maximal cliques C_1, …, C_t. There exist an integer ν ∈ N and SOS matrices S_{i,k} ∈ Σ^{|C_k|}_{n,2ν} for i = 0, …, m and k = 1, …, t such that

P(x) = Σ_{k=1}^t E_{C_k}^T S_{0,k}(x) E_{C_k} + Σ_{i=1}^m g_i(x) Σ_{k=1}^t E_{C_k}^T S_{i,k}(x) E_{C_k}.

This result can be used to construct sparsity-exploiting SOS relaxations of optimization problems with polynomial matrix inequalities on compact semialgebraic sets that satisfy the Archimedean condition. For example, consider an optimization problem analogous to (4.35), where the polynomial matrix inequality is enforced on K rather than on the full space R^n, and denote its optimal value by b*. If there exists λ_0 such that the inequality is strict on K and this set satisfies the Archimedean condition, then the optimal value of the SOS problem obtained by replacing the positive semidefiniteness constraint with the weighted clique-based decomposition above, with SOS matrices S_{i,k} ∈ Σ^{|C_k|}_{n,2ν} for i = 0, …, m and k = 1, …, t, converges to b* from above as ν → ∞. Interested readers are referred to  for more details and computational examples.

Other approaches
The scalability of SOS approaches to polynomial inequalities and polynomial optimization problems can be improved using techniques beyond those described in this section. One example is to replace semidefinite conditions on a large Gram matrix with stronger conditions based on factor-width-k decompositions, which are discussed in Section 5.2 below. For the particular case of k = 2, one obtains scaled diagonally dominant SOS (SDSOS) certificates of nonnegativity (Ahmadi & Majumdar, 2019). Another approach is to use bounded-degree SOS conditions (Lasserre et al., 2017), in which (loosely speaking) one restricts the degree of the monomial basis x^B used in the Gram matrix representation and handles monomials of higher degree using positivity certificates that can be reformulated as linear programs. Term sparsity can be exploited in these frameworks, too: the relation between correlative sparsity and SDSOS conditions is discussed by Zheng et al. (2019a), while Weisser et al. (2018) develop sparsity-exploiting bounded-degree SOS hierarchies.
Finally, when working with polynomials that are invariant under groups of symmetry transformations, a large Gram matrix can be replaced with one that has a block-diagonal structure using symmetry reduction techniques (Gatermann & Parrilo, 2004; Riener et al., 2013). The block-diagonalization based on sign-symmetries, recovered by the TSSOS and CS-TSSOS hierarchies discussed in Sections 4.2.3 and 4.2.4, is only one particular example; more sophisticated strategies require using a "symmetry-adapted" basis for the space of polynomials in lieu of the monomial basis x^B.

Factor-width decomposition
We have seen that the matrix decomposition approach can lead to significant efficiency improvements in the solution of sparse SDPs (cf. Section 3) and sparse polynomial optimization problems (cf. Section 4). We now turn our attention to the problem of testing positive semidefiniteness of matrices that are not necessarily sparse, for which similar matrix decomposition ideas can also be leveraged using approximation methods. This class of methods is known as factor-width decomposition (Boman et al., 2005). We will highlight its connections and differences with the chordal decomposition reviewed above.
After reviewing some background in Section 5.1, we discuss how a hierarchy of inner and outer approximations for positive semidefinite matrices can be constructed based on factor-width-k matrices in Section 5.2. We then discuss in Section 5.3 how this can be extended further, leading to the notion of block factor-width-two matrices, which aims to strike a balance between computational cost and approximation quality. Applications in semidefinite and SOS optimization are discussed in Sections 5.4 and 5.5.

Background
As emphasized in the previous sections, solving large-scale semidefinite programs is at the centre of many problems in control engineering and beyond, and the development of fast and reliable solvers has attracted significant attention recently, mainly focusing on sparsity-exploiting and low-rank-exploiting methods (De Klerk, 2010; Majumdar et al., 2020). Some of these methods attempt to solve the problem exactly using, e.g., chordal decomposition (cf. Sections 3 and 4) when sparsity is present, while others provide approximate solutions when the problems are large and dense. This section focuses on the latter case, i.e., that of large, dense SDPs; the general idea is still based on a matrix decomposition, similar in spirit to Sections 3 and 4.
One basic approach is to approximate the positive semidefinite cone S^n_+ with the cone of factor-width-k matrices (Boman et al., 2005), which allows for a certain matrix decomposition discussed in Section 5.2 below. We will denote the cone of factor-width-k matrices by FW^n_k, where n is the matrix dimension. The case k = 2 is of special interest: FW^n_2 coincides with the cone of symmetric scaled diagonally dominant matrices, and enforcing membership of FW^n_2 is equivalent to imposing a number of second-order cone constraints, which implies that linear functions can be optimized over FW^n_2 by solving a second-order cone program (SOCP). Compared to SDPs, SOCPs are much more scalable, but this approximation is very conservative: the restricted problem may even become infeasible. At the same time, attempting an approximation over FW^n_3 results in (n choose 3) = O(n^3) positive semidefinite constraints, which may not strike a good balance between approximation quality and computational efficiency. For this reason, most work has focused on the case of factor-width-two matrices and on some closely related extensions (Ahmadi et al., 2017a; Ahmadi & Hall, 2017; Wang et al., 2021c).
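The combinatorial growth that makes FW^n_k impractical for k ≥ 3 is easy to quantify; a minimal sketch (the choice n = 10 is arbitrary):

```python
from math import comb

# number of k x k principal-submatrix cones in a factor-width-k
# representation: one for every subset of k indices out of n
n = 10
counts = {k: comb(n, k) for k in (2, 3, 4)}
# k = 2 needs 45 small cones; k = 3 already needs 120, growing like O(n^3)
```

For k = 2 each small cone is second-order-cone representable, which is why this case scales so much better.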
This notion of factor-width-two matrices was recently extended to the block case by Zheng et al. (2019b), who showed that the approximation quality is significantly improved compared to FW n 2 and remains computationally feasible unlike the approximation using FW n 3 . At the same time, block factor-width-two matrices can form a new hierarchy of approximations using a "coarsening" of the decomposition results (cf. Definition 2.2). An alternative approach that results in an improved approximation is based on the use of decomposed structured subsets (Miller et al., 2019b).

Factor-width-k decompositions
We now introduce the concept of factor-width-k matrices, originally defined in Boman et al. (2005).
Definition 5.1. The factor width of a matrix X ∈ S^n_+ is the smallest integer k such that there exists a matrix V with X = V V^T and each column of V has at most k nonzeros.
The factor width of X is also the smallest integer k for which X can be written as a sum of positive semidefinite matrices that are nonzero only on a k × k principal submatrix:

X = Σ_{i=1}^s E_{C_i}^T Z_i E_{C_i}   (5.1)

for some matrices Z_i ∈ S^k_+, where each C_i is a set of k distinct integers from 1 to n and s = (n choose k). We use FW^n_k to denote the set of n × n matrices with factor width at most k. The dual of FW^n_k with respect to the standard trace inner product is the cone of symmetric matrices all of whose k × k principal submatrices are positive semidefinite,

(FW^n_k)* = {X ∈ S^n : E_{C_i} X E_{C_i}^T ⪰ 0, i = 1, …, s}.

The following hierarchy of inner/outer approximations of S^n_+ follows directly from these definitions: FW^n_1 ⊆ FW^n_2 ⊆ … ⊆ FW^n_n = S^n_+ = (FW^n_n)* ⊆ … ⊆ (FW^n_2)* ⊆ (FW^n_1)*.
(5.2)
The set FW^n_2 is of particular interest because it coincides with the set of symmetric scaled diagonally dominant matrices (Boman et al., 2005). Furthermore, linear optimization over FW^n_2 can be converted into an SOCP, for which efficient algorithms exist. The better scalability of SOCPs compared to SDPs makes inner approximations of the positive semidefinite cone based on FW^n_2 very attractive, and they form the basis of the SDSOS framework for polynomial optimization proposed by Ahmadi & Majumdar (2019).
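The inclusion of diagonally dominant matrices in FW^n_2 can be made fully constructive. The sketch below (our own construction, not from the text) builds an explicit factor-width-two decomposition of a symmetric diagonally dominant matrix:

```python
import numpy as np

def dd_factor_width_2(X, tol=1e-12):
    """Decompose a symmetric diagonally dominant matrix X (nonnegative
    diagonal) as a sum of PSD blocks supported on at most 2x2 principal
    submatrices, certifying X in FW^n_2 (the classic DD subset of FW_2)."""
    n = X.shape[0]
    pieces = []                                  # (index tuple, small PSD block)
    residual = np.diag(X).astype(float).copy()   # diagonal mass left to assign
    for i in range(n):
        for j in range(i + 1, n):
            a = X[i, j]
            if abs(a) > tol:
                # [[|a|, a], [a, |a|]] is PSD and carries the (i, j) entry
                pieces.append(((i, j), np.array([[abs(a), a], [a, abs(a)]])))
                residual[i] -= abs(a)
                residual[j] -= abs(a)
    assert np.all(residual >= -tol), "X must be diagonally dominant"
    for i in range(n):
        if residual[i] > tol:
            pieces.append(((i,), np.array([[residual[i]]])))
    return pieces

# example: a tridiagonal diagonally dominant matrix
X = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
pieces = dd_factor_width_2(X)
rebuilt = np.zeros_like(X)
for idx, Z in pieces:
    rebuilt[np.ix_(idx, idx)] += Z
assert np.allclose(rebuilt, X)                           # decomposition is exact
assert all(np.min(np.linalg.eigvalsh(Z)) >= -1e-12 for _, Z in pieces)
```

Each 2 × 2 piece corresponds to one second-order-cone-representable constraint in the SDD/SDSOS framework.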
Remark 5.1 (Factor-width decomposition vs chordal decomposition). The decomposition (5.1) is formally the same as the chordal decomposition in Theorem 2.1, and the two differ only in the choice of "cliques" C_1, …, C_s. For chordal decomposition, they are the maximal cliques of (a chordal extension of) the sparsity graph of Z. For factor-width-k decomposition, instead, they are all (n choose k) sets of k distinct indices from {1, …, n}. These two different choices, however, have considerably different implications: while chordal decomposition is necessary and sufficient for a sparse matrix to be positive semidefinite, factor-width-k decomposition is only sufficient unless k = n. The quality of the approximation of positive semidefinite cones by (FW^n_k)* was recently investigated by Song & Parrilo (2021) and Blekherman et al. (2020).

Block factor-width-two decomposition
The representation (5.1) reveals that checking whether a matrix Z belongs to FW^n_k for any values of n and k is equivalent to an SDP. When k < n, this SDP has smaller semidefinite cones than S^n_+, but may be more expensive to solve than checking whether Z ∈ S^n_+ directly because of the combinatorial number, (n choose k), of cones. Setting k = 2 does lead to efficiency gains, but the gap between FW^n_2 and S^n_+ might be unacceptably large in some applications. For this reason, block factor-width-two matrices are of interest. Recall from Section 2.3.1 the notion of a block-partition of a matrix Z ∈ S^n subordinate to a partition α of n, and the definition of the index matrix E_{C_k,α} in (2.14). For every pair of blocks i and j we further define the index matrix E_{ij,α} ∈ R^{(α_i+α_j)×n}, which stacks the rows of the identity matrix corresponding to blocks i and j.   (5.3b)
The set of block factor-width-two matrices, denoted by FW^n_{α,2}, is defined as follows.
Definition 5.2. For any partition α = {α_1, …, α_p} of n, a symmetric matrix Z ∈ S^n belongs to the class FW^n_{α,2} of block factor-width-two matrices if and only if

Z = Σ_{1 ≤ i < j ≤ p} E_{ij,α}^T X_{ij} E_{ij,α}   (5.4)

for some X_ij ∈ S^{α_i+α_j}_+, where E_{ij,α} is defined in (5.3b).
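A minimal numerical sketch of Definition 5.2 (with an arbitrary partition and random PSD blocks, our own example): assembling Z as in (5.4) immediately certifies that Z is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = [2, 3, 2]                      # partition of n = 7 into p = 3 blocks
n = sum(alpha)
offsets = np.cumsum([0] + alpha)

def E_block(i, j):
    """Index matrix E_{ij,alpha}: rows of I_n selecting blocks i and j."""
    rows = list(range(offsets[i], offsets[i + 1])) + \
           list(range(offsets[j], offsets[j + 1]))
    return np.eye(n)[rows, :]

# assemble Z = sum_{i<j} E_ij^T X_ij E_ij with random PSD blocks X_ij
Z = np.zeros((n, n))
for i in range(len(alpha)):
    for j in range(i + 1, len(alpha)):
        m = alpha[i] + alpha[j]
        G = rng.standard_normal((m, m))
        X_ij = G @ G.T                 # PSD block of size alpha_i + alpha_j
        E = E_block(i, j)
        Z += E.T @ X_ij @ E

# membership in FW^n_{alpha,2} certifies Z is positive semidefinite
assert np.min(np.linalg.eigvalsh(Z)) >= -1e-8
```

The converse fails in general: not every PSD matrix admits such a representation, which is exactly the approximation gap discussed next.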
It is clear that (5.4) is a direct block extension of (5.1) when k = 2. Also, it is not hard to check that FW^n_{α,2} is a cone. Its dual (with respect to the trace inner product) is characterized by the following proposition.
It should be clear from Definition 5.2 and Proposition 5.1 that semidefinite programming can be used to verify whether a matrix belongs to FW n α,2 or to (FW n α,2 ) * . While a gap between these cones and the positive semidefinite cone S n + remains, the next theorem states that the size of the gap can be reduced by coarsening the partition α (cf. Definition 2.2), generally at the expense of increasing the computational complexity of the semidefinite representations of FW n α,2 and (FW n α,2 ) * . This tradeoff between approximation gap and complexity is the main advantage of using block factor-width-two cones.
This result does not quantify how well FW^n_{α,2} and (FW^n_{α,2})* approximate the positive semidefinite cone. Such information is not only of theoretical interest, but also of practical importance, especially for dense positive semidefinite constraints that cannot be treated using chordal decomposition. Some progress in this direction was recently made by Zheng et al. (2019b), who leveraged results by Blekherman et al. (2020) to show that the normalized distance between either FW^n_{α,2} or (FW^n_{α,2})* and S^n_+ is at most (p − 2)/p, where p is the number of blocks in the partition α.
Compared to (5.2), one main advantage of the hierarchy of inner/outer approximations using block factor-width-two cones in Theorem 5.1 is that the number of basis matrices in the representation (5.4) remains O(p^2), instead of the combinatorial number (n choose k). Moreover, the number of blocks p decreases when coarsening the partition. Therefore, the cone FW^n_{α,2} is often computationally more tractable than the cone FW^n_k with k ≥ 3.

Applications to semidefinite programming
Recall from Theorem 5.1 that the cones FW^n_{α,2} and (FW^n_{α,2})* approximate the positive semidefinite cone S^n_+ from the inside and from the outside, respectively, and that the approximation improves as the partition α is coarsened. This allows one to compute convergent sequences of upper and lower bounds on the optimal value of an SDP in the primal standard form (3.1), which we denote by J* for simplicity, using optimization problems of increasing computational complexity that are always simpler to solve than (3.1) itself. Precisely, since FW^n_{α,2} ⊆ S^n_+ for any partition α of n, the optimal value of the block factor-width cone program

U_α := min_X ⟨C, X⟩ subject to ⟨A_i, X⟩ = b_i, i = 1, …, m, X ∈ FW^n_{α,2},   (5.5)

bounds the optimal value of the SDP (3.1) from above. A complementary lower bound is given by

L_α := min_X ⟨C, X⟩ subject to ⟨A_i, X⟩ = b_i, i = 1, …, m, X ∈ (FW^n_{α,2})*,   (5.6)

because S^n_+ ⊆ (FW^n_{α,2})*. By Theorem 5.1, replacing α with a coarser partition can only improve these upper and lower bounds, and we have the following corollary.
When α = 1 = {1, …, 1} is the finest possible partition, problems (5.5) and (5.6) can be reformulated as SOCPs. This case was studied extensively by Ahmadi & Majumdar (2019), and numerical experiments show that the optimal values L_1 and U_1 can often be very poor bounds on J*. To obtain better results using coarser partitions, one can leverage the definition of FW^n_{α,2} and rewrite the upper bound problem (5.5) as an optimization problem over the small blocks X_{jl},

min Σ_{j<l} ⟨C_{jl,α}, X_{jl}⟩ subject to Σ_{j<l} ⟨A_{ijl,α}, X_{jl}⟩ = b_i, i = 1, …, m, X_{jl} ∈ S^{α_j+α_l}_+,   (5.7)

where C_{jl,α} := E_{jl,α} C (E_{jl,α})^T and A_{ijl,α} := E_{jl,α} A_i (E_{jl,α})^T. This is a standard-form SDP and can be solved with general-purpose solvers. Observe that the number of equality constraints in this SDP is the same as for the original problem (3.1), but the dimension of the semidefinite cones has been reduced. Since general-purpose SDP solvers can handle multiple small semidefinite cones much more efficiently than a single large one, problem (5.7) can often be solved much faster than (3.1). For instance, the numerical experiments in Zheng et al. (2019b) show that useful upper bounds U_α on the optimal value of SDP relaxations of polynomial optimization problems can be found with a reduction of up to 80% in CPU time.
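The data restriction used in (5.7) is just principal-submatrix extraction, and the adjoint identity ⟨E_{jl,α} C E_{jl,α}^T, X_{jl}⟩ = ⟨C, E_{jl,α}^T X_{jl} E_{jl,α}⟩ that justifies the reformulation can be checked numerically (a sketch with arbitrary data):

```python
import numpy as np

alpha = [2, 3, 2]                      # partition of n = 7
offsets = np.cumsum([0] + alpha)
n = sum(alpha)

def E_block(j, l):
    """Index matrix E_{jl,alpha}: rows of I_n selecting blocks j and l."""
    rows = list(range(offsets[j], offsets[j + 1])) + \
           list(range(offsets[l], offsets[l + 1]))
    return np.eye(n)[rows, :]

rng = np.random.default_rng(1)
C = rng.standard_normal((n, n))
C = C + C.T                            # symmetric SDP cost data

# restricted data C_jl = E_jl C E_jl^T and a PSD trial block X_jl
j, l = 0, 2
E = E_block(j, l)
C_jl = E @ C @ E.T
X_jl = np.eye(alpha[j] + alpha[l])

# adjoint identity behind (5.7): <C_jl, X_jl> = <C, E^T X_jl E>
lhs = np.trace(C_jl @ X_jl)
rhs = np.trace(C @ (E.T @ X_jl @ E))
assert np.isclose(lhs, rhs)
```

The same identity applied to each A_i turns the equality constraints of (3.1) into the block form in (5.7).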

Applications to SOS optimization
Block factor-width-two decompositions can also be applied to reduce the computational cost of SOS optimization. As discussed in Section 4, an n-variate polynomial p ∈ R[x]_{n,2d} of even degree 2d is SOS if and only if there exist an exponent set B ⊆ N^n_d and a positive semidefinite matrix Q such that (Parrilo, 2000)

p(x) = (x^B)^T Q x^B.   (5.8)

The fundamental computational challenge in optimization over the cone Σ_{n,2d} of n-variate SOS polynomials of degree at most 2d is that the parameterization (5.8) requires in general an N × N positive semidefinite matrix with N = (n+d choose d). This may be prohibitive even for moderate values of n and d.
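A tiny concrete instance of the Gram representation (5.8), for the univariate polynomial p(x) = x^4 + 2x^2 + 1 with basis x^B = (1, x, x^2) (our own example):

```python
import numpy as np

# p(x) = x^4 + 2 x^2 + 1 with monomial basis x^B = (1, x, x^2)
Q = np.diag([1.0, 2.0, 1.0])          # a PSD Gram matrix for p

def p(x):
    return x ** 4 + 2 * x ** 2 + 1

def gram_form(x):
    m = np.array([1.0, x, x ** 2])    # basis vector x^B
    return m @ Q @ m                  # (x^B)^T Q x^B as in (5.8)

# PSD Gram matrix => SOS certificate: p = sum_i (L_i^T x^B)^2 with Q = L L^T
assert np.min(np.linalg.eigvalsh(Q)) >= 0
for x in (-1.3, 0.0, 0.7, 2.5):
    assert np.isclose(p(x), gram_form(x))
```

Here N = 3 is harmless, but N = (n+d choose d) grows quickly; the SDSOS and block-SDSOS restrictions below replace the N × N semidefinite constraint on Q with cheaper conic constraints.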
For polynomials characterized by term sparsity, the computational complexity can be reduced dramatically using the approaches reviewed in Section 4, which are based on chordal decomposition. To handle polynomials that are not term sparse, Ahmadi & Majumdar (2019) introduced the notion of scaled diagonally dominant sum-of-squares (SDSOS) polynomials. These are special SOS polynomials whose Gram matrix Q in (5.8) belongs to the factor-width-two cone FW^{|B|}_2. As in the case of semidefinite programming, defining block-SDSOS polynomials by replacing FW^{|B|}_2 with its superset FW^{|B|}_{α,2} for any partition α of |B| offers an improved inner approximation of Σ_{n,2d}.
Definition 5.3. Given a partition α = {α_1, …, α_g} of |B|, a polynomial p ∈ R[x]_{n,2d} is said to be α-SDSOS if and only if there exist coefficient vectors f_{ij,t} ∈ R^{α_i+α_j} and exponent sets B_{ij} ⊆ N^n_d such that

p(x) = Σ_{1≤i<j≤g} Σ_t (f_{ij,t}^T x^{B_{ij}})^2.   (5.9)

The set of all α-SDSOS polynomials in n independent variables and degree no larger than 2d will be denoted by α-SDSOS_{n,2d}. It is not difficult to check that it is a cone. Moreover, since definition (5.9) is considerably more structured than the definition (4.3) of general SOS polynomials, the inclusion α-SDSOS_{n,2d} ⊆ Σ_{n,2d} is immediate.
For the uniform unit partition α = {1, …, 1} of (n+d choose d), the cone α-SDSOS_{n,2d} reduces to the standard SDSOS cone studied by Ahmadi & Majumdar (2019). At the other end of the spectrum, for any partition of the form α = {α_1, α_2} one has α-SDSOS_{n,2d} = Σ_{n,2d}. This second statement is a direct consequence of the following result, which reveals a connection between the polynomial cone α-SDSOS_{n,2d} and the block factor-width-two cone FW^{|B|}_{α,2}.
Theorem 5.2. A polynomial p ∈ R[x]_{n,2d} belongs to the cone α-SDSOS_{n,2d} if and only if it admits a Gram matrix representation (5.8) with B ⊆ N^n_d and Q ∈ FW^{|B|}_{α,2}. Similar to Theorem 5.1, we can build a hierarchy of inner approximations for the SOS cone Σ_{n,2d}.
Corollary 5.2. Let 1 = {1, …, 1}, α = {α_1, …, α_g}, β = {β_1, …, β_h} and γ = {γ_1, γ_2} be partitions of |B| such that β is a coarsening of α. Then

SDSOS_{n,2d} = 1-SDSOS_{n,2d} ⊆ α-SDSOS_{n,2d} ⊆ β-SDSOS_{n,2d} ⊆ γ-SDSOS_{n,2d} = Σ_{n,2d}.   (5.10)

Consider now an optimization problem of the form

w* := min_u w^T u subject to p(x; u) := p_0(x) + u_1 p_1(x) + · · · + u_t p_t(x) ≥ 0 for all x ∈ R^n,   (5.11)

where p_0, …, p_t ∈ R[x]_{n,2d} are given polynomials, w ∈ R^t is a given cost vector, and u ∈ R^t is the decision variable. Let α be any partition of (n+d choose d). To compute an upper bound on the optimal cost w*, one can strengthen the nonnegativity constraint on p with the SOS constraint p ∈ Σ_{n,2d}, the SDSOS constraint p ∈ SDSOS_{n,2d}, or the block-SDSOS constraint p ∈ α-SDSOS_{n,2d}. The first approach replaces (5.11) with an SDP, the second one leads to an SOCP, and the third yields a block factor-width cone program that can be reformulated as a standard-form SDP. According to Corollary 5.2, the SOS constraint provides the best upper bound on w*, but is the most computationally expensive. At the other extreme is the SDSOS constraint, which offers the fastest computations but may be too restrictive; in fact, the corresponding SOCP may even be infeasible. The block-SDSOS constraint p ∈ α-SDSOS_{n,2d}, instead, can balance computational speed and upper bound quality thanks to the freedom one has in choosing the partition α. This expectation is confirmed by the numerical experiments of Zheng et al. (2019b), but the choice of an optimal partition for given computational resources remains an open problem.

Applications
The matrix decomposition techniques reviewed in the previous sections can be used to reduce the computational complexity of a wide variety of analysis and control problems that can be formulated as SDPs or SOS programs. As anticipated in Section 3.1, complex large-scale dynamical systems at the heart of modern technology often possess a natural graph-like structure, due for example to sparse interactions between subsystems in a network (Andersen et al., 2014a; Dall'Anese et al., 2013; Riverso et al., 2014; Zheng et al., 2018d). The key to enabling efficient numerical treatment of control problems for such systems is to devise SDP or SOS relaxations that preserve this graph structure as much as possible. Precisely, one aims to obtain SDPs with aggregate sparsity (cf. Section 3) or polynomial optimization problems with term sparsity (cf. Section 4). If this can be done, then the sparsity-exploiting techniques discussed in Sections 3 and 4 can bring considerable computational gains and enable the study of very large systems.
This section describes how chordal sparsity can be exploited for a small selection of problems in control and machine learning. Section 6.1 focuses on stability analysis for linear and nonlinear systems, and on decentralized control of networked linear systems. In Section 6.2, we review sparsity-promoting relaxations of nonconvex quadratically constrained quadratic programs (QCQPs) and apply them to the well-known Max-Cut problem from graph theory, as well as to a network sensor location problem. Finally, Section 6.3 shows how chordal sparsity allows for efficient verification of neural networks in machine learning. We stress that these are only a few of the application domains in which chordal decomposition has enabled considerable progress in recent years; other fields include, for instance, fluid mechanics, model predictive control, and optimal power flow. Table 2 provides a (non-exhaustive) list of references.

Stability analysis and decentralized control
Stability analysis and control synthesis problems for dynamical systems governed by ordinary differential equations can often be reformulated as SDPs or SOS programs using Lyapunov functions (Boyd et al., 1994;Lasserre, 2010;Papachristodoulou & Prajna, 2005;Parrilo, 2000;Zhou et al., 1996). If the interactions between individual components of the system have a sparse graph structure, considering Lyapunov functions with a separable or nearly-separable structure can lead to sparse SDPs and SOS programs, which can be solved efficiently using the techniques in Sections 3 and 4. Here, we give three simple examples of this fact.
6.1.1. Stability of linear networked systems
Consider a continuous-time linear autonomous system

ẋ(t) = A x(t),   (6.1)

where x(t) ∈ R^n is the system state at time t and A ∈ R^{n×n} is the system matrix. It is well known (Boyd et al., 1994; Zhou et al., 1996) that the equilibrium state x(t) = 0 is asymptotically stable if and only if all eigenvalues of A have negative real part. Classical Lyapunov stability theory guarantees that this is true if and only if there exists a positive definite matrix P such that the (positive definite) Lyapunov function V(x) = x^T P x decays monotonically along all system trajectories x(t). Equivalently, P must satisfy the strict LMIs

P ≻ 0,   A^T P + P A ≺ 0.   (6.2)

Now, suppose that (6.1) is a compact representation of a network of l linear subsystems with states x_1 ∈ R^{n_1}, …, x_l ∈ R^{n_l}, whose interactions can be represented by a static undirected graph G_d({1, …, l}, E_d) with (i, j) ∈ E_d if and only if systems i and j are directly coupled. In particular, the dynamics of each subsystem are given explicitly by

ẋ_i(t) = A_{ii} x_i(t) + Σ_{j∈N_i} A_{ij} x_j(t),   i = 1, …, l,   (6.3)

where N_i := {j : (j, i) ∈ E_d} denotes the neighbors of system i. Systems of this type are encountered, for example, when modelling power grids (Riverso et al., 2014) and traffic systems (Wang et al., 2020b). If the matrix P in (6.2) is assumed to be block-diagonal with l blocks of size n_1, …, n_l, meaning that we consider a quadratic Lyapunov function in the separable form (Boyd & Yang, 1989; Geromel et al., 1994; Zheng et al., 2018d)

V(x) = Σ_{i=1}^l x_i^T P_i x_i,   (6.4)

then it is not hard to see that the block-sparsity graph of the matrix A^T P + P A in (6.2) is the same as the system graph G_d. When this graph is chordal with small maximal cliques, or admits a chordal extension with the same property, a feasible block-diagonal matrix P satisfying (6.2) can be constructed for significantly larger networks than can be handled without sparsity exploitation. Equivalently, for a given network size, CPU time requirements can be reduced dramatically.
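The LMIs (6.2) with a block-diagonal P can be checked numerically. The sketch below (our own toy construction, with arbitrary numbers) builds an arrow-type network of the kind discussed next and verifies that the simplest block-diagonal candidate, P = I, certifies stability:

```python
import numpy as np

# arrow-type network: one master node coupled to l-1 subsystems, each of size 2
l, ns = 4, 2
n = l * ns
A = -2.0 * np.eye(n)
for j in range(1, l):                     # couple master block 0 to block j
    A[0:ns, j * ns:(j + 1) * ns] = 0.1
    A[j * ns:(j + 1) * ns, 0:ns] = 0.1

# try the simplest block-diagonal candidate P = I in the LMIs (6.2):
# P > 0 trivially, and A^T P + P A = A + A^T must be negative definite
P = np.eye(n)
M = A.T @ P + P @ A
assert np.max(np.linalg.eigvalsh(M)) < 0  # stability certified
```

By Gershgorin's theorem the couplings chosen here are weak enough for P = I to work; in general the blocks P_i must be found by solving the sparse LMI (6.2).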
As an example, consider a network with a master node and l − 1 independent subsystems connected to it, as sketched in Figure 6.1(a). For simplicity, suppose that the subsystems have size n_1 = · · · = n_l = 10. With a block-diagonal P, the second LMI in (6.2) has the chordal "arrow-type" block sparsity shown in Figure 6.1(b). Table 3 reports the CPU time required to construct a feasible P with MOSEK as a function of the number l of subsystems, when the sparsity of this LMI is and is not exploited. It is evident that exploiting chordal sparsity using the methods described in Section 3 leads to a significant reduction in CPU time. Similar results are obtained for systems with more realistic network graphs whose maximal cliques are small; see Mason & Papachristodoulou (2014), Deroo et al. (2015) and Zheng et al. (2018c,d).
[Figure 6.1(b): Block sparsity pattern of the matrix P A + A^T P when P is block-diagonal; each block has size 10 × 10, and there are l diagonal blocks.]

Table 3: CPU time (in seconds) required by the SDP solver MOSEK to construct a block-diagonal Lyapunov matrix P satisfying the LMIs in (6.2) for a network of l 10-dimensional systems with connectivity graph G_d shown in Figure 6.1(a).

Remark 6.1 (Separable Lyapunov functions). Searching for a Lyapunov function V(x) with the separable structure (6.4) is convenient to ensure that the sparsity of the system matrix A is inherited by the LMI A^T P + P A ≺ 0.
The existence of such a separable Lyapunov function can be guaranteed for special classes of stable linear systems (Carlson et al., 1992; Sootla et al., 2017, 2019), but not in general. When a separable Lyapunov function V(x) fails to exist, the structure of the network graph G_d may still be leveraged to promote sparsity in (6.2); for instance, the case of banded graphs, cycles and trees was studied by Mason & Papachristodoulou (2014). Determining a suitable structure for V(x) (equivalently, for the matrix P) a priori for general graph structures, however, remains a challenging problem.

Stability of sparse polynomial systems
Structured Lyapunov functions can bring computational advantages also when studying the asymptotic stability of sparse nonlinear systems with polynomial dynamics. As an example, consider a nonlinear system with the cascaded structure (Zheng et al., 2019a, Section VI.D)

ẋ_1 = f_1(x_1, x_2),  ẋ_2 = f_2(x_2, x_3),  …,  ẋ_{l−1} = f_{l−1}(x_{l−1}, x_l),  ẋ_l = f_l(x_l),   (6.5)

where each vector field f_i depends polynomially on its arguments and x_i ∈ R^{n_i}. Let x = (x_1, …, x_l) be the collection of all system states and write f = (f_1, …, f_l). Suppose the system has an equilibrium at the origin. This equilibrium is locally asymptotically stable if there exist a region D ⊂ R^{n_1} × · · · × R^{n_l} containing the origin, a constant ε > 0, and a Lyapunov function V : R^{n_1} × · · · × R^{n_l} → R such that

V(0) = 0,   (6.6a)
V(x) − ε‖x‖^2 ≥ 0 for all x ∈ D,   (6.6b)
−⟨∇V(x), f(x)⟩ ≥ 0 for all x ∈ D.   (6.6c)

By choosing

V(x) = V_1(x_1) + · · · + V_l(x_l),

which has a fully separable structure, and requiring V to be a polynomial, the last two inequalities become polynomial inequalities on a basic semialgebraic set. One can therefore search for V using SOS optimization. Moreover, the structure of V can be chosen to ensure that these polynomial inequalities are correlatively sparse (cf. Sections 4.2.2 and 4.2.5), enabling efficient implementation.
For example, if one takes V to have a fully separable structure as in the case of linear systems considered previously, then the correlative sparsity graph of inequality (6.6b) is a graph with no edges, while that of (6.6c) is the same chain graph characterizing the cascaded interactions between the state vectors x_1, …, x_l, shown in Figure 6.2(a) for l = 8. If this choice for V is insufficient, one can try the structured choice (6.8), in which each term of V couples a few consecutive state vectors. In this case, the correlative sparsity graph of (6.6b) is the chain graph mentioned above, while that of (6.6c) is a chordal graph with maximal cliques {i, i+1, i+2, i+3} for i = 1, …, l − 3, which is shown in Figure 6.2(b) for l = 8. One can of course build an entire hierarchy of structured Lyapunov functions with an increasing degree of coupling between subsystem variables, at the expense of increasing the number of edges in the correlative sparsity graphs of the polynomial inequalities (6.6b) and (6.6c). Numerical experiments by Zheng et al. (2019a) for the structured Lyapunov function in (6.8), which we report in Table 4, show that this approach can significantly reduce the computation time and resources required to prove stability of nonlinear systems compared to standard SOS techniques. Similar ideas can be used to partition nonlinear systems into subsystems (Anderson & Papachristodoulou, 2011) and can be adapted to problems beyond stability analysis, such as the estimation of regions of attraction, positively invariant sets, and global attractors (Schlosser & Korda, 2020; Tacchi et al., 2019a).
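A numerical sanity check of the separable choice for a toy cascade (our own example, with hypothetical dynamics ẋ_1 = −x_1 + x_2^2, ẋ_2 = −x_2 and candidate V(x) = x_1^2 + x_2^2):

```python
import numpy as np

# toy cascade with chain structure: xdot1 = f1(x1, x2), xdot2 = f2(x2)
def Vdot(x1, x2):
    f1 = -x1 + x2 ** 2
    f2 = -x2
    # <grad V, f> for the fully separable candidate V = x1^2 + x2^2
    return 2 * x1 * f1 + 2 * x2 * f2

# check condition (6.6c): Vdot <= 0 on a small box D around the origin
grid = np.linspace(-0.5, 0.5, 21)
vals = np.array([Vdot(a, b) for a in grid for b in grid])
assert np.max(vals) <= 1e-12   # decay certified numerically on D
```

An SOS program replaces this pointwise sampling with an algebraic certificate valid on all of D, and the separable structure of V is what makes that program correlatively sparse.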

Decentralized control of linear networked systems
Consider a network of linear systems with control inputs and disturbances,

ẋ_i(t) = A_{ii} x_i(t) + Σ_{j∈N_i} A_{ij} x_j(t) + B_i u_i(t) + M_i d_i(t),   i = 1, …, l,

where x_i ∈ R^{n_i}, u_i ∈ R^{m_i} and d_i ∈ R^{q_i} denote the local state, input, and disturbance of subsystem i, respectively, and N_i is the index set of all systems connected to system i. Setting x = (x_1, …, x_l), u = (u_1, …, u_l) and d = (d_1, …, d_l), the system can be written compactly as

ẋ(t) = A x(t) + B u(t) + M d(t),

where A has block sparsity induced by the system graph (cf. Section 6.1.1), while B = diag(B_1, …, B_l) and M = diag(M_1, …, M_l) are block-diagonal.
The optimal decentralized control problem (Geromel et al., 1994) seeks to design static state feedback laws

u_i(t) = K_{ii} x_i(t),   ∀i = 1, …, l,   (6.9)

that minimize the H_2 norm of the transfer function from the disturbance d to a performance output weighted by Q := diag(Q_1, …, Q_l) and R := diag(R_1, …, R_l), which are given block-diagonal matrices. The decentralized constraint (6.9) makes the control problem challenging to solve (Furieri et al., 2019; Geromel et al., 1994). One simple strategy is to enforce that the closed-loop system admits a separable Lyapunov function of the form (6.4).
This allows translating the decentralized constraint on the controller into constraints on other auxiliary design variables (Furieri et al., 2019, 2020). In particular, a suboptimal decentralized controller can be computed using the formula K_{ii} = Z_i X_i^{-1} for each i = 1, …, l (Geromel et al., 1994; Zheng et al., 2020, Section II.B), where the matrices Z_1, …, Z_l and X_1, …, X_l solve the SDP (6.10), in which X = diag(X_1, …, X_l) and Z = diag(Z_1, …, Z_l) are block-diagonal concatenations of the matrix variables. The cost function of this SDP and the constraints in (6.10b) are fully separable, as they depend only on variables corresponding to a single subsystem. The coupling constraint (6.10a), instead, has a block sparsity pattern induced by the system graph by virtue of the block-diagonal structure of B, M, X and Z. As in Section 6.1.1, therefore, the chordal decomposition techniques of Section 3 allow for a fast numerical solution when the underlying system graph is sparse, which enables control synthesis for large-scale but sparse networks. In addition, customized distributed design methods that combine chordal decomposition with ADMM can solve (6.10) in a privacy-preserving way, without requiring subsystems to share information about their local dynamics.
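Recovering the decentralized gains from the SDP solution is a purely local computation. A sketch with hypothetical blocks X_i ≻ 0 and Z_i (illustrative values, not produced by a solver):

```python
import numpy as np

# hypothetical SDP solution blocks (illustrative values, not from a solver)
X_blocks = [np.array([[2.0, 0.5], [0.5, 1.0]]),
            np.array([[1.5, 0.0], [0.0, 3.0]])]
Z_blocks = [np.array([[0.3, -0.2], [0.1, 0.4]]),
            np.array([[-0.5, 0.2], [0.0, 0.6]])]

# local gain recovery K_ii = Z_i X_i^{-1}: one small linear solve per subsystem
K_blocks = [np.linalg.solve(X.T, Z.T).T for Z, X in zip(Z_blocks, X_blocks)]

# assemble the block-diagonal feedback matrix respecting structure (6.9)
K = np.zeros((4, 4))
K[0:2, 0:2] = K_blocks[0]
K[2:4, 2:4] = K_blocks[1]
```

Because each K_{ii} depends only on local data, the recovery step parallelizes trivially across subsystems.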

Relaxation of nonconvex QCQPs
A (nonconvex) quadratically constrained quadratic program (QCQP) is an optimization problem of the form

min_x x^T P_0 x + 2 q_0^T x + r_0
subject to x^T P_i x + 2 q_i^T x + r_i ≤ 0, i = 1, …, m,   (6.11)

where x ∈ R^n is the optimization variable, and P_i ∈ S^n, q_i ∈ R^n, r_i ∈ R, i = 0, 1, …, m, are given problem data. QCQPs have very powerful modeling capabilities; for instance, many hard combinatorial and discrete optimization problems can be written in the form (6.11) (Nesterov et al., 2000). This also means that QCQPs are hard to solve in general, so many different relaxation strategies have been proposed to find approximate bounds and feasible values for the optimization variable x (Nesterov et al., 2000; Park & Boyd, 2017). One approach that provides good bounds, both empirically and theoretically (Nesterov et al., 2000), is to introduce the positive semidefinite matrix X = x x^T and rewrite (6.11) as

min_{x,X} ⟨P_0, X⟩ + 2 q_0^T x + r_0
subject to ⟨P_i, X⟩ + 2 q_i^T x + r_i ≤ 0, i = 1, …, m,
X = x x^T.
Upon relaxing the intractable constraint X = x x^T into the inequality X ⪰ x x^T and applying the Schur complement to rewrite the latter as an LMI, we arrive at the semidefinite relaxation

min_{x,X} ⟨P_0, X⟩ + 2 q_0^T x + r_0
subject to ⟨P_i, X⟩ + 2 q_i^T x + r_i ≤ 0, i = 1, …, m,
Z := [1, x^T; x, X] ⪰ 0,   (6.12)

which can be reformulated as a primal standard-form SDP by introducing nonnegative slack variables. It is not difficult to see that the optimal value of problem (6.12) bounds that of the QCQP (6.11) from below and that, if an optimal solution Z has rank one, then the relaxation is exact and

Z = [1, x^T; x, x x^T],

where x solves (6.11).
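The rank-one exactness condition is easy to test numerically. A sketch with an arbitrary candidate point x and hypothetical data P_0, q_0, r_0 (our own values):

```python
import numpy as np

# lifted variable of the relaxation (6.12): Z = [[1, x^T], [x, X]] with X = x x^T
x = np.array([1.0, -2.0, 0.5])
Z = np.block([[np.ones((1, 1)), x[None, :]],
              [x[:, None], np.outer(x, x)]])

# when X = x x^T, Z is PSD with rank one, so the relaxation is exact
eigs = np.linalg.eigvalsh(Z)
assert np.min(eigs) >= -1e-10
assert np.sum(eigs > 1e-8) == 1

# the lifted objective matches the original quadratic at this point
P0 = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
q0 = np.array([0.1, -0.4, 0.2])
r0 = 0.7
lifted = np.sum(P0 * np.outer(x, x)) + 2 * q0 @ x + r0
original = x @ P0 @ x + 2 * q0 @ x + r0
assert np.isclose(lifted, original)
```

In practice the relaxation returns some PSD Z, and one checks its numerical rank; a rank-one Z certifies that the QCQP has been solved exactly.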
If the data matrices P_0, . . . , P_m are sparse, then the aggregate sparsity pattern E of the SDP (6.12) is also sparse, and the positive semidefinite constraint on Z can be replaced with the conic constraint Z ∈ S^{n+1}_+(E, ?). The chordal decomposition techniques described in Section 3 can therefore be applied to solve (6.12) efficiently. The following subsections briefly discuss two types of problems for which sparsity can be exploited effectively: Max-Cut problems (Goemans & Williamson, 1995) and sensor network location problems (Jing et al., 2019; Kim et al., 2009; Nie, 2009; So & Ye, 2007).

Max-Cut problem
The maximum cut (Max-Cut) problem is a classic problem in graph theory (Goemans & Williamson, 1995). Consider an undirected graph G(V, E) with n vertices such that each edge (i, j) ∈ E is assigned a nonzero weight W_ij, and set W_ij = 0 if (i, j) ∉ E. The Max-Cut problem aims to partition the graph's vertices into two complementary sets V_1 and V_2 such that the total weight of all edges linking V_1 and V_2 is maximized. Given a binary variable x ∈ {−1, +1}^n assigning nodes to one of the two partitions, one seeks to maximize

(1/4) Σ_{i,j} W_ij (1 − x_i x_j).

This is equivalent to solving

min_x x^T W x subject to x_i^2 = 1, i = 1, . . . , n, (6.13)

where W is the given matrix of weights. This problem is a particular QCQP, and can easily be rewritten in the generic form (6.11) using data matrices P_0, P_1, . . . , P_n whose aggregate sparsity graph coincides with the original graph G. If G is sparse with small maximal cliques, therefore, SDP relaxations of (6.13) can be solved efficiently using the sparsity-exploiting techniques in Section 3. Indeed, numerical experiments by Andersen et al. (2010a), among others, demonstrated that the sparsity-exploiting solvers SMCP and CDCS can solve benchmark Max-Cut problems from the SDPLIB problem library (Borchers, 1999) an order of magnitude faster than standard conic solvers.
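The equivalence between maximizing the cut weight and minimizing x^T W x can be verified by brute force on a tiny graph. The sketch below uses a hypothetical weighted 4-cycle; since the graph is bipartite, every edge can be cut, so the maximum cut equals the total edge weight.

```python
import numpy as np
from itertools import product

# Hypothetical weighted 4-cycle: edges and their weights.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0, (0, 3): 3.0}
W = np.zeros((4, 4))
for (i, j), w in edges.items():
    W[i, j] = W[j, i] = w

def cut_weight(x):
    """Total weight of edges between {i: x_i = +1} and {i: x_i = -1}."""
    return sum(w for (i, j), w in edges.items() if x[i] != x[j])

# Identity used in the text:
#   cut(x) = (1/4) * sum_{i,j} W_ij (1 - x_i x_j) = (1/4) * (1^T W 1 - x^T W x),
# so maximizing the cut is the same as minimizing x^T W x over {-1, +1}^n.
best = max((cut_weight(x), x) for x in product([-1, 1], repeat=4))
max_cut = best[0]
x_opt = np.array(best[1])
identity_value = 0.25 * (W.sum() - x_opt @ W @ x_opt)
```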

Sensor network location
The sensor network location problem, also known as Graph Realization (So & Ye, 2007), has important applications such as inventory management and environment monitoring. At a basic level, the problem is to find unknown sensor points x_1, . . . , x_n ∈ R^d (d = 2 or 3) satisfying some specified distance constraints among themselves, as well as distance constraints with respect to m known anchor points a_1, . . . , a_m ∈ R^d. Precisely, given pairing sets E_x (sensor-sensor pairs) and E_a (sensor-anchor pairs), we seek to find sensor locations x_1, . . . , x_n ∈ R^d such that

‖x_i − x_j‖ = d_ij for all (i, j) ∈ E_x,
‖x_i − a_j‖ = f_ij for all (i, j) ∈ E_a, (6.14)

where the numbers d_ij and f_ij are specified distances. One way to relax the sensor location problem into an SDP is to consider (6.14) as a set of quadratic constraints for x_1, . . . , x_n, and apply the generic SDP relaxation strategy to the feasibility QCQP

min_{x_1,...,x_n ∈ R^d} 0 subject to (6.14). (6.15)

It is clear that the data matrices and vectors of this QCQP are very sparse, and that the aggregate sparsity pattern of the corresponding SDP relaxation is determined only by the edge sets E_a and E_x. The techniques in Section 3 can then be applied to solve the relaxed problem quickly; we refer the interested reader to Kim et al. (2009) for more detailed discussions and experimental results. Similar ideas can be used to analyze sensor location problems where the distance measurements d_ij and f_ij are affected by noise.
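As a concrete illustration of the constraint structure in (6.14), the sketch below builds a small hypothetical 2-D instance (random anchors and sensors, with the distance data generated from known "true" positions so that the problem is feasible by construction) and checks candidate locations against the constraints.

```python
import numpy as np

# Hypothetical instance: 2 anchors, 3 sensors in R^2.
rng = np.random.default_rng(1)
anchors = rng.uniform(0, 1, size=(2, 2))      # known a_1, a_2
sensors = rng.uniform(0, 1, size=(3, 2))      # "true" x_1, x_2, x_3

E_x = [(0, 1), (1, 2)]                        # sensor-sensor pairs
E_a = [(0, 0), (2, 1)]                        # (sensor, anchor) pairs
d = {(i, j): np.linalg.norm(sensors[i] - sensors[j]) for (i, j) in E_x}
f = {(i, j): np.linalg.norm(sensors[i] - anchors[j]) for (i, j) in E_a}

def satisfies_constraints(x, tol=1e-9):
    """Check the distance constraints (6.14) for candidate locations x."""
    ok_x = all(abs(np.linalg.norm(x[i] - x[j]) - d[i, j]) < tol
               for (i, j) in E_x)
    ok_a = all(abs(np.linalg.norm(x[i] - anchors[j]) - f[i, j]) < tol
               for (i, j) in E_a)
    return ok_x and ok_a
```

The true positions satisfy all constraints, while a rescaled configuration violates the sensor-sensor distances, which is the kind of infeasibility the SDP relaxation of (6.15) must detect.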
Remark 6.2. There are other ways to formulate an SDP relaxation for (6.15). One (So & Ye, 2007) is to introduce a matrix variable Y = X^T X with X = [x_1, x_2, . . . , x_n] ∈ R^{d×n}, rewrite all the constraints in (6.15) as linear equalities in X and Y, relax the nonconvex relation between these variables into the inequality Y ⪰ X^T X, and apply Schur's complement to obtain an SDP. A sparsity-exploiting version of this approach is described by Kim et al. (2009, Section 3.3). Another option (Nie, 2009) is to formulate the search for the sensor locations as an unconstrained polynomial optimization problem,

min_{x_1,...,x_n} Σ_{(i,j)∈E_x} (‖x_i − x_j‖² − d_ij²)² + Σ_{(i,j)∈E_a} (‖x_i − a_j‖² − f_ij²)².

The polynomial objective is term-sparse when the coupling set E_x contains only a small subset of all pairs (i, j) (in fact, it is correlatively sparse; see Section 4.2 for definitions of these concepts). Therefore, the sparse SOS techniques outlined in Section 4.2 can be applied to solve the problem efficiently. The interested reader is referred to Nie (2009) for experimental results.

Machine learning: Verification of neural networks
Neural networks are one of the fundamental building blocks of modern machine-learning methods. For safety-critical applications, it is essential to ensure that they are provably robust to input perturbations. Given a neural network f : R^d → R^m, a nominal input x̂ ∈ R^d, a linear function φ : R^m → R on the network's output, and a perturbation radius ε ∈ R, the network verification problem (Raghunathan et al., 2018; Salman et al., 2019; Tjandraatmadja et al., 2020) asks to either verify that

φ(f(x_0)) > 0 for all inputs x_0 with ‖x_0 − x̂‖ ≤ ε, (6.16)

or to identify at least one counterexample to this relation. Consider an L-layer feedforward neural network

x_{i+1} = ReLU(W_i x_i + b_i), i = 0, 1, . . . , L − 1,

where W_i ∈ R^{n_{i+1}×n_i} and b_i ∈ R^{n_{i+1}} are the network weights and biases, respectively, and the so-called Rectified Linear Unit (ReLU) activation function ReLU : R^k → R^k is the element-wise positive part of its argument, ReLU(z) = [max(z_i, 0)]_{i=1}^k. Condition (6.16) can be decided by solving the optimization problem

γ = min c^T x_L + c_0
subject to x_{i+1} = ReLU(W_i x_i + b_i), i ∈ [L], (6.17a)
‖x_0 − x̂‖ ≤ ε, (6.17b)

where [L] := {0, 1, . . . , L − 1} and c, c_0 are problem data related to the linear function φ(·). If γ > 0, then (6.16) holds; otherwise, counterexamples can be found.
Since the action of the ReLU function can be described by quadratic constraints, problem (6.17) can be reformulated into a QCQP in the stacked variable (x_0, . . . , x_L) (Raghunathan et al., 2018), and subsequently relaxed into an SDP as described in Section 6.2 above. If the optimal value of this SDP is positive, the network is verified; otherwise, nothing can be said. Since the constraints (6.17a) have a very natural cascading structure, the interaction among the variables x_0, . . . , x_L can be modeled by a line graph with maximal cliques C_i = {i, i + 1} for i = 0, . . . , L − 1 (see Figure 6.3 for an illustration with L = 4). The SDP relaxation of (6.17) inherits this cascading structure, in addition to any sparsity coming from the structure of the weight matrices W_i. The chordal decomposition techniques described in Section 3 can therefore be applied to solve it efficiently. This idea has recently been validated by Batten et al. (2021), who considered robustness verification in the context of image classifiers. For instance, the results reproduced in Figure 6.4 for a neural network with L = 2 layers and n_i = 64 neurons per layer show that exploiting sparsity reduced the CPU time required to verify the robustness of an image classifier on the MNIST dataset by two orders of magnitude. Similar results were obtained by Newton & Papachristodoulou (2021), and interested readers are invited to consult Table 2 for references to more machine learning applications where sparsity exploitation can dramatically reduce computational complexity.
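One standard way to express the ReLU action through quadratic constraints (used, e.g., by Raghunathan et al., 2018) is the element-wise equivalence y = max(z, 0) if and only if y ≥ z, y ≥ 0 and y(y − z) = 0. A quick numerical check of this characterization:

```python
import numpy as np

# Verify the quadratic characterization of the ReLU on random inputs:
#     y = max(z, 0)  <=>  y >= z,  y >= 0,  y * (y - z) = 0  (element-wise).
rng = np.random.default_rng(0)
z = rng.standard_normal(1000)
y = np.maximum(z, 0.0)

holds = (np.all(y >= z) and np.all(y >= 0)
         and np.allclose(y * (y - z), 0.0))

# Conversely, a vector violating complementarity is not a ReLU output:
y_bad = y + 1.0                       # still y_bad >= z and y_bad >= 0 ...
violates = not np.allclose(y_bad * (y_bad - z), 0.0)
```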

Conclusion and outlook
In this paper, we reviewed the theory and applications of decomposition methods for large-scale semidefinite and polynomial optimization. Specifically, we presented classical chordal decomposition results for sparse positive semidefinite matrices (cf. Theorems 2.1 to 2.4) and discussed how they can be exploited to implement efficient first- and second-order algorithms for SDPs (Section 3). We also showed how matrix decomposition (primarily, but not necessarily, chordal) can be leveraged to exploit term sparsity and structural sparsity in large-scale polynomial optimization (Section 4). In particular, we demonstrated that many sparsity-exploiting techniques for polynomial inequalities, including the well-known correlatively sparse SOS representations and the recent TSSOS, CS-TSSOS and chordal-TSSOS hierarchies, are based on the general matrix decomposition strategy outlined in Section 4.2.1. We also discussed how the classical chordal decomposition theorem (Theorem 2.1) can be generalized in different ways to obtain SOS chordal decomposition theorems for sparse polynomial matrices (cf. Theorems 4.4, 4.5 and 4.6). In Section 5, we reviewed factor-width decompositions for SDPs with dense semidefinite constraints, to which chordal decomposition cannot be applied. Finally, in Section 6 we demonstrated how all of these techniques can be used to reduce the computational complexity of SDPs and polynomial optimization problems encountered in control and machine learning applications. References to these and other applications are summarized in Table 2.
Despite the considerable progress made in recent years, numerical methods for semidefinite and polynomial optimization are still far from being mature. The most pressing open challenge, in our opinion, lies in bridging the gap between the size of SDPs that can currently be solved with tractable computational resources, and the size of the SDPs that arise from complex control applications. Indeed, the state-of-the-art decomposition techniques reviewed in this article are often still not enough to enable the use of semidefinite programming to analyze and control large-scale nonlinear systems. The same is true for control problems with systems of smaller size, but which require real-time computations.
Achieving significant progress is likely to require theoretical extensions of the decomposition approaches we have discussed, as well as the development of efficient software that can effectively exploit modern multi-core and distributed-memory computer architectures. We conclude this article by outlining some possible research directions that may bear fruit in the near future.

Combining matrix decomposition with other structures
SDPs encountered in applications often have structural properties beyond sparsity, which can also be leveraged to reduce computational complexity; examples are symmetries, the existence of low-rank solutions, and low-rank data matrices (De Klerk, 2010; Gatermann & Parrilo, 2004; Majumdar et al., 2020). It is natural to try and combine the exploitation of such additional structure with matrix decomposition, but, to the best of our knowledge, a unified and theoretically robust framework to do so is yet to be developed. Particular questions to be answered in this context include whether there exist symmetry reduction techniques that preserve (or even promote) sparsity in SDPs, and whether low-rank positive semidefinite completions (Dancis, 1992, Theorem 1.5) can be exploited in SDPs with aggregate sparsity and low-rank optimal solutions (see Jiang, 2017 and Miller et al., 2019a for some results in this direction).
In addition, although we have presented chordal and factor-width decompositions separately, they can be combined if either one, applied in isolation, does not reduce the complexity of a large-scale SDP enough. A relatively straightforward approach (Miller et al., 2019b) is to first apply the standard chordal decomposition, and then enforce positive semidefinite constraints associated to large maximal cliques using factor-width approximations. This idea can be taken forward in various directions; for instance, one could use block-chordal and block-factor-width decompositions, or extend ideas by Garstka et al. (2020) to formulate adaptive strategies wherein cliques are either combined or factor-width decomposed, depending on their relative sizes and on the available computational resources. Both ideas remain largely unexplored, and further work is required to determine if they can be brought to bear on real-life control problems.

Tailored hierarchies for sparse polynomial optimization
Almost all existing methods for exploiting term sparsity in polynomial optimization rely on the general matrix decomposition approach presented in Section 4.2.1, where the Gram matrix associated with SOS certificates of nonnegativity is decomposed according to the maximal cliques of a sparsity graph to be prescribed a priori. While the correlatively sparse, TSSOS, and related hierarchies described in Section 4.2 give useful general strategies to select this sparsity graph, there is ample scope for tailoring the graph structure in particular control applications. It is not unreasonable to expect that problem-specific choices, motivated for example by physical intuition on the dynamical system one is trying to analyse or control, may bring significant further gains. However, it remains to be seen whether this expectation can be met in practice. Better integration between the development of optimization tools and application-related modeling, discussed further below, seems key to achieving progress in this direction.

Decomposition and completion of polynomial matrices
The exploitation of sparsity for polynomial matrix inequalities can be improved in various directions, reducing computational complexity beyond what can be achieved using only the SOS chordal decomposition results summarized in Section 4.3. For instance, those results can be combined in a natural way with techniques that leverage term sparsity in scalar polynomial inequalities. Indeed, when a polynomial matrix inequality P(x) ⪰ 0 is "scalarized" into a nonnegativity condition for the polynomial p(x, y) = y^T P(x) y, the structural sparsity of P translates into correlative sparsity of p with respect to y. The matrix decomposition results of Section 4.3 have equivalent statements at the scalar level that can be used to refine or extend term-sparse SOS decomposition hierarchies for polynomials. The latter, in turn, can be used to efficiently handle (scalarized) polynomial matrix inequalities.
It would also be interesting to establish SOS completion results for sparse polynomial matrices, in the spirit of Theorem 2.2. Preliminary results in this direction exist (Zheng et al., 2018a), but are far from complete. Extension of the results in this reference will contribute to building a comprehensive theory for SOS chordal decomposition and completion of polynomial matrices, which can be used to build tractable SDP approximations of large-scale optimization problems with sparse polynomial matrix inequalities.

To chordality and beyond
Exploiting sparsity in semidefinite and polynomial optimization without modifying the problem usually requires chordality (cf. Theorems 2.1, 2.2, 2.3 and 2.4 for SDPs, and Theorems 4.3, 4.5 and 4.6 for polynomial optimization). Enforcing chordality with traditional chordal extension strategies, even if approximately minimal, may lead to graphs with unacceptably large maximal cliques. Since the largest maximal clique size plays a major role in determining the computational complexity of a decomposed SDP (or of an SDP relaxation of a polynomial optimization problem), systematic techniques to produce chordal extensions that approximately minimize the largest maximal clique size would be very valuable.
If good chordal extensions prove hard to find, a compelling alternative is to sacrifice chordality and use nonchordal graphs with small cliques that can be determined analytically. This was done, for instance, by Nie & Demmel (2009) and Kočvara (2020). While clique decompositions of matrix inequalities based on nonchordal graphs are conservative in general, it may still be possible to identify classes of matrices for which the equivalence between the original and decomposed inequalities can be guaranteed. For example, sparse (scaled) diagonally dominant matrices always admit a clique decomposition, even when their sparsity graph is not chordal (Miller et al., 2019b, Proposition 1). The same is true for certain positive semidefinite matrices whose sparsity pattern can be extended to one of "block-arrow" type (Kočvara, 2020). Necessary and sufficient cycle conditions for the positive semidefinite completion problem with nonchordal sparsity graphs were investigated by Barrett et al. (1996). Extensions of these results, even if limited to particular application domains, are likely to enable considerable progress in the solution of large-scale SDPs with nonchordal sparsity.

Efficient software for modern computers
Reliable and user-friendly implementations of the cutting-edge decomposition techniques for SDPs and polynomial optimization problems reviewed in this paper are, in our opinion, just as important as further theoretical advances. Most of the available open-source packages mentioned in Sections 3.5 and 4.5 have not yet reached the level of maturity required to robustly solve a wide range of SDPs or polynomial optimization problems arising from real-life applications. Moreover, many of the commonly used optimization modeling environments on which these packages rely are by now over a decade old, and often cannot efficiently handle extremely large problems of industrial relevance.
The lack of very-high-performance software currently limits the scale of problems that can be solved without ad-hoc implementations. Since such implementations require considerable expertise in large-scale optimization, the deployment of SDP-based frameworks for system analysis and control to real-world problems is currently hindered. We expect that improvements in software reliability, efficiency, and user-friendliness, together with the ability to leverage modern multi-processor and/or distributed computing platforms, will considerably increase the practical impact of decomposition methods for SDPs, bringing great benefit to the community of application-oriented researchers.

Blending application-driven modeling with optimization
The decomposition techniques reviewed in Sections 3, 4 and 5 apply to generic standard-form SDPs and polynomial optimization problems, irrespective of the context in which they arise. In control-related applications, however, SDPs and polynomial optimization problems often come from modeling or relaxation frameworks for the study of dynamical systems, the details of which strongly affect the structure of the eventual optimization problem. Bridging the existing gaps between application-driven modeling and the development of large-scale optimization algorithms promises to enable significant progress in the study of linear and nonlinear systems. On the one hand, it may be possible to implement tailored SDP solvers that target special structures arising in particular applications. On the other hand, given a particular control or analysis task, one should attempt to formulate modeling approaches that lead to optimization problems with a "computationally friendly" structure. For example, when studying fluid flows using semidefinite programming (see, e.g., Fantuzzi et al., 2018 and Arslan et al., 2021), a smart discretization of the flow field leads to SDPs with chordal aggregate sparsity that can be solved in minutes even though their linear matrix inequalities have more than 10 000 rows/columns. Similarly, using structured Lyapunov (or Lyapunov-like) functions as explained in Section 6.1 can lead to structured SDPs, enabling the analysis of increasingly large systems in fields such as robotics, smart energy grids, and autonomous transportation.
Of course, the design of analysis and control frameworks that combine system-level modeling with algorithmic considerations will present a number of challenges. Resolving these challenges, however, promises to remove longstanding barriers to the study of complex systems, especially nonlinear ones. Success seems likely to require a collaborative effort between researchers working in different areas and an increasing awareness of outstanding problems in particular application domains, as well as of state-ofthe-art tools for large-scale optimization. We hope that the present review of decomposition methods for semidefinite and polynomial optimization takes a step in the right direction and can inspire new discoveries in the near future.
The no fill-in property of the Cholesky factorization for positive definite matrices with chordal sparsity is one of the most important results for sparsity exploitation in matrix calculations; for instance, it enables a simple proof of Theorem 2.1 and efficient computations involving barrier functions for sparse matrix cones (cf. Section 3.4.2). To formally introduce this no fill-in property, we first define the notions of simplicial vertices and perfect elimination ordering for graphs.
Definition Appendix A.1. A vertex v in a graph G(V, E) is called simplicial if all its neighbors are connected to each other.
Definition Appendix A.2. An ordering σ = {v 1 , . . . , v n } of the vertices in a graph G is a perfect elimination ordering if each v i , i = 1, . . . , n, is a simplicial vertex in the subgraph induced by nodes {v i , v i+1 , . . . , v n }.
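Definitions A.1 and A.2 translate directly into code. The following sketch checks whether a given ordering is a perfect elimination ordering; the two example graphs (a 4-cycle with and without a chord, stored as adjacency sets) are our own illustrations, not the graph of Figure A.1.

```python
from itertools import combinations

def is_simplicial(adj, v, vertices=None):
    """v is simplicial if its neighbors (within `vertices`) form a clique."""
    nbrs = adj[v] if vertices is None else adj[v] & vertices
    return all(b in adj[a] for a, b in combinations(nbrs, 2))

def is_perfect_elimination_ordering(adj, order):
    """Each vertex must be simplicial in the subgraph of later vertices."""
    for k, v in enumerate(order):
        later = set(order[k:])
        if not is_simplicial(adj, v, later):
            return False
    return True

# A 4-cycle with the chord {1, 3} is chordal; without the chord it is not,
# so the chordless cycle admits no perfect elimination ordering.
chordal = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
```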
For example, vertices 2, 4, 6 are simplicial for the graph in Figure A.1(a), and the ordering σ = {2, 4, 6, 1, 3, 5} is a perfect elimination ordering. A graph G is chordal if and only if it has at least one perfect elimination ordering (Vandenberghe et al., 2015, Theorem 4.1). The maximum cardinality search (Algorithm 1) either returns one of the perfect elimination orderings or certifies that none exists in O(|V| + |E|) time (Tarjan & Yannakakis, 1984).

Algorithm 1 Maximum cardinality search
Input: A graph G(V, E)
Output: An elimination ordering α of G
for all vertices v in G do
  w(v) = 0;
end for
for i = n to 1 do
  pick an unnumbered vertex v with maximum weight w(v);
  set α(v) = i;
  for all unnumbered vertices u adjacent to v do
    w(u) ← w(u) + 1;
  end for
end for

Now, given a positive definite matrix Z ∈ S^n_+(E, 0) with a chordal sparsity pattern E, we have a sparse Cholesky factorization with zero fill-in (Rose, 1970; Vandenberghe et al., 2015, Theorem 9.1),

P_σ Z P_σ^T = L L^T, P_σ^T (L + L^T) P_σ ∈ S^n(E, 0), (A.1)

where P_σ is a permutation matrix corresponding to the perfect elimination ordering σ and L is a lower-triangular matrix. This can be proven using an elimination process according to the perfect elimination ordering σ; see (Vandenberghe et al., 2015, Chapter 9.1) and Kakimura (2010) for details. Figure A.2 illustrates the process of sparse Cholesky factorization for a 6 × 6 positive definite matrix with the chordal sparsity graph shown in Figure A.1(a).

The sparse Cholesky factorization (A.1) with zero fill-in allows for a simple proof of Theorem 2.1. For simplicity, but without loss of generality, assume that the matrix Z has already been permuted in such a way that σ = {1, 2, . . . , n} is a perfect elimination ordering, so P_σ = I in (A.1). Denote the columns of L by l_1, l_2, . . . , l_n, and write

Z = L L^T = Σ_{i=1}^n l_i l_i^T.

Since L + L^T has the same sparsity pattern E, the non-zero elements of each column vector l_i must be indexed by a maximal clique C_{h_i} for some h_i ∈ {1, . . . , t}. Thus, the non-zero elements of l_i can be extracted through multiplication by the matrix E_{C_{h_i}}, and we have

l_i l_i^T = E_{C_{h_i}}^T Q_i E_{C_{h_i}}, where Q_i = (E_{C_{h_i}} l_i)(E_{C_{h_i}} l_i)^T ∈ S^{|C_{h_i}|}_+.

Now, let J_k = {i : h_i = k} be the set of column indices i such that column i is indexed by clique C_k. These index sets are disjoint and ∪_k J_k = {1, . . . , n}, so we obtain

Z = Σ_{k=1}^t E_{C_k}^T ( Σ_{i∈J_k} Q_i ) E_{C_k}.

This is exactly (2.3) in Theorem 2.1 with matrices Z_k = Σ_{i∈J_k} Q_i ∈ S^{|C_k|}_+.
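The elimination argument above can be checked numerically. The sketch below uses a small hand-made chordal pattern (our own example, not the graph of Figure A.1): the Cholesky factor exhibits no fill-in, and regrouping the columns of L by maximal cliques reproduces the clique decomposition (2.3) of Theorem 2.1.

```python
import numpy as np

# Chordal graph on {0,1,2,3} with edges {0,1}, {1,2}, {1,3}, {2,3};
# maximal cliques C_1 = {0,1} and C_2 = {1,2,3}, and 0,1,2,3 is already
# a perfect elimination ordering (so P_sigma = I). The matrix Z below is
# diagonally dominant, hence positive definite, with this sparsity pattern.
Z = np.array([[5.0, 1.0, 0.0, 0.0],
              [1.0, 5.0, 1.0, 0.5],
              [0.0, 1.0, 5.0, 1.0],
              [0.0, 0.5, 1.0, 5.0]])

L = np.linalg.cholesky(Z)             # Z = L L^T
fill_in_entries = [L[2, 0], L[3, 0]]  # positions outside the pattern E

def E_C(C, n=4):
    """0/1 selection matrix E_C with (E_C)_{kj} = 1 iff j = C_k."""
    E = np.zeros((len(C), n))
    for k, j in enumerate(C):
        E[k, j] = 1.0
    return E

# Column 0 of L is indexed by C_1 = {0,1}; columns 1-3 by C_2 = {1,2,3}.
cliques = [[0, 1], [1, 2, 3]]
columns = [[0], [1, 2, 3]]
terms = []
for C, cols in zip(cliques, columns):
    Ec = E_C(C)
    Zk = sum(np.outer(Ec @ L[:, i], Ec @ L[:, i]) for i in cols)  # PSD Z_k
    terms.append(Ec.T @ Zk @ Ec)
Z_reconstructed = sum(terms)          # equals Z, as in (2.3)
```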

Appendix C. Some properties of maximal cliques
A connected chordal graph G(V, E) with n vertices has at most n − 1 maximal cliques, which can be identified in linear time, that is, with a complexity of O(|V| + |E|) (Berry et al., 2004; Tarjan & Yannakakis, 1984). Algorithm 2 is a simple strategy with complexity O(|V| + |E|) to find all maximal cliques based on a perfect elimination ordering.

Algorithm 2 Maximal clique search
Input: A chordal graph G(V, E) and a perfect elimination ordering α = {v_1, . . . , v_n}
Output: All its maximal cliques C_1, C_2, . . . , C_t
Initialize C_0 = ∅;
for i = 1 to n do
  C_i = {v_i} ∪ {u adjacent to v_i and behind v_i in α};
  if C_i is not a subset of C_0 then
    C_i is a maximal clique;
    C_0 = C_i;
  end if
end for

For example, applying Algorithm 2 to the chordal graph in Figure A.1(a) produces candidate sets C_1, . . . , C_6; the sets C_1, . . . , C_4 are maximal cliques, while C_5 and C_6 are not because they are subsets of C_4. The maximal cliques of a chordal graph can be arranged in a so-called clique tree, that is, a tree T(Γ, Ξ) with the maximal cliques Γ = {C_1, . . . , C_t} as its vertices and an edge set Ξ ⊆ Γ × Γ. In particular, the clique tree can be chosen to satisfy the clique intersection property, meaning that C_i ∩ C_j ⊆ C_k whenever clique C_k lies on the path between cliques C_i and C_j in the tree and the intersection C_i ∩ C_j is nonempty (Blair & Peyton, 1993). For example, the clique tree in Figure A.1(c) satisfies the clique intersection property.
The maximal cliques of a chordal graph play a central role in the sparse matrix decomposition results stated in Theorems 2.1, 2.2, 2.3 and 2.4. It is important to remember that these require one to use all maximal cliques in the (chordal) sparsity graph of a matrix X, even when a subset of cliques already covers all nonzero entries of X. For example, consider the indefinite matrix