An Efficient, Memory-Saving Approach for the Loewner Framework

The Loewner framework is one of the most successful data-driven model order reduction techniques. If N is the cardinality of a given data set, the so-called Loewner and shifted Loewner matrices $\mathbb{L}\in\mathbb{C}^{N\times N}$ and $\mathbb{S}\in\mathbb{C}^{N\times N}$ can be defined by solely relying on information encoded in the considered data set and they play a crucial role in the computation of the sought rational model approximation. In particular, the singular value decomposition of a linear combination of $\mathbb{S}$ and $\mathbb{L}$ provides the tools needed to construct accurate models which fulfill important approximation properties with respect to the original data set.
However, for highly-sampled data sets, the dense nature of $\mathbb{L}$ and $\mathbb{S}$ leads to numerical difficulties, namely the failure to allocate these matrices in certain memory-limited environments or excessive computational costs. Even though they do not possess any sparsity pattern, the Loewner and shifted Loewner matrices are extremely structured and, in this paper, we show how to fully exploit their Cauchy-like structure to reduce the cost of computing accurate rational models while avoiding the explicit allocation of $\mathbb{L}$ and $\mathbb{S}$.
In particular, the use of the hierarchically semiseparable format allows us to remarkably lower both the computational cost and the memory requirements of the Loewner framework, obtaining a novel scheme whose costs scale with $N \log N$.


Introduction
The Loewner framework, originally proposed in [30] for solving the generalized realization problem coupled with tangential interpolation, was successfully employed for data-driven model order reduction from frequency domain data [26]. Measurements of the frequency response are available in several communities: electrical engineering (impedance, admittance or scattering parameters [26]), mechanical and civil engineering (structural and vibro-acoustic frequency response functions [35] or frequency response measurements of thermal systems [10]), to name a few. The first step in the Loewner framework consists in setting up the data matrices and building the Loewner and shifted Loewner matrices entry-wise based on the chosen partition into right and left data, followed by computing the singular value decomposition (SVD) of a linear combination of these matrices and forming the model by projection, using the dominant singular triplets. The main advantages of the Loewner framework over existing approaches are, on the one hand, its system identification capabilities, in the sense that the order of the system can be deduced from the singular value drop, and, on the other hand, its potential in dealing efficiently with systems with a large number of inputs and outputs thanks to the incorporation of tangential interpolation. The main drawbacks, however, are the large storage requirements paired with the significant CPU cost inherent to the full SVD computation for data sets with a large number of measurements (values in the range of $10^5$ are common in industrial applications). To bypass these inconveniences, greedy-type approaches were proposed in [26], thus reducing memory requirements, from $O(N^2)$ for storing the dense Loewner and shifted Loewner matrices to $O(N + n^2)$, and the computational cost, from $O(N^3)$ for computing the SVD to $O(Nn^3)$ and $O(Nn^4)$, where N is the size of the data set and n is the order of the model.
Taking advantage of numerical linear algebra tools to reduce storage and computational requirements for the Loewner framework is another avenue worth exploring due to the inherent structure embedded in the albeit dense Loewner and shifted Loewner matrices. The factored ADI-Galerkin method for computing these matrices as solutions to certain Sylvester equations with a factored right-hand side was investigated in [18]. Such a scheme computes low-rank approximations to the dense Loewner matrix to speed up the SVD computation. However, no results about the accuracy of the computed reduced models are reported in [18]. Moreover, the memory constraints coming from the allocation of $\mathbb{L}$ and $\mathbb{S}$ are still present. Alternatively, one can focus on accelerating solely the step of the SVD calculation by employing Krylov methods (see, e.g., [3,19,25,41] to name a few), by using the randomized SVD [31] to compute the dominant singular triplets instead of the full SVD or other types of inexact SVD-type decompositions (adaptive cross approximation [4], particularly suited for hierarchical matrices, or a CUR decomposition [11] as in [22,38]).
The novel approach proposed in this paper tackles the issue of the memory requirements and, at the same time, reduces the CPU cost of the Loewner framework while maintaining the accuracy of the standard approach for large values of the number of measurements. As the Loewner and shifted Loewner matrices satisfy Sylvester equations with diagonal coefficient matrices, they are, in fact, Cauchy-like matrices, obtained as the Hadamard product between a Cauchy matrix C and low-rank right-hand sides. Extensive research has been devoted to fully exploiting the rich structure of Cauchy matrices. Several algorithms for computing the matrix-vector product Cx can be found in the literature and many avoid assembling the full matrix C (see, e.g., [7,15,17,33]). Hierarchically semiseparable (HSS) matrices have been deemed efficient for approximating Cauchy matrices with a low off-diagonal rank [33,34]. HSS and other rank-structured matrices are widely used in developing fast algorithms for algebraic operations (matrix-vector multiplications, matrix factorizations, matrix inversion, etc., see, e.g., [8,33,34,44,47] and references therein) used as building blocks for the solution of certain problems like linear systems of equations [48], eigenvalue problems [45], linear and quadratic matrix equations [23,27], and many more. For our application, the approximation of the Cauchy matrix in HSS format considerably decreases the computational cost of matrix-vector products involving a linear combination of the Loewner and shifted Loewner matrices needed for the partial SVD computation, while avoiding forming them. All results involving HSS matrices presented in this paper have been obtained by means of the hm-toolbox [28].
The employment of an HSS-representation of C may introduce some inexactness in our scheme and this has to be taken into account in the iterative SVD computation. The use of inexact matrix-vector products within iterative procedures has been the subject of numerous research papers: Krylov techniques for solving linear systems and matrix equations [6,24,32,40,43], eigenvalue problems [13,39], or an inexact variant of the Lanczos bidiagonalization for the computation of leading singular triplets of a generic matrix function [14]. In our case, we do not need an accurate approximation of the singular triplets, but rather have meaningful spaces spanned by the computed left and right singular vectors so that the obtained reduced model inherits the desired approximation properties (see, e.g., [2,21]).
The remainder of the paper is structured as follows. Section 2 provides a review of the Loewner framework, whereas Sect. 3 presents results showcasing the special structure of the Loewner and shifted Loewner matrices as Cauchy-like matrices and their approximation as hierarchically semiseparable matrices allowing for efficient, inexact matrix-vector products in the partial SVD computation. Section 4 presents the results of our numerical experiments and Sect. 5 concludes the paper.

Review of the Loewner Framework
The Loewner framework has been proposed to address the rational interpolation/approximation problem. In the control community, this is referred to as system identification from frequency domain measurements and is stated below.
Problem Statement (Rational approximation in the complex plane) Given points $s_j$ in the complex plane (which can represent angular frequencies $i\omega_j$ if the $s_j$ are on the imaginary axis) and the corresponding transfer function measurements $H_j \in \mathbb{C}^{p\times q}$ for a system with q inputs and p outputs:
$$(s_j; H_j), \quad j = 1, \ldots, N, \qquad (1)$$
with p and q assumed to be much smaller than N, the problem amounts to finding the rational transfer function $H(s)$ which approximates the data:
$$H(s_j) \approx H_j, \quad j = 1, \ldots, N. \qquad (2)$$
Thus, the transfer function evaluated for the Laplace variable $s = s_j$ should be close (in some norm) to the corresponding measurement $H_j$. Several equivalent representations are possible for the rational transfer function, namely pole-residue, pole-zero, state-space or descriptor-form.
Most systems of interest are real, with their transfer function satisfying the complex conjugate condition $\overline{H(s)} = H(\bar{s})$. Hence, we add the complex conjugate measurements $(\bar{s}_j; \overline{H}_j)$ to the set (1).
We proceed by presenting the Loewner framework as a solution scheme addressing the rational approximation problem. The first step in the Loewner framework [26,30] is partitioning the data into two disjoint sets. This partition influences the conditioning of the problem [21, Ch. 2.1] and finding the optimal partition for each data set is beyond the scope of this paper. The most natural partitions are summarized in the following (assuming an even number of measurements N and sampling points sorted in ascending order with respect to their absolute value):
- Half&Half: the first half of the data forms the first set and the other half forms the second set; the sampling points are split as in (3) and the measurements are split correspondingly.
- Odd&Even: data with odd indices form the first set and data with even indices form the second set; the sampling points are split as in (5) and the measurements are split correspondingly.
The first set on the right in (3) and (5) comprises the right points, denoted by $\lambda_k$, $k = 1, \ldots, N$, while the second set comprises the left points $\mu_h$, $h = 1, \ldots, N$. This splitting into right and left points is related to the concept of tangential interpolation, which is explained in the following paragraph.
The following step in the Loewner framework is choosing tangential directions as vectors which transform matrix data $H_j$ into vector data: right tangential directions are column vectors $r_k \in \mathbb{C}^{q}$ such that $H_k r_k = w_k$, whereas left tangential directions are row vectors $\ell_h \in \mathbb{C}^{1\times p}$ such that $\ell_h H_h = v_h$. The column vectors $w_k \in \mathbb{C}^{p}$ are referred to as right vector data, while the row vectors $v_h \in \mathbb{C}^{1\times q}$ are referred to as left vector data. For simplicity, tangential directions can be chosen as alternating columns/rows of the identity matrix [26], resulting in vector data being column and row vectors of the original matrix data $H_j$ in (1).
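The two index partitions described above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy frequencies are hypothetical, and the complex conjugate points added to the data set are left out for brevity.

```python
def half_and_half(points):
    """Half&Half: first half -> right points, second half -> left points."""
    n = len(points)
    if n % 2:
        raise ValueError("an even number of measurements is assumed")
    return points[: n // 2], points[n // 2:]


def odd_and_even(points):
    """Odd&Even: odd (1-based) indices -> right points, even -> left points."""
    if len(points) % 2:
        raise ValueError("an even number of measurements is assumed")
    return points[0::2], points[1::2]


# Toy sampling points s_j = i*omega_j, sorted by absolute value (made-up values).
omega = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
s = [1j * w for w in omega]

lam_hh, mu_hh = half_and_half(s)   # right / left points, Half&Half
lam_oe, mu_oe = odd_and_even(s)    # right / left points, Odd&Even
```

The same index sets would be applied to the measurements $H_j$, so that each point stays paired with its data.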

Remark 1
For scalar data obtained from single-input single-output (SISO) systems ($p = q = 1$), the tangential directions $r_k$, $\ell_h$ are simply equal to 1.

Remark 2
If the loss of information due to utilizing a single tangential direction per measurement, instead of the whole matrix $H_j$, does not allow one to obtain an accurate approximation, one can employ the original matrix $H_j$. This is equivalent to considering several tangential directions for the same point. To obtain block right matrix data for $H_j \in \mathbb{C}^{p\times q}$, the corresponding frequency should be repeated q times as a right point and all columns of the identity matrix of size $q \times q$ should be considered as right directions. Similarly, to obtain block left matrix data for $H_j \in \mathbb{C}^{p\times q}$, the corresponding frequency should be repeated p times as a left point and all rows of the identity matrix of size $p \times p$ should be considered as left directions.
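With this notation in place, the Loewner matrix $\mathbb{L}$ is defined entry-wise as
$$\mathbb{L}_{h,k} = \frac{v_h r_k - \ell_h w_k}{\mu_h - \lambda_k}, \qquad (7)$$
and the shifted Loewner matrix $\mathbb{S}$ is defined as
$$\mathbb{S}_{h,k} = \frac{\mu_h v_h r_k - \lambda_k \ell_h w_k}{\mu_h - \lambda_k}. \qquad (8)$$
Note that the numerators are scalar quantities as they are obtained by taking inner products. The quantities defined previously are collected into the following matrices
$$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N), \quad R = [r_1, \ldots, r_N] \in \mathbb{C}^{q\times N}, \quad W = [w_1, \ldots, w_N] \in \mathbb{C}^{p\times N},$$
$$M = \mathrm{diag}(\mu_1, \ldots, \mu_N), \quad L = [\ell_1^T, \ldots, \ell_N^T]^T \in \mathbb{C}^{N\times p}, \quad V = [v_1^T, \ldots, v_N^T]^T \in \mathbb{C}^{N\times q}.$$
By construction, the Loewner and shifted Loewner matrices satisfy the following Sylvester equations:
$$M\mathbb{L} - \mathbb{L}\Lambda = VR - LW, \qquad M\mathbb{S} - \mathbb{S}\Lambda = MVR - LW\Lambda, \qquad (11)$$
as well as the following relations:
$$\mathbb{S} - \mathbb{L}\Lambda = VR, \qquad \mathbb{S} - M\mathbb{L} = LW, \qquad (12)$$
which will prove useful in our proposed matrix-free matrix-vector product approach.
After introducing notation, we are ready to state the solution provided by the Loewner framework to the rational approximation problem. A (non-minimal) model for the transfer function in descriptor-form $H(s) = C(sE - A)^{-1}B + D$ is given by
$$E = -\mathbb{L}, \quad A = -\mathbb{S}, \quad B = V, \quad C = W, \quad D = 0.$$
Since we have recast the original problem as a tangential interpolation problem, this transfer function satisfies the right and left interpolation conditions [30]
$$H(\lambda_k)\, r_k = w_k, \qquad \ell_h\, H(\mu_h) = v_h, \qquad k, h = 1, \ldots, N.$$
To obtain a minimal model, we perform a singular value decomposition
$$\mathbb{S} - x\mathbb{L} = Y \Sigma X^*, \qquad (14)$$
where $\Sigma$ is diagonal and Y, X contain the left and right singular vectors, respectively. Choosing the order n of the truncated SVD (n is application-dependent), we define (in Matlab notation) $X_n = X(:, 1{:}n)$ and $Y_n = Y(:, 1{:}n)^*$. Finally, the model of size n in descriptor form is
$$E = -Y_n \mathbb{L} X_n, \quad A = -Y_n \mathbb{S} X_n, \quad B = Y_n V, \quad C = W X_n, \quad D = 0. \qquad (15)$$
In the following section, we exploit the Cauchy-like structure of the Loewner and shifted Loewner matrices to design efficient approaches, both in terms of memory storage and CPU time, to compute the SVD in (14) by making use of hierarchical matrices.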
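As a small illustration of the standard (dense) procedure just reviewed, the following NumPy sketch builds $\mathbb{L}$ and $\mathbb{S}$ entry-wise for scalar data, checks the Sylvester equations, and forms a reduced model by projection. It is a sketch under stated assumptions: the order-3 test function `H`, the eight sampling frequencies, the relative threshold `1e-10` used to detect the singular value drop, and all variable names are hypothetical choices made for this example only.

```python
import numpy as np


def H(s):
    # Hypothetical order-3 real SISO system, used only to generate toy data.
    return 1.0 / (s + 1.0) + (2.0 * s + 1.0) / ((s + 0.5) ** 2 + 4.0)


s = 1j * np.linspace(0.5, 4.0, 8)
# Odd&Even split; each set is closed under complex conjugation.
lam = np.concatenate([s[0::2], s[0::2].conj()])   # right points
mu = np.concatenate([s[1::2], s[1::2].conj()])    # left points
N = lam.size

# SISO case: all tangential directions are equal to 1.
R = np.ones((1, N), dtype=complex)
W = H(lam)[None, :]
L = np.ones((N, 1), dtype=complex)
V = H(mu)[:, None]

# Entry-wise construction of the Loewner and shifted Loewner matrices.
C = 1.0 / (mu[:, None] - lam[None, :])
LL = C * (V @ R - L @ W)
SS = C * ((mu[:, None] * V) @ R - L @ (W * lam[None, :]))

# Residuals of the two Sylvester equations.
M, Lam = np.diag(mu), np.diag(lam)
res_L = np.linalg.norm(M @ LL - LL @ Lam - (V @ R - L @ W))
res_S = np.linalg.norm(M @ SS - SS @ Lam - (M @ V @ R - L @ W @ Lam))

# SVD of S - x L and order detection from the singular value drop.
x = lam[0]
Y, sig, Xh = np.linalg.svd(SS - x * LL)
n = int(np.sum(sig / sig[0] > 1e-10))
Yn, Xn = Y[:, :n].conj().T, Xh[:n, :].conj().T

# Projected descriptor model of size n.
E, A, B, Cr = -Yn @ LL @ Xn, -Yn @ SS @ Xn, Yn @ V, W @ Xn


def H_red(z):
    return (Cr @ np.linalg.solve(z * E - A, B))[0, 0]


s_test = 1.7j                                      # not a sampling point
err = abs(H_red(s_test) - H(s_test))
```

For noise-free data of a low-order rational function, as in this toy setting, the detected order is expected to match the order of the underlying system and `err` to be close to machine precision.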

Exploiting the Structure of L and S
For data sets with a sizable number N of measurements $H_j$, the construction of the large, dense Loewner and shifted Loewner matrices is demanding, both in terms of computational efforts as well as storage requirements. The computation of each entry of $\mathbb{L}$ and $\mathbb{S}$ using (7) and (8) yields a total cost of $O(N^2(p + q))$ floating point operations (FLOPs) for assembling the entire $\mathbb{L}$ and $\mathbb{S}$ matrices. The number of nonzero entries in $\mathbb{L}$ and $\mathbb{S}$ is $O(N^2)$, much larger than the memory requirements for storing the data in $\Lambda$, M, R, W, L, and V.$^{1}$ Besides these excessive storage requirements, there are also considerations to be made regarding the CPU time required for the SVD computation of the matrix $\mathbb{S} - x\mathbb{L}$, $x \in \{f_i\}$, in (14). Especially for large dimensional problems, for which we expect a fast decay of the singular values, it is preferred to compute only the first n singular triplets, thus avoiding wasting resources in computing the full SVD. To this end, many iterative methods have been developed for computing partial SVDs; see, e.g., [3,19,25,41] to name a few. The bottleneck in these approaches is the matrix-vector product with the coefficient matrix, namely $\mathbb{S} - x\mathbb{L}$ in our case. This operation costs $O(N^2)$ FLOPs due to the dense pattern of $\mathbb{S} - x\mathbb{L}$.
This section tackles the cost reduction of performing a matrix-vector product with S − xL while avoiding the explicit allocation of L and S. The proposed strategy is supported by a thorough analysis of the computational cost, showing that, for very large data sets for which carrying out the full SVD is intractable, our strategy leads to remarkable reductions in both the computational efforts and the storage demand for building minimal realizations in the Loewner framework.
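To give a feeling for the storage figures involved, the following back-of-the-envelope sketch compares the memory needed by the two dense matrices with the memory needed by the raw data. The helper names and the choice of 16 bytes per entry (double-precision complex) are assumptions made for this illustration.

```python
def dense_storage_gb(N, bytes_per_entry=16):
    """Memory (GB) for the two dense N x N matrices, Loewner and shifted Loewner."""
    return 2 * N * N * bytes_per_entry / 1e9


def data_storage_gb(N, p, q, bytes_per_entry=16):
    """Memory (GB) for the two diagonals (2N entries) plus R, W, L, V (2N(p+q) entries)."""
    entries = 2 * N + 2 * N * (p + q)
    return entries * bytes_per_entry / 1e9


N = 10 ** 5
dense_gb = dense_storage_gb(N)          # two dense matrices
data_gb = data_storage_gb(N, p=1, q=1)  # raw SISO data only
```

For $N = 10^5$ the two dense matrices require about 320 GB, whereas the raw SISO data fit in roughly 0.01 GB.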

Hadamard Product and Cauchy Matrices
We present novel results which exploit the particular structure of the Loewner and shifted Loewner matrices. These developments involve the Sylvester equations (11) with diagonal coefficient matrices $\Lambda$ and M.

Theorem 1
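The Loewner and shifted Loewner matrices $\mathbb{L}$ and $\mathbb{S}$ satisfying the Sylvester equations in (11) are such that
$$\mathbb{L} = \sum_{j=1}^{q} \mathrm{diag}(v_j)\, C\, \mathrm{diag}(r_j) - \sum_{j=1}^{p} \mathrm{diag}(\ell_j)\, C\, \mathrm{diag}(w_j), \qquad (16)$$
and
$$\mathbb{S} = \sum_{j=1}^{q} M\, \mathrm{diag}(v_j)\, C\, \mathrm{diag}(r_j) - \sum_{j=1}^{p} \mathrm{diag}(\ell_j)\, C\, \mathrm{diag}(w_j)\, \Lambda, \qquad (17)$$
where C denotes the following Cauchy matrix
$$C = \left( \frac{1}{\mu_h - \lambda_k} \right)_{h,k = 1, \ldots, N} \in \mathbb{C}^{N\times N},$$
while the vectors $v_j \in \mathbb{C}^{N}$ and $\ell_j \in \mathbb{C}^{N}$ denote the j-th columns of V and L, respectively, so that $V = [v_1, \ldots, v_q]$ and $L = [\ell_1, \ldots, \ell_p]$. Similarly, the vectors $r_j \in \mathbb{C}^{1\times N}$ and $w_j \in \mathbb{C}^{1\times N}$ are the j-th rows of R and W, respectively, namely $R = [r_1; \ldots; r_q]$ and $W = [w_1; \ldots; w_p]$.
Proof The Loewner and shifted Loewner matrices $\mathbb{L}$ and $\mathbb{S}$ are Cauchy-like matrices as they are obtained by taking the Hadamard product $\circ$ between the Cauchy matrix C and the right-hand sides of the Sylvester equations in (11). In particular,
$$\mathbb{L} = C \circ (VR - LW), \qquad \mathbb{S} = C \circ (MVR - LW\Lambda).$$
An important property of the Hadamard product reads as follows. For any vectors $x, y \in \mathbb{C}^{N}$, it holds
$$C \circ (x y^T) = \mathrm{diag}(x)\, C\, \mathrm{diag}(y).$$
This, along with the low-rank structure of $VR - LW$ and $MVR - LW\Lambda$, yields the results in (16) and (17).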

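Corollary 1 Given a vector $y \in \mathbb{C}^{N}$ and $x \in \{s_j\}$, we have
$$(\mathbb{S} - x\mathbb{L})y = VRy + \sum_{j=1}^{q} \mathrm{diag}(v_j)\, C\, \mathrm{diag}(r_j)\, \hat{y} - \sum_{j=1}^{p} \mathrm{diag}(\ell_j)\, C\, \mathrm{diag}(w_j)\, \hat{y},$$
where $\hat{y} = (\Lambda - xI)y$, with I the identity matrix.
Proof Thanks to (12), we can write
$$(\mathbb{S} - x\mathbb{L})y = VRy + \mathbb{L}(\Lambda - xI)y.$$
The result follows by substituting the expression of $\mathbb{L}$ given in Theorem 1 in the equation above.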
Corollary 1 shows that the majority of the computational cost of performing the matrix-vector multiplication $(\mathbb{S} - x\mathbb{L})y$ amounts to computing $p + q$ matrix-vector products with the Cauchy matrix C.
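The matrix-free product can be sketched as follows. The snippet assumes the standard Cauchy-like form of the Loewner matrix, namely the Hadamard product of C with $VR - LW$, and the relation $\mathbb{S} - \mathbb{L}\Lambda = VR$. The function names, the callable `cauchy_mv` (any routine returning the product of C with a vector), and the random toy data are hypothetical; a dense C is used here only to validate the result.

```python
import numpy as np


def loewner_matvec(cauchy_mv, V, L, R, W, y):
    """Loewner matrix times y, using p + q products with the Cauchy matrix."""
    out = np.zeros(V.shape[0], dtype=complex)
    for j in range(V.shape[1]):                  # q terms diag(v_j) C diag(r_j) y
        out += V[:, j] * cauchy_mv(R[j, :] * y)
    for j in range(L.shape[1]):                  # p terms diag(l_j) C diag(w_j) y
        out -= L[:, j] * cauchy_mv(W[j, :] * y)
    return out


def pencil_matvec(cauchy_mv, lam, V, L, R, W, x, y):
    """(S - x L) y = V (R y) + Loewner * ((Lambda - x I) y)."""
    y_hat = (lam - x) * y
    return V @ (R @ y) + loewner_matvec(cauchy_mv, V, L, R, W, y_hat)


# Toy data (random, for validation only).
rng = np.random.default_rng(0)
N, p, q = 6, 2, 3
mu = 1j * np.arange(1.0, N + 1.0)
lam = 1j * (np.arange(1.0, N + 1.0) + 0.5)
V = rng.standard_normal((N, q)) + 1j * rng.standard_normal((N, q))
L = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))
R = rng.standard_normal((q, N)) + 1j * rng.standard_normal((q, N))
W = rng.standard_normal((p, N)) + 1j * rng.standard_normal((p, N))
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = lam[0]

# Dense reference, affordable only because N is tiny.
C = 1.0 / (mu[:, None] - lam[None, :])
LL = C * (V @ R - L @ W)
SS = C * ((mu[:, None] * V) @ R - (L @ W) * lam[None, :])
dense = (SS - x * LL) @ y

fast = pencil_matvec(lambda z: C @ z, lam, V, L, R, W, x, y)
```

In a large-scale setting, `cauchy_mv` would be replaced by a fast routine so that neither C nor the two dense matrices are ever assembled.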
Extensive research has been devoted to fully exploiting the rich structure of Cauchy matrices. Several algorithms for computing the matrix-vector product Cy can be found in the literature and many avoid assembling the full matrix C (see, e.g., [7,15,17,33]). In the next section we recall the strategy presented by Pan in [33,34] to represent C in terms of a hierarchically semiseparable (HSS) matrix. Even though the novel scheme proposed in this paper does not depend on the strategy employed for performing the matrix-vector product Cy (as long as it is efficient), we believe that the HSS framework may be advantageous as, in principle, many matrix-vector products with C are needed for computing a (partial) SVD of the matrix $\mathbb{S} - x\mathbb{L}$.
We conclude this section with the following remarks.

Remark 3
The number n of singular triplets needed to be computed to achieve the minimal realization (E, A, B, C, D) in (15) is difficult to estimate a priori.$^{2}$ However, the expression of $\mathbb{L}$ and $\mathbb{S}$ in terms of the Hadamard product can be useful to this end. Indeed, another important property of the Hadamard product is that, for any matrices A and B, $\mathrm{rank}(A \circ B) \le \mathrm{rank}(A)\,\mathrm{rank}(B)$, so that $\mathrm{rank}(\mathbb{L}) \le \mathrm{rank}(C)\,\mathrm{rank}(VR - LW) \le (p+q)\,\mathrm{rank}(C)$, and similarly for $\mathbb{S}$. Thus, we have
$$\mathrm{rank}(\mathbb{S} - x\mathbb{L}) \le 2(p+q)\,\mathrm{rank}(C).$$
In general, the Cauchy matrix C is full rank so this inequality is trivially satisfied. However, depending on the partitioning of the points into $\lambda_k$ and $\mu_h$ (as in (3) and (5)), it can be numerically low-rank (see, e.g., [33, Theorem 5], [5,9]). If $\pi_C$ denotes the numerical rank of C, then $2(p + q)\pi_C$ is a rough estimate for the numerical rank of $\mathbb{S} - x\mathbb{L}$. Oftentimes, the underlying dynamical system is of much lower complexity, thus allowing for the computation of a minimal realization of reduced order n. One can also use insight into the system itself or count the number of peaks in the frequency response to estimate n (for systems with poles having dominant imaginary parts).

Remark 4
The expression of $\mathbb{L}$ and $\mathbb{S}$ in terms of the Hadamard product provides us with an upper bound of the spectral norm of the Loewner and shifted Loewner matrices. Indeed, the spectral norm is submultiplicative with respect to the Hadamard product [20, Theorem 5.5.1], hence
$$\|\mathbb{L}\|_2 = \|C \circ (VR - LW)\|_2 \le \|C\|_2\, \|VR - LW\|_2 \le \|C\|_F\, \|VR - LW\|_2,$$
where $\|C\|_F$ denotes the Frobenius norm of C. Note that $\|VR - LW\|_2$ can be computed cheaply, e.g., by a power method exploiting the low rank of $VR - LW$. Similarly,
$$\|\mathbb{S}\|_2 \le \|C\|_F\, \|MVR - LW\Lambda\|_2.$$

Remark 5
Low-rank approximations to $\mathbb{L}$ and $\mathbb{S}$ may be computed by adaptive cross approximation [4], particularly suited for hierarchical matrices, the CUR decomposition [11] as in [22,38], or related schemes. These approaches select a certain number of columns and rows of the original matrices in a greedy fashion based on various heuristics, and a core matrix is utilised to compute a low-rank approximation. If a threshold on the desired accuracy of the computed approximation is provided as an input, these algorithms often construct matrices whose rank is much larger than that of the target matrices $\mathbb{L}$ and $\mathbb{S}$. On the other hand, by fixing the rank k of the approximation, $k \approx \mathrm{rank}(\mathbb{L}), \mathrm{rank}(\mathbb{S})$ (assuming an estimate of these ranks is known), the accuracy we achieve may be very low, affecting the reliability of the computed reduced models.

Hierarchically Semiseparable (HSS) Representation of a Cauchy Matrix
The literature on HSS matrices is rather vast and technical (see, e.g., [8,33,34,44,47] and references therein). Here we recall only the main properties of this class of matrices and their role in the efficient representation of Cauchy matrices. Such a technique is also closely related to the Fast Multipole Method (FMM). We refer the interested reader to, e.g., [8,9] for more details on the interconnection between HSS matrices and FMM.

Definition 1 [34, Definition 27] Let A be an $N \times N$ matrix with α being the maximum rank of all its subdiagonal blocks, namely the blocks of all sizes lying strictly below the block diagonal, and β the maximum rank of all its superdiagonal blocks, namely the blocks of all sizes lying strictly above the block diagonal. Then, A is said to be an (α, β)-HSS matrix.
The (α, β)-HSS representation of a matrix A is very advantageous whenever α and β are small. For instance, it allows us to express A in terms of $O((\alpha + \beta)N)$ parameters, avoiding storing its $N^2$ entries. Moreover, a whole, efficient HSS arithmetic has been developed in the last decades (see, e.g., [8,47]). For instance, the computational cost of the matrix-vector product Ay amounts to $O((\alpha + \beta)N)$ FLOPs. If A is nonsingular, its inverse is also an (α, β)-HSS matrix that can be computed in $O((\alpha + \beta)^3 N)$ FLOPs (see, e.g., [34, Section 6]).
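The role played by low-rank off-diagonal blocks can be illustrated with a deliberately simplified, one-level sketch in NumPy. This is not the HSS format itself, which applies the splitting recursively to the diagonal blocks and uses nested bases, and it is not the hm-toolbox implementation; the function names, the tolerance, and the toy Cauchy matrix with interleaved real points are assumptions made for this example.

```python
import numpy as np


def truncated_factors(block, tol):
    """Return U, Vh with block ~= U @ Vh, from a truncated SVD."""
    U, s, Vh = np.linalg.svd(block, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))
    return U[:, :k] * s[:k], Vh[:k, :]


def compress_one_level(A, tol=1e-10):
    """Keep the two diagonal blocks dense, store off-diagonal blocks in factored form."""
    h = A.shape[0] // 2
    return {
        "h": h,
        "D1": A[:h, :h].copy(),
        "D2": A[h:, h:].copy(),
        "upper": truncated_factors(A[:h, h:], tol),
        "lower": truncated_factors(A[h:, :h], tol),
    }


def matvec_one_level(rep, y):
    """Product with the compressed matrix; off-diagonal work scales with the ranks."""
    h = rep["h"]
    (U12, V12), (U21, V21) = rep["upper"], rep["lower"]
    top = rep["D1"] @ y[:h] + U12 @ (V12 @ y[h:])
    bottom = U21 @ (V21 @ y[:h]) + rep["D2"] @ y[h:]
    return np.concatenate([top, bottom])


# Toy Cauchy matrix with interleaved real points.
N = 64
mu = np.arange(1.0, N + 1.0)
lam = mu + 0.5
A = 1.0 / (mu[:, None] - lam[None, :])

rep = compress_one_level(A)
rank_upper = rep["upper"][0].shape[1]   # rank of the superdiagonal block at this level
rank_lower = rep["lower"][0].shape[1]   # rank of the subdiagonal block at this level

y = np.random.default_rng(0).standard_normal(N)
exact = A @ y
rel_err = np.linalg.norm(exact - matvec_one_level(rep, y)) / np.linalg.norm(exact)
```

In Definition 1, α and β are maxima over blocks of all sizes, so the two ranks computed here only account for the first level of the splitting.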
To fully exploit the HSS framework for our purposes, we wish to represent the Cauchy matrix C in terms of an HSS matrix with a low off-diagonal rank. In light of Corollary 1, this would considerably decrease the computational cost of the matrix-vector products involving $\mathbb{S} - x\mathbb{L}$ while avoiding forming the dense matrices $\mathbb{S}$ and $\mathbb{L}$.
The construction of an HSS approximation $\widetilde{C}$ to C is rather involved and the magnitude of the (α, β)-rank of the computed $\widetilde{C}$ strictly depends on the partitioning of the frequencies along with the accuracy that has been selected for the actual computation of $\widetilde{C}$.$^{3}$ Given two parameters $c \in \mathbb{C}$ and $1 \le r \le N$, the cardinal relation underlying this construction is the following
$$\frac{1}{\mu_i - \lambda_j} = \frac{1}{\mu_i - c} \cdot \frac{1}{1 - \frac{\lambda_j - c}{\mu_i - c}} = \frac{1}{\mu_i - c} \sum_{k=0}^{r-1} \left( \frac{\lambda_j - c}{\mu_i - c} \right)^{k} + E,$$
and the approximation obtained by neglecting the error term E can be represented in low-rank format thanks to the separability (in $\mu_i$ and $\lambda_j$) of the first term in the last step of the relation above. See, e.g., [34, Section 8] for further details on the computation of an HSS representation of a Cauchy matrix. In this paper we employ the readily available hm-toolbox [28].
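The separability argument can be checked numerically on a single off-diagonal block. The sketch below assumes the truncated geometric expansion of $1/(\mu_i - \lambda_j)$ around a center c, valid when $|\lambda_j - c| < |\mu_i - c|$, with r terms; the function name, the two well-separated clusters of points, and the values of c and r are hypothetical choices for this illustration.

```python
import numpy as np


def separable_factors(mu, lam, c, r):
    """Rank-r factors F, G such that 1/(mu_i - lam_j) ~= (F @ G.T)[i, j].

    Based on 1/(mu - lam) = sum_{k>=0} (lam - c)^k / (mu - c)^(k+1),
    truncated after r terms.
    """
    k = np.arange(r, dtype=float)
    F = (mu[:, None] - c) ** (-(k[None, :] + 1.0))   # depends on mu only
    G = (lam[:, None] - c) ** k[None, :]             # depends on lam only
    return F, G


# Two well-separated clusters on the imaginary axis (toy values).
mu = 1j * np.linspace(5.0, 8.0, 20)
lam = 1j * np.linspace(1.0, 2.0, 20)
c, r = 1.5j, 12

F, G = separable_factors(mu, lam, c, r)
block = 1.0 / (mu[:, None] - lam[None, :])
approx = F @ G.T

# Closed form of the neglected remainder of the geometric series.
ratio = (lam[None, :] - c) / (mu[:, None] - c)
remainder = ratio ** r / (mu[:, None] - lam[None, :])

max_err = np.max(np.abs(block - approx))
```

The block is thus approximated by a matrix of rank at most r, and the error decays geometrically in r at a rate governed by the separation between the two clusters.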

Example 1
We investigate the impact of the most commonly used frequency partitions (Half&Half and Odd&Even) on the HSS-rank of the computed $\widetilde{C}$ for a mechanical structure. We emphasize that the most effective partition is problem-dependent and its selection is still an open problem, beyond the scope of this paper. However, in [12], [21, Ch. 2.1] the authors suggest the use of partitions with interleaving frequencies like Odd&Even in order to avoid the introduction of an "artificial" ill-conditioning.
We consider the Flexible Aircraft data set [36] from the MORwiki [42]. This data set contains 421 frequency values $\omega_j$ expressed in rad/s and the corresponding measurements of the transfer function $H_j$. We disregard the last data point and consider the remaining frequencies ranging from $f_1 = 0.1$ Hz to $f_{420} = 42$ Hz. As this is a mechanical structure, the frequencies considered are in the low spectrum, as opposed to electrical systems, for which frequencies typically span the GHz range.
To avoid complex arithmetic, it is preferable to perform a change of basis when dealing with sampling points on the imaginary axis, $s_j = i\omega_j$. By means of a suitable unitary matrix P, we obtain matrices with real entries, denoted in the following by the subscript r (e.g., $\Lambda_r$, $M_r$, $\mathbb{L}_r$, $\mathbb{S}_r$), where $P^*$ stands for the complex conjugate transpose of the matrix P and $P^{-1} = P^*$. These quantities satisfy analogous expressions as in (11) and (12). Unfortunately, $\Lambda_r$ and $M_r$ are no longer diagonal and this represents a major drawback in taking advantage of the Sylvester equations (11) for a fast computation of $\mathbb{L}_r$ and $\mathbb{S}_r$. However, $\Lambda_r^2$ and $M_r^2$ are diagonal. By multiplying the real counterpart of the first equation in (11) by $M_r$ on the left and, afterwards, multiplying it by $\Lambda_r$ on the right and adding the results together, a new Sylvester equation with diagonal coefficient matrices is obtained:
$$M_r^2\, \mathbb{L}_r - \mathbb{L}_r\, \Lambda_r^2 = M_r (V_r R_r - L_r W_r) + (V_r R_r - L_r W_r)\, \Lambda_r, \qquad (22)$$
where $V_r$, $R_r$, $L_r$, $W_r$ denote the real counterparts of V, R, L, W. By performing the same operations on the second equation in (11), a similar Sylvester equation is obtained for the shifted Loewner matrix:
$$M_r^2\, \mathbb{S}_r - \mathbb{S}_r\, \Lambda_r^2 = M_r (M_r V_r R_r - L_r W_r \Lambda_r) + (M_r V_r R_r - L_r W_r \Lambda_r)\, \Lambda_r. \qquad (23)$$
In the following, we refer to this as the Odd&Even (real) partition.$^{4}$
We recall the three different partitions of the sampling points $\{s_j = i\omega_j\}_{j=1}^{420}$: Half&Half, Odd&Even, and Odd&Even (real). For each partition, we compute the corresponding Cauchy matrix in HSS format $\widetilde{C}$ without assembling the full C beforehand, by means of the function hss of the hm-toolbox, where dM and dL are N-dimensional vectors containing the frequencies $\mu_h$ and $\lambda_k$, respectively. We then calculate its rank by hssrank($\widetilde{C}$).$^{5}$ In Table 1 we report the HSS-rank of the matrix $\widetilde{C}$ for the partitions mentioned above. Thanks to the small dimension of the data set, we are able to compute the full Cauchy matrix C and document its (standard) rank along with the relative error $\|C - \widetilde{C}\| / \|C\|$. As expected, having two disjoint sets of frequencies like in the Half&Half partition leads to a Cauchy matrix C whose (standard) rank is low. This does not happen in the other two scenarios we examine, so that taking advantage of the HSS format is necessary to achieve memory-saving representations of C.
The results in Table 1 show that a good accuracy in terms of the relative error can be achieved for all three frequency partitions. Nevertheless, the HSS rank of $\widetilde{C}$ is significantly lower for the Odd&Even (real) partition, most likely due to the squaring of the frequencies performed in Odd&Even (real), which leads to a fast decay in the magnitude of the off-diagonal entries of C. Hence, for a fixed threshold, the off-diagonal blocks of the Cauchy matrix associated with the Odd&Even (real) partition can be approximated by matrices having a smaller rank than those associated with the other two scenarios we examined. In Figure 1 we display the absolute value (on a logarithmic scale) of the entries of the Cauchy matrix C stemming from the different partitions. The same scale has been used in all three figures, reinforcing the observation that the Odd&Even (real) partition exhibits the fastest decay in the magnitude of the off-diagonal entries of C.
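The qualitative difference between the partitions in terms of the standard numerical rank of C can be reproduced on synthetic points. The sketch below does not use the Flexible Aircraft data and does not reproduce Table 1: it assumes 420 equispaced frequencies between 0.1 Hz and 42 Hz, ignores the complex conjugate points, uses a relative tolerance of `1e-10`, and computes the standard rank via a dense SVD rather than the HSS rank.

```python
import numpy as np


def numerical_rank(A, tol=1e-10):
    """Number of singular values above tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


def cauchy(mu, lam):
    """Dense Cauchy matrix with entries 1 / (mu_h - lam_k)."""
    return 1.0 / (mu[:, None] - lam[None, :])


f = np.linspace(0.1, 42.0, 420)     # synthetic frequencies in Hz
s = 1j * 2.0 * np.pi * f            # sampling points on the imaginary axis

# Half&Half: right points = first half, left points = second half.
rank_hh = numerical_rank(cauchy(s[210:], s[:210]))
# Odd&Even: right points = odd (1-based) indices, left points = even indices.
rank_oe = numerical_rank(cauchy(s[1::2], s[0::2]))
```

With separated point sets (Half&Half) the numerical rank is expected to be small, whereas interleaved sets (Odd&Even) are expected to yield a matrix of (nearly) full numerical rank, which is where a hierarchical format becomes relevant.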

Efficient, Inexact Matrix-Vector Products
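Whenever the matrix C admits an accurate approximation in terms of a low-rank HSS matrix $\widetilde{C}$, the computational cost of performing the matrix-vector product $(\mathbb{S} - x\mathbb{L})y$ can be significantly reduced.
Proposition 1 Let $\widetilde{C}$ be an (α, β)-HSS matrix that approximates the Cauchy matrix C accurately. If $\mathbb{L}$ and $\mathbb{S}$ satisfy the Sylvester equations in (11), then
$$(\mathbb{S} - x\mathbb{L})y = VRy + \sum_{j=1}^{q} \mathrm{diag}(v_j)\, \widetilde{C}\, \mathrm{diag}(r_j)\, \hat{y} - \sum_{j=1}^{p} \mathrm{diag}(\ell_j)\, \widetilde{C}\, \mathrm{diag}(w_j)\, \hat{y} + E y, \qquad (25)$$
where $\hat{y} = (\Lambda - xI)y$ and
$$E = \left( \sum_{j=1}^{q} \mathrm{diag}(v_j)\, (C - \widetilde{C})\, \mathrm{diag}(r_j) - \sum_{j=1}^{p} \mathrm{diag}(\ell_j)\, (C - \widetilde{C})\, \mathrm{diag}(w_j) \right) (\Lambda - xI).$$
Moreover, the right-hand side of (25) without the term $Ey$ can be computed in $O((p + q)(\alpha + \beta + 1)N)$ FLOPs.
Proof From the result in Corollary 1, writing $C = \widetilde{C} + (C - \widetilde{C})$, we obtain the expression in (25). This proves the first part of Proposition 1. To conclude, by making use of the property that the matrix-vector product with an (α, β)-HSS matrix costs $O((\alpha + \beta)N)$ FLOPs and that VR has rank q, a direct computation shows that the number of operations needed to perform (25) amounts to $O((p + q)(\alpha + \beta + 1)N)$ FLOPs, which proves the second claim in Proposition 1.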
As before, analogous results can be obtained for $\mathbb{L}_r$ and $\mathbb{S}_r$ satisfying (22) and (23), respectively.
Proposition 1 shows that, whenever $\|C - \widetilde{C}\|$ is small, the matrix-vector product $(\mathbb{S} - x\mathbb{L})y$ can be well-approximated by the expression in (25) while dramatically reducing the computational complexity from $O(N^2)$ FLOPs to $O((p + q)(\alpha + \beta + 1)N)$ FLOPs. However, when this approximation is used within our favorite iterative procedure for computing a partial SVD of $\mathbb{S} - x\mathbb{L}$, the inexactness introduced by neglecting the term $Ey$ should be taken into account.
The use of inexact matrix-vector products within certain iterative procedures has been the subject of numerous research papers: Krylov techniques for solving linear systems and matrix equations [6,24,32,40,43], eigenvalue problems [13,39], or an inexact variant of the Lanczos bidiagonalization for the computation of some leading singular triplets of a generic matrix function $f(A)$ [14]. With the goal of decreasing the computational cost of the overall procedure, these studies show that the accuracy of the matrix-vector product can be relaxed (becoming more and more inaccurate) as iterations proceed. In our framework, the inexactness introduced by approximating $(\mathbb{S} - x\mathbb{L})y$ with (25) is fixed throughout the entire iterative procedure and mainly depends on $\|C - \widetilde{C}\|$, which is often small, as shown in Example 1. Therefore, the approximation does not greatly affect the accuracy of the computed singular triplets (see Sect. 4). Moreover, in our case, we do not need an accurate approximation of the singular triplets of $\mathbb{S} - x\mathbb{L}$.
The main goal is to have meaningful spaces spanned by the computed left and right singular vectors so that the obtained reduced model (E, A, B, C) inherits the desired approximation properties. Moreover, as shown in [21, Corollary 1.4], [2,Proposition 8.25], in the case of noise-free measurements of a low-order rational function, even general projectors, not necessarily obtained from the SVD, can be employed for identifying the underlying function.
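The effect of a fixed level of inexactness on the computed subspaces can be explored with a small experiment. The sketch below is not the procedure used in the paper: it assumes a plain block subspace iteration as the iterative method, a hypothetical order-3 SISO test function `H`, a known order `n = 3`, and it emulates the approximation of C by perturbing its entries by a relative amount of about `1e-10` instead of building an HSS representation.

```python
import numpy as np

rng = np.random.default_rng(0)


def H(s):
    # Hypothetical order-3 SISO system generating noise-free toy data.
    return 1.0 / (s + 1.0) + (2.0 * s + 1.0) / ((s + 0.5) ** 2 + 4.0)


s = 1j * np.linspace(0.5, 4.0, 8)
lam = np.concatenate([s[0::2], s[0::2].conj()])   # right points
mu = np.concatenate([s[1::2], s[1::2].conj()])    # left points
w, v = H(lam), H(mu)                              # SISO data, directions equal to 1
N, n, x = lam.size, 3, lam[0]

C = 1.0 / (mu[:, None] - lam[None, :])
# Stand-in for an approximate Cauchy matrix: small relative perturbation of C.
Ct = C * (1.0 + 1e-10 * rng.standard_normal(C.shape))


def loew_mv(Y):      # approximate Loewner product, Loewner = diag(v) C - C diag(w)
    return v[:, None] * (Ct @ Y) - Ct @ (w[:, None] * Y)


def shift_mv(Y):     # approximate shifted Loewner product, S = Loewner * Lambda + V R
    return loew_mv(lam[:, None] * Y) + np.outer(v, Y.sum(axis=0))


def pencil_mv(Y):    # approximate (S - x Loewner) @ Y
    return shift_mv(Y) - x * loew_mv(Y)


def pencil_rmv(Z):   # approximate (S - x Loewner)^* @ Z
    CtH = Ct.conj().T
    LhZ = CtH @ (v.conj()[:, None] * Z) - w.conj()[:, None] * (CtH @ Z)
    return (lam - x).conj()[:, None] * LhZ + np.outer(np.ones(N), v.conj() @ Z)


# Block subspace iteration using only (inexact) products with the pencil.
Q, _ = np.linalg.qr(pencil_mv(rng.standard_normal((N, n + 2)) + 0j))
for _ in range(2):
    Z, _ = np.linalg.qr(pencil_rmv(Q))
    Q, _ = np.linalg.qr(pencil_mv(Z))
small = pencil_rmv(Q).conj().T                    # equals Q^* (S - x Loewner)
Ub, sig, Vbh = np.linalg.svd(small, full_matrices=False)
Yn = (Q @ Ub[:, :n]).conj().T
Xn = Vbh[:n, :].conj().T

# Reduced model obtained by projection, again with inexact products.
E, A, B, Cr = -Yn @ loew_mv(Xn), -Yn @ shift_mv(Xn), Yn @ v, w @ Xn
s_test = 1.7j
err = abs(Cr @ np.linalg.solve(s_test * E - A, B) - H(s_test))
```

In this toy setting the reduced model built from the approximate subspaces still reproduces the underlying function at a point outside the data set, in line with the observation that meaningful spaces, rather than highly accurate singular triplets, are what is needed.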

Remark 6
In Remark 3 we suggested using the value $2(p + q)\pi_C$, where $\pi_C$ is the numerical rank of C, to decide on the number n of singular triplets of $\mathbb{S} - x\mathbb{L}$ needed for the reduced model. For interlaced partitions, as is the case with Odd&Even and Odd&Even (real) (see Table 1), the numerical (standard) rank of the Cauchy matrix is large, in general. Hence, the value $n = 2(p + q)\max\{\alpha, \beta\}$ may instead be employed for the computation of a meaningful reduced model whenever C can be well-approximated by an (α, β)-HSS matrix $\widetilde{C}$.$^{6}$ Moreover, the HSS-rank of $\widetilde{C}$ is obtained as a byproduct of the construction of $\widetilde{C}$.

Remark 7
If C admits an accurate approximation in terms of an (α, β)-HSS matrix $\widetilde{C}$, the expression in Theorem 1 shows that $\mathbb{L}$ can also be well-approximated by an HSS matrix $\widetilde{\mathbb{L}}$ whose rank is at most $((p + q)\alpha, (p + q)\beta)$. Even though the computational cost of $\widetilde{\mathbb{L}}y$ would still be $O((p + q)(\alpha + \beta)N)$ FLOPs, using the HSS approximation $\widetilde{\mathbb{L}}$ of $\mathbb{L}$ may be very advantageous whenever linear systems with $\mathbb{L}$ need to be solved (see, e.g., the procedure presented in [12] for the pseudospectra computation of $\mathbb{S} - x\mathbb{L}$). Indeed, as mentioned in Sect. 3.2, the computation of the inverse $\widetilde{\mathbb{L}}^{-1}$ costs $O((p + q)^3(\alpha + \beta)^3 N)$ FLOPs. Once $\widetilde{\mathbb{L}}^{-1}$ is computed, we need only $O((p + q)(\alpha + \beta)N)$ FLOPs to perform $\widetilde{\mathbb{L}}^{-1}y$.

Remark 8
We would like to mention that we did not observe any numerical issue related to the matrix-vector product (25) and its stability during our extensive numerical testing. Moreover, one may want to perform the matrix-vector product by C̃ in a parallel environment to achieve better computational performance. The STRUMPACK package (see footnote 7) may be employed to this end. See also, e.g., [37]. However, such a parallel approach has not been used in the numerical experiments presented in Sect. 4.

Numerical Results
In this section we present numerical experiments illustrating the potential of the proposed approach.
In Example 2, we compare our approach to standard procedures employed in the Loewner framework. Recall that the main steps in the standard approach involve forming the full Loewner and shifted Loewner matrices L and S and computing the SVD of S − xL. This SVD can either be computed in full, followed by keeping only the n dominant singular vectors, or only these n singular vectors can be obtained by means of an iterative procedure, where the matrix-vector product with S − xL is needed (see footnote 8). In the following, we report the overall running time, considering the construction step (Construction), i.e., the computation of L and S in the standard approach and of C̃ in our approach, as well as the reduction step (Reduction), involving the SVD computation followed by projection to obtain the reduced matrices in (15). In terms of memory requirements, for our approach, this involves the allocation of C̃ in the HSS format, while for the standard approach, we report the storage required for L and S.
In Table 2 we recall the computational cost of the construction and reduction steps of both the standard approach, based on either a full or a partial SVD, and the novel one presented in this paper along with their memory requirements.
Table 2 Computational cost of the construction (Construction) and reduction (Reduction) steps of the different approaches we test along with their storage demand (Storage). The computational cost of the construction of C̃ can be found, e.g., in [28, Table 1]
Footnote 6: As before, the value 4(p + q) max{α, β} should be preferred whenever L_r and S_r solve (22) and (23), respectively.
Footnote 7: https://portal.nersc.gov/project/sparse/strumpack/
Footnote 8: We employ the Matlab functions svd and svds, respectively.
Lastly, the accuracy of the reduced models is reported in terms of the normalized H_2-error, in which ‖·‖_F denotes the Frobenius norm. Similar results in terms of accuracy are attained for the H_∞-error; however, we decided not to document them here, for the sake of brevity.
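To make the error measure concrete, the following sketch computes a normalized, Frobenius-norm-based error over all samples. The exact formula is an assumption here: it uses one form that is common in the Loewner literature, namely the square root of the summed squared Frobenius norms of the mismatch between model and measurements, divided by the summed squared Frobenius norms of the measurements. The function names are hypothetical.

```python
import math


def frobenius_norm_sq(matrix):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(abs(entry) ** 2 for row in matrix for entry in row)


def normalized_h2_error(model_samples, measurements):
    """Assumed normalized error: sqrt(sum_j ||H(i w_j) - H_j||_F^2 / sum_j ||H_j||_F^2).

    model_samples[j] : reduced-model transfer function evaluated at the j-th sample
    measurements[j]  : measured matrix H_j (same shape as model_samples[j])
    """
    num = 0.0
    den = 0.0
    for model_j, meas_j in zip(model_samples, measurements):
        diff = [[a - b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(model_j, meas_j)]
        num += frobenius_norm_sq(diff)
        den += frobenius_norm_sq(meas_j)
    return math.sqrt(num / den)
```

With this normalization, a model that reproduces every measurement exactly yields 0, and the all-zero model yields 1.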
In Example 3, we compare our novel strategy to the one presented in [18], which makes use of the low-rank ADI-Galerkin method for computing the Loewner matrix as the solution to (11). Such a scheme computes low-rank approximations to the dense Loewner matrix to speed up the SVD computation; however, the memory constraints originating from the allocation of L and S are still present.
Results were obtained by running Matlab R2020b [29] on a MacBook Pro with an Intel Core i9 processor running at 2.3 GHz using 16 GB of RAM. All computations involving HSS matrices employed the hm-toolbox [28] with the default settings and the threshold for off-diagonal truncation set to 10^{-14}.

Example 2
We consider a synthetic problem for which we can control the order of the original system (n), the number of inputs and outputs (p = q), as well as the number of measurements (N). The system dynamics is generated randomly, with poles in complex conjugate pairs. In particular:
- the real part of the poles is random with mean −10^4 and standard deviation 2 · 10^3; the imaginary part is also random, with mean 10^4 and standard deviation 10^6;
- residues associated to each pole are rank-1 matrices, obtained as outer products between two random vectors, both having the real part with mean 0 and standard deviation 10, while the imaginary part has mean 0 and standard deviation 10^2.
Measurement points {s_j = iω_j}_{j=1}^{N} are logarithmically distributed between 10^4 and 10^7 rad/sec. Last, but not least, random noise with a signal-to-noise ratio SNR = 100 was added to the transfer function evaluation H(iω_j) to obtain the measurement matrices H_j. We adopt the Odd&Even (real) partition of the frequencies as it achieves satisfactory approximation results while eliminating complex arithmetic. Tangential directions are chosen as unit vectors (rows and columns of the identity matrix of size p).
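The sampling described above can be sketched as follows: logarithmically spaced frequencies are generated and the resulting points s_j = iω_j are split in an interlaced (odd/even) fashion. The helper names are hypothetical, and which of the two subsets plays the role of the right or left interpolation points is an assumption made only for illustration; the conjugate points needed for the "(real)" variant are not added here.

```python
def logspace(exp_start, exp_stop, count):
    """Return `count` points 10**e with e equally spaced in [exp_start, exp_stop]."""
    step = (exp_stop - exp_start) / (count - 1)
    return [10.0 ** (exp_start + k * step) for k in range(count)]


def odd_even_partition(points):
    """Interlaced split: 1st, 3rd, 5th, ... points versus 2nd, 4th, 6th, ... points."""
    return points[0::2], points[1::2]


# Small illustrative instance (N = 8) on the interval [10^4, 10^7] rad/sec.
omega = logspace(4, 7, 8)
s = [1j * w for w in omega]

# Assumption: odd-indexed points are used as right points, even-indexed as left points.
right_points, left_points = odd_even_partition(s)
```

Because the two subsets are interlaced, every left point lies between two consecutive right points, which is the situation in which the standard numerical rank of the Cauchy matrix tends to be large while its HSS rank stays moderate.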
We compare the proposed approach to two alternatives. The first is the traditional Loewner framework, in which the Loewner and shifted Loewner matrices L and S are formed and the full SVD of S − xL is computed. The second builds L and S and then computes a partial SVD of S − xL using the Matlab svds function. The comparison is carried out for various instances of the data set described above, for different values of N, p, and n. The command svds was employed with the left starting vector v_1 (same notation as in Theorem 1) instead of a random starting vector, which is the default setting.
Figure 2 presents the memory requirements for storing the Loewner and shifted Loewner matrices L and S (in red), as opposed to storing the HSS approximation C̃ in our approach (in blue), along with the storage needed to allocate the data in Λ_r, M_r, R_r, L_r, V_r, W_r, for increasing values of the number of inputs and outputs p (in black). We point out that for values of N larger than 40 000, we were not able to allocate the full matrices L and S on the employed laptop (this value, however, depends on the available RAM of the machine). For instances when these matrices can be allocated, Fig. 2 shows that the memory requirements for the proposed approach are always much lower than for the standard scheme. Moreover, in contrast to what happens to the memory required for the data matrices, the storage demanded by the allocation of C̃ in HSS format is independent of p.
Fig. 2 Example 2. Memory requirements in Megabytes to store L, S, C̃, and the data matrices (Λ_r, M_r, V_r, W_r, L_r, R_r) for different values of N and p
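The quadratic growth of the dense storage can be checked with a back-of-the-envelope count. The sketch below assumes double precision (8 bytes per real entry, 16 bytes per complex entry) and 1 MB = 10^6 bytes; the function name is hypothetical, and the count ignores any workspace needed by the SVD itself.

```python
def dense_storage_megabytes(N, complex_entries=False, n_matrices=2):
    """Raw storage, in MB (10^6 bytes), of n_matrices dense N-by-N matrices."""
    bytes_per_entry = 16 if complex_entries else 8  # double precision assumed
    return n_matrices * N * N * bytes_per_entry / 1e6


# Storage of L and S together for a few values of N (real entries).
for N in (1_000, 10_000, 40_000):
    print(N, dense_storage_megabytes(N))
```

For N = 40 000 and real entries this count already gives 25 600 MB for L and S together. Whether an allocation of that size succeeds on a given machine also depends on the operating system and its virtual memory handling.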
We report the results of the comparison between the different approaches in terms of run time in Table 3 for the number of measurements N varying between 1 000 and 100 000, the number of inputs and outputs p taking values 1, 5 and 10, and the number of poles being 50 or 100. The "-" is used to indicate the instances for which we were not able to compute the reduced model (15): for N > 50 000, we cannot allocate the full matrices L and S, and for N = 30 000, 40 000 we could not compute the full SVD of S − xL. Such constraints are not relevant to our proposed strategy. It is pertinent to remark the following:
1. The CPU time of the full SVD approach does not depend on p and n, only on N, as expected from Table 2: indeed, the cost of building L and S is quadratic in N whereas the full SVD demands O(N^3) FLOPs. The full SVD approach is rarely the fastest method (it can happen for very modest values of N in the considered range).
2. The CPU time of the full assembly of S − xL followed by the svds Matlab command does not depend on p, only on N and n, as expected from Table 2: the construction of L and S costs O(N^2) FLOPs, whereas the computational effort for the partial SVD depends on n, leading to a more demanding procedure for large n. It is usually the fastest approach for (very) modest values of N in the considered range and p > 1.
3. The HSS rank of the Cauchy matrix approximation C̃ only depends on the frequency samples, hence on N because, in our scenario, the sampling interval is the same, but the distribution of points inside the interval is different for each N. There may be instances when, for the same samples, the HSS rank of C̃ is slightly different due to the randomness induced by the adaptive cross approximation procedure used in constructing C̃ (for instance, for N = 50 000, p = 5, n = 50 and n = 100, the rank is 28, while for the rest of the values considered for n and p, the rank is 27). Moreover, the HSS rank increases with N.
4. Our proposed approach is as accurate as the first two approaches, highlighting the fact that the HSS approximation C̃ does not lead to significant losses in the approximation properties of the reduced model (15). Clearly, our approach cannot be more accurate than the traditional Loewner framework, especially when the full SVD is performed.
5. Last, but not least, the CPU time of the proposed solution depends linearly on p, n and N log N (Table 2), thus being the fastest method for large values of N. Moreover, no memory constraints are present for N up to 100 000.
In Fig. 4 (left) we plot the computational time of the three approaches for p = 1, n = 50, and different values of N. Even though these are the same results as those reported in Table 3, Fig. 4 (left) clearly shows the O(N^3) trend of the full SVD scheme versus the O(N^2) trend of the svds scheme versus the O(N) behaviour of the proposed approach. In Fig. 4 (right) we depict, on a logarithmic scale, the running time of the proposed procedure for n = 50 and different values of N and p, clearly exhibiting a linear dependency on p and an N log N dependency with respect to N.

Example 3
In this example we compare the novel strategy presented in this paper to the fast Loewner SVD scheme illustrated in [18]. We consider the same data set as the one in Example 2, this time with SNR = 120 and a random D ∈ R^{p×p}, D ≠ 0. Since the models resulting from the Loewner framework have D = 0, a realization of size n + p is needed to approximate the system with D ≠ 0 [26,30].
In [18], a Galerkin-ADI method is applied to the Sylvester equation (11) satisfied by the Loewner matrix. At the k-th iteration, a low-rank approximation P_k L_k Q_k^*, with P_k, Q_k ∈ C^{N×k} and L_k ∈ C^{k×k}, to L is thus computed. If U_k S_k V_k^* = L_k denotes the SVD of L_k, then the matrices P_k U_k and V_k^* Q_k^* can be used in place of X_n and Y_n in (15) to compute the reduced model. The method is stopped whenever the norm of the residual matrix M P_k L_k Q_k^* − P_k L_k Q_k^* Λ − VR + LW, consisting of the left-hand side of the Sylvester equation with L replaced by its low-rank approximation P_k L_k Q_k^*, is smaller than a certain threshold ε. In the results that follow we employ ε = 10^{-4}, as done in [18]. At each iteration step, the SVD of L_k is truncated to keep only the n + p most significant singular values.
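The residual used as stopping criterion can be illustrated on a tiny scalar (p = q = 1) instance. The sketch below assumes the standard form of the Loewner Sylvester equation, M X − X Λ = V R − L W, with Λ and M the diagonal matrices of right and left points and all tangential directions equal to 1. The function names and the test transfer function H(s) = 1/(s + 1) are hypothetical choices made for the illustration.

```python
import math


def loewner_matrix(mu, lam, v, w):
    """Scalar-data Loewner matrix with entries (v_i - w_j) / (mu_i - lam_j)."""
    return [[(v[i] - w[j]) / (mu[i] - lam[j]) for j in range(len(lam))]
            for i in range(len(mu))]


def sylvester_residual(mu, lam, v, w, X):
    """Entries of M X - X Lambda - V R + L W for scalar data (R and L all ones)."""
    return [[mu[i] * X[i][j] - X[i][j] * lam[j] - v[i] + w[j]
             for j in range(len(lam))]
            for i in range(len(mu))]


def frobenius_norm(A):
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))


def transfer_function(s):
    """Hypothetical first-order test system H(s) = 1 / (s + 1)."""
    return 1.0 / (s + 1.0)


lam = [1j, 3j]            # right points
mu = [2j, 4j]             # left points
w = [transfer_function(s) for s in lam]
v = [transfer_function(s) for s in mu]

exact = loewner_matrix(mu, lam, v, w)
zero_guess = [[0j, 0j], [0j, 0j]]

res_exact = frobenius_norm(sylvester_residual(mu, lam, v, w, exact))
res_zero = frobenius_norm(sylvester_residual(mu, lam, v, w, zero_guess))
```

The exact Loewner matrix gives a residual at the level of rounding errors, whereas a poor approximation (here the zero matrix) leaves a residual of the size of the right-hand side. A low-rank iterate P_k L_k Q_k^* would be plugged in for X in the same way.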
We consider the Half&Half partition of the frequencies as this is the best scenario for the scheme coming from [18]. The Half&Half partition often leads to a rather fast convergence of the Galerkin-ADI method in terms of number of iterations so that a quite small approximation space is constructed. If different partitions were used, the Galerkin-ADI method could be equipped with a quite involved divide-and-conquer scheme; see [18]. On the other hand, as illustrated in Example 1, the Half&Half partition leads to higher values of the HSS-rank of C̃ than the Odd&Even partition, with a consequent increase in the computational effort of our scheme. In addition, as in [18], our tests employed complex arithmetic and did not solve the corresponding Sylvester equation (22) for real-coefficient matrices.
In Table 4 we report the results for p = 10, n = 50, and different values of N . Notice that even though the Galerkin-ADI approach efficiently computes the approximation spaces, the construction of the reduced model (15) still requires the allocation of both L and S. Therefore, also for the Galerkin-ADI scheme severe memory constraints hold and for N > 30 000, we are not able to allocate the L and S matrices with complex entries on the machine used for running the tests.
Even though the Galerkin-ADI approach is faster for N < 20 000, the computed approximation spaces are quite poor. Indeed, the computed reduced models are always 7 orders of magnitude less accurate than the ones constructed by our approach. The paper [18] validates the Galerkin-ADI scheme on a system with randomly generated poles for various orders n and number of samples N but does not mention the accuracy of the resulting models. Moreover, in terms of CPU time, our results are comparable to the ones in [18] when considering the computational time solely of the Galerkin-ADI iteration, disregarding the steps involving building the full matrices and projecting these to obtain the reduced model.
Table 4 Example 3. Number of iterations, computational time (in seconds) solely of the Galerkin-ADI iteration scheme together with the total time (including building the data, the full Loewner and shifted Loewner matrices and the projection step) as well as the H_2-error achieved by the Galerkin-ADI approach. In comparison, we list the HSS-rank, the total time (in seconds) as well as the H_2-error of the novel scheme presented in this paper for different values of N (number of samples), p = 10, and n = 50
The remarkable difference in the accuracy attained by the two approaches makes any sort of computational comparison rather pointless. However, we would like to point out that the computational time of the Galerkin-ADI approach grows quadratically with N due to the need to assemble and store the full Loewner and shifted Loewner matrices, while an N log N dependency of the computational cost of our novel approach can be evidenced once again from the timings reported in Table 4. Several ideas could be implemented to improve the accuracy of the models obtained with the Galerkin-ADI approach. In order to have the fairest comparisons with respect to our novel approach, each of these ideas will be tested separately to explore all the possibilities to enhance the Galerkin-ADI approach from [18].
First, the tolerance ε for solving the Sylvester equation via Galerkin-ADI can be set to a value comparable to the noise level for an SNR of 120, namely ε = 10^{-12}. Results are detailed in Table 5 only for the case N = 5 000, p = 10, and n = 50 as the trend is obvious from this one example. While the accuracy of the model has slightly improved with respect to results obtained for ε = 10^{-4}, the number of iterations has also considerably increased, leading to matrices L_k of much larger dimensions for which the SVD L_k = U_k S_k V_k^* becomes costly. Hence, the CPU cost of the scheme has exploded and is no longer viable. In any case, even for a tolerance value close to the noise level, the accuracy of the model is several orders of magnitude worse than with our proposed technique (10^{-3} versus 10^{-9}).
Second, it is always advisable to compute the projection subspaces from a linear combination of S and L, namely S − xL rather than only L, as the Loewner matrix L encodes the strictly rational part and the addition of S provides all the information on the system, including its polynomial part (the D-term). We apply the low-rank Galerkin-ADI method to the Sylvester equation fulfilled by S − xL, thus computing a matrix P_k Z_k Q_k^* such that P_k Z_k Q_k^* ≈ S − xL. Results are detailed in Table 6 for the case ε = 10^{-4}, N = 5 000, p = 10, and n = 50. For all instances considered, results were comparable in terms of CPU time to those obtained when considering solely the Sylvester equation satisfied by L in the Galerkin-ADI iteration (listed in the first line of Table 6 for reference), while in terms of accuracy, they are slightly worse. For this example, the sole benefit of using a linear combination S − xL might be the system identification properties as, in principle, a sharp drop in the singular values of Z_k reveals the degree of the underlying system.
The third avenue worth exploring is employing real arithmetic and the corresponding Sylvester equations (22) and (23). Table 7 shows the results obtained using real arithmetic, both for the Galerkin-ADI scheme and for our proposed method. For reference, the first line in Table 7 lists the results previously obtained in complex arithmetic. For the method in [18], the cost of the scheme has mostly increased, due to the more complicated Sylvester equations (22) and (23). The CPU cost of building the data matrices and the full Loewner and shifted Loewner matrices has also increased, yielding a total cost far higher than that obtained in complex arithmetic. In some instances, the accuracy has improved slightly.
On the other hand, the real arithmetic causes the HSS-rank of the Cauchy matrix approximation to be much smaller with a remarkable impact on the CPU time and almost no effects on the model accuracy when using our novel approach.
We conclude this example by mentioning that the use of a hybrid approach may be fruitful. In particular, our novel approach can be employed to avoid storing the large and dense Loewner and shifted Loewner matrices. Then, the Galerkin-ADI scheme can be used to compute the first dominant singular vectors of S − xL, instead of employing svds, thus also being able to identify the order of the underlying system. However, the accuracy will not be comparable to that of our proposed approach. We implemented this idea and list the CPU times of the various steps in Table 8 together with the resulting accuracy for Galerkin-ADI applied to solving the Sylvester equation (22) for L in real arithmetic with ε = 10^{-4} for N = 5 000, p = 10, and n = 50. Plots of the responses of our proposed approach, together with the Galerkin-ADI scheme as proposed in [18] and the hybrid approach are shown in Fig. 5. Even though the general shape of the response is well captured, some resonances are not modeled accurately, as expected from the much higher model errors reported earlier. This can be noticed better from the error plots in Fig. 6.
Fig. 5 Example 3. Frequency response of the model (in black) and the measurements (in red) for N = 5 000, p = 10, and n = 50 using our proposed approach, Galerkin-ADI as in [18] and the hybrid approach, employing real arithmetic
Fig. 6 Example 3. Error plots for N = 5 000, p = 10, and n = 50 using our proposed approach, Galerkin-ADI as in [18] and the hybrid approach, employing real arithmetic

Conclusion
By exploiting the Cauchy-like structure of the Loewner and shifted Loewner matrices, a novel strategy for reducing the computational costs and the memory requirements of the Loewner framework has been proposed. In particular, the use of the HSS-format leads to tremendous savings in the storage demand and computational efforts of the overall scheme. Indeed, except for the construction of C̃ whose cost is polylogarithmic in N, both the memory requirements and the computational cost of iteratively performing the SVD now linearly depend on the cardinality of the considered data set. The success of our procedure strongly relies on the capability of representing the Cauchy matrix C in terms of an HSS-matrix C̃ with low (α, β) rank of the off-diagonal blocks. Even though we restricted ourselves to showing how different, but common, partitions of the frequencies affect the HSS-rank of C̃, a thorough analysis of their connection may be beneficial. This interesting, but tricky, study will need to take into account several diverse aspects like the compressibility in the HSS format of the matrix C, the conditioning of L and S, and the approximation properties of the underlying partition of the frequencies.
We have always computed C̃ at high accuracy. Results very similar to the ones reported in the previous sections are also obtained with 10^{-12} as low-rank truncation threshold. However, we believe that employing less accurate, and thus lower-rank, HSS representations of C and studying their effect on the accuracy of the overall scheme may be another interesting research direction worth pursuing, depending on the application at hand.
The strategy presented in this paper can be applied to more sophisticated problems as long as the Loewner and shifted Loewner matrices maintain a Cauchy-like structure. In particular, our approach can be employed with minor modifications in model order reduction of parametrized [21], linear switched [16], and bilinear systems [1].
Funding Open Access funding enabled and organized by Projekt DEAL.

Conflicts of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval
The research presented in this paper is based upon work supported by the National Science Foundation under Grant No. DMS-1439786 while both the authors were in residence at the Institute for Computational and Experimental Research in Mathematics (ICERM) in Providence, RI, during the Model and Dimension Reduction in Uncertain and Dynamic Systems program. Even though the second half of the program had to be performed virtually due to the restrictions caused by the COVID-19 pandemic, we are extremely grateful to the organizers of the program and the whole staff of ICERM for doing whatever possible to maintain an exciting, fruitful, and high-quality working environment. The datasets and algorithms generated during and/or analysed during the current study are available from the corresponding author on reasonable request. Moreover, the approach presented in this paper will be included in the hm-toolbox in the near future.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.