Data-dependent orthogonal polynomials on generalized circles: A unified approach applied to $\delta$-domain identification

The performance of algorithms in system identification and control depends on their implementation in finite-precision arithmetic. The aim of this paper is to develop a unified approach for numerically reliable system identification that combines the numerical advantages of data-dependent orthogonal polynomials and the discrete-time δ-domain parametrization. In this paper, earlier results for discrete orthogonal polynomials on the real-line and the unit-circle are generalized to obtain an approach for the construction of orthogonal polynomials on generalized circles in the complex plane. This enables the formulation of a unified framework for the numerically reliable identification of systems expressed in the δ-domain, as well as in the traditional Laplace and Z-domains. An example is presented which shows the significant numerical advantages of the δ-domain approach for the identification of fast-sampled systems.


Introduction
Numerically robust and accurate implementation of algorithms used in system identification and control is essential for their successful application [1,2]. This numerical aspect becomes more relevant and challenging as the system complexity increases [3,4]. Several approaches that address the encountered numerical issues have been proposed. In particular, the use of the discrete δ-domain as opposed to the classical Z-domain addresses several numerical issues in digital control implementations with fast sampling [5]. In the present paper, numerically reliable identification is investigated and a unified framework is presented that, as a special case, enables the numerically reliable identification of system models expressed in the discrete δ-domain.
In parametric system identification, the central computational step typically involves solving a linear least squares problem, which often is severely ill-conditioned [6][7][8][9]. Several partial solutions to this conditioning problem have been proposed, including the use of frequency scaling [10], the use of orthonormal bases such as Chebyshev polynomials, or Laguerre or Kautz filters [8,11], and the use of certain rational basis functions [12][13][14]. Another recent development is the use of polynomial basis functions, which are orthogonal with respect to a discrete data-dependent inner product [15][16][17]. Using these data-dependent orthonormal polynomials, optimal
numerical conditioning, i.e., a condition number κ = 1, can be achieved [16,17]. This essentially solves the main numerical bottlenecks and enables the identification of highly complex systems [18,19]. Efficient algorithms, which are linear in the data and the polynomial degree, i.e., O(mn), exist for the construction of these data-dependent orthogonal polynomials for continuous time systems, i.e., for s = jω, with nodes on the imaginary axis [20]; and for discrete time systems, i.e., for z = e^{jωTs}, with nodes on the unit circle [21].
Another improvement of the numerical properties of algorithms in system identification and control can be achieved by replacing the traditional forward-shift operator, q, with the forward-difference operator, δ [22,23]. The δ-operator transparently connects models in the discrete time domain and the continuous time domain, as the continuous time model parameters are recovered in the limit case for the sample time tending to zero [22]. Furthermore, using the δ-operator instead of the shift operator often leads to improved numerical performance when considering finite-word-length effects [23]. These advantages have been shown to be especially relevant for systems with fast sampling [24,5], i.e., where the sampling frequency is significantly higher than the dominant system dynamics, which is often the case in identification for control [25]. In controller synthesis, a δ-domain parameterization has also been shown to yield substantial numerical advantages [26][27][28][29].
Although data-dependent orthonormal polynomials have been shown to provide significant numerical advantages in system identification, at present their advantages are limited due to a numerical loss of accuracy in other essential computation steps. The aim of this paper is to develop a unified approach for numerically reliable system identification that combines the numerical advantages of data-dependent orthogonal polynomials and the discrete-time δ-domain parametrization. The main contributions of this paper are the following.
(1) A theoretical framework and efficient construction algorithm for polynomials that are orthogonal with respect to discrete measures supported on generalized circles in the complex plane, expanding on earlier results for measures supported on the real-line, and the unit circle. In earlier approaches, e.g., [16,30], efficient construction of data-dependent orthogonal polynomials is limited either to the case of measures supported on the real or imaginary line [20] or on the unit circle [21]. In contrast, for the relevant case of the δ operator, the nodes lie on a circle in the complex plane for which the existing efficient solutions do not apply. The specific δ-domain case is considered in [31]. In the present paper, the preliminary results in [31] are extended by considering the construction of orthogonal polynomials from spectral data on generalized circles in the complex plane. This leads to a unified framework where identification in Laplace domain, Z-domain, and δ-domain can be considered as special cases and can be solved with a single algorithm.
(2) A unified framework for numerically reliable system identification in the Laplace, Z-, and δ-domains, building on the construction algorithm of contribution 1.
(3) A simulation example demonstrating the numerical advantages of the δ-domain approach for the identification of fast-sampled systems.
The outline of this paper is as follows. In Section 2, the problem of numerically reliable identification of fast sampled systems is formulated. Section 3 provides a concise overview of the theory of data-dependent orthogonal polynomials. In Section 4, the theory and construction algorithm for data-dependent orthogonal polynomials on generalized circles in the complex plane are considered, constituting contribution 1 of this paper. In Section 5, this algorithm is used to obtain a unified framework for numerically reliable system identification, constituting contribution 2 of this paper. In Section 6, the results for a simulation example are presented, showing the superior performance of the δ-domain approach compared to the Z-domain approaches for fast-sampled systems, constituting contribution 3 of this paper. In Section 7, conclusions and an outlook on ongoing work are given.

Problem formulation
The problem considered in this paper is that of numerically reliable frequency-domain identification of linear systems with fast sampling. First, the problem of frequency domain system identification is defined. Second, numerically reliable identification is considered. Last, the numerical challenges for fast-sampled systems are highlighted and the advantages of the proposed δ-domain approach are illustrated, leading to the formulation of the considered problem of numerically reliable identification in the δ-domain.

Frequency domain system identification
In system identification, a physical process is considered which is observed using a digital measurement environment. The goal in system identification is to estimate an appropriate system model, Ĝ(ξ), describing the relevant behavior of this physical process. In frequency domain identification this is done using frequency domain input-output data of the system. To facilitate the presentation, identification of single-input single-output (SISO) systems is considered; the extension to multiple-input multiple-output (MIMO) systems follows along similar lines as in, e.g., [18,19].
In this paper, linear time invariant (LTI) systems are considered, represented by real-rational transfer functions Ĝ(ξ) ∈ R(ξ). Here, ξ is an indeterminate frequency variable which depends on the identification domain. For continuous-time modeling the Laplace domain, where ξ = s = jω, is traditionally used, while discrete-time modeling is generally performed in the Z-domain, with ξ = z = e^{jωTs}. Here ω is the frequency variable and T_s is the sampling time of the digital measurement environment. The model Ĝ(ξ) is parametrized as

Ĝ(ξ, θ) = n̂(ξ, θ) / d̂(ξ, θ),    (1)

where n̂(ξ, θ), d̂(ξ, θ) ∈ R[ξ], which are linearly parametrized with respect to a set of polynomial basis functions {φ_j(ξ)}_{j=0}^n. The resulting estimation problem reduces to a weighted polynomial least-squares problem, in which the right-hand side y_i is determined by the problem data and the parameter constraints. Therefore, the problem that is considered in this paper is of this form, where for clarity only the scalar polynomial case is considered, i.e., p(ξ, θ) ∈ R[ξ]. Extensions to the vector polynomial case, i.e., p(ξ, θ) ∈ R^n[ξ], are discussed in Section 7.

Numerically reliable identification
Solving the polynomial least-squares problem, (10), is equivalent to determining the least-squares solution of a linear system whose coefficient matrix W Φ_n depends on the weights in W and the basis functions in Φ_n. The key observation is that the matrix W Φ_n in (15) can be severely ill-conditioned depending on the choice of basis functions. Indeed, it is well-known that if a monomial basis, φ_j(ξ) = ξ^j, is used, then Φ_n is a Vandermonde matrix, with entries φ_j(ξ_i) = ξ_i^j, which is often ill-conditioned, deteriorating the performance of the identification algorithms [17].
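As an illustration of this conditioning problem, the following sketch shows how rapidly the condition number of a monomial-basis Vandermonde matrix grows with the polynomial degree. The frequency grid is a hypothetical example on the imaginary axis; all values are illustrative and not taken from the paper.

```python
import numpy as np

# Hypothetical frequency grid: m nodes on the imaginary axis, xi_i = j*omega_i,
# as would arise in Laplace-domain identification.
m = 50
omega = np.linspace(0.1, 10.0, m)
xi = 1j * omega

# Monomial basis phi_j(xi) = xi^j gives a Vandermonde matrix Phi_n.
for n in (5, 10, 15):
    Phi_n = np.vander(xi, n + 1, increasing=True)
    print(f"n = {n:2d}: cond(Phi_n) = {np.linalg.cond(Phi_n):.2e}")
```

The condition number grows by orders of magnitude as the degree increases, which is the behavior that motivates the data-dependent orthogonal bases discussed next.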
Several approaches have been proposed in literature to mitigate this conditioning problem [8,[10][11][12][13][14], confirming that this is an important aspect in frequency domain system identification. The approaches in [8,[11][12][13][14] focus on a change of basis functions to obtain a Φ-matrix which has better numerical properties than the Vandermonde matrix in (19). However, this does not guarantee an improvement in the conditioning of W Φ, which is the relevant problem matrix for solving (15). To guarantee that W Φ is well-conditioned, the problem data, as contained in W, needs to be taken into account in the choice of basis functions, i.e., the basis functions should be data-dependent. A key result in, e.g., [16,30], shows that optimal conditioning of W Φ can indeed be achieved using a set of polynomials which are orthonormal with respect to a judiciously chosen data-based inner product.
Being able to achieve optimal conditioning of the problem matrix W Φ is an important step towards obtaining a numerically reliable identification approach. However, other computational steps involved in the identification algorithm should also be performed in a numerically accurate manner. For systems with fast sampling, inherent numerical challenges exist which need to be addressed.

Numerical challenges for fast-sampled systems: motivation for a δ-domain formulation
In the identification problem as stated in Section 2.1, ξ in (1) still remains to be specified. Typically, the Laplace domain, ξ = s, is used when the system is in continuous time, whereas the Z-domain, ξ = z, is used when the system is in discrete time. For both these cases, efficient algorithms exist to construct the data-dependent orthogonal polynomials used in numerically reliable identification, as described in Section 2.2. However, when the sampling frequency of the digital measurement environment is fast relative to the dominant dynamics of the underlying physical process, the discrete Z-domain description is known to suffer from numerical issues.
To analyze the numerical issues that arise for fast-sampled systems in the Z-domain, the derivative of the complex variable z = e^{s/f_s} with respect to the sampling frequency f_s = 1/T_s is considered,

dz/df_s = −(s/f_s²) e^{s/f_s} = −(s/f_s²) z = −(1/f_s) z ln(z) ∝ −z ln(z),

where s = f_s ln(z) is used in the third step and where ∝ denotes proportionality. Streamlines for the vector field −z ln(z) are plotted in the complex plane in Figure 1. This figure clearly shows that all streamlines converge to the point z = 1. For fast-sampled systems this essentially means that the poles describing the system dynamics will become concentrated near the point z = 1.
This concentration around z = 1 leads to a numerical loss of significance in finite-precision arithmetic, since a large part of the available finite word length is used to store this "unity part" of the relevant parameters, which does not contain any system information. In addition, cancellation errors occur when parameters with values that are only marginally different from 1 are subtracted from one another.
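This clustering is easy to reproduce numerically. The sketch below uses a hypothetical continuous-time pole (the 10 Hz value is an assumption for illustration) and shows how its Z-domain image z = e^{s/f_s} approaches 1 as the sampling frequency increases:

```python
import numpy as np

# Hypothetical continuous-time pole: a 10 Hz mode at s = -2*pi*10.
s = -2 * np.pi * 10

# Its Z-domain image z = exp(s/fs) for increasingly fast sampling.
for fs in (1e2, 1e4, 1e6):
    z = np.exp(s / fs)
    print(f"fs = {fs:.0e} Hz: z = {z:.12f}, |z - 1| = {abs(z - 1):.3e}")
```

At f_s = 10⁶ Hz the informative part of z is confined to the trailing significant digits, illustrating the loss of significance described above.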
As an alternative to the Z-domain description, the discrete-time system can be represented in the δ-domain, where the δ-operator

δ = (q − 1) / T_s

is used instead of the shift operator q, with the corresponding transform variable δ = (z − 1)/T_s. The main numerical advantage of this δ-domain description is that it shifts the point z = 1 to the origin, solving the associated loss of significance problems. Additionally, the δ-operator intuitively connects the discrete and continuous time domains, as for a differentiable function x(t),

lim_{T_s → 0} δx(t) = lim_{T_s → 0} (x(t + T_s) − x(t)) / T_s = dx(t)/dt,

meaning that as the sampling frequency increases the δ-domain parameters converge to the continuous time parameters. This provides insight and gives confidence that even for sampling times approaching zero the discrete-time description in the δ-domain remains well behaved.
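The convergence of the δ-domain parameters to their continuous-time counterparts can be checked numerically. In the sketch below (again with a hypothetical 10 Hz pole as an assumption), the δ-domain pole (z − 1)/T_s approaches the continuous-time pole s as T_s → 0, while z itself degenerates towards 1:

```python
import numpy as np

# Hypothetical continuous-time pole s; z = exp(s*Ts) is its Z-domain image
# and delta = (z - 1)/Ts its delta-domain image.
s = -2 * np.pi * 10

for Ts in (1e-2, 1e-4, 1e-6):
    z = np.exp(s * Ts)
    delta = (z - 1) / Ts
    print(f"Ts = {Ts:.0e}: z = {z:.10f}, delta = {delta:+.4f} (s = {s:+.4f})")
```

The δ-domain pole stays an O(1) number close to s, whereas the informative part of z shrinks with T_s.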
The δ-domain representation often improves numerical aspects. However, fast construction of data-dependent orthonormal polynomials as used in, e.g., [16,30], has not been investigated for δ-domain identification. In this paper, the problem of efficient construction of data-dependent orthonormal polynomials for numerically reliable δ-domain identification is considered.

Data-dependent orthonormal polynomials
In this section, orthogonal polynomials with respect to a data-dependent discrete inner product are defined. The study of orthogonal polynomials is a well-developed and active field, see, e.g., [35][36][37]. In this paper, the focus is on discrete inner products in contrast to the commonly considered continuous inner products involving integrals. In addition, an arbitrary weighting is used in the inner product, see also [38,Chapter 12] for details on such forms, and [39] for related indefinite and asymmetric forms.
In Section 3.1, discrete orthogonal polynomials are defined. Next, a Hessenberg recurrence relation for this set of polynomials is considered in Section 3.2, and the inverse eigenvalue problem linking the Hessenberg-recurrence matrix to the problem data is established in Section 3.3. Next, an algorithm for solving this inverse eigenvalue problem is considered in Section 3.4, as well as the special cases when all the nodes lie on the real-line, the imaginary-axis or the unit circle in Section 3.5. In this section and in Section 4, the general case of polynomials with complex coefficients is considered. In Section 5, the real-polynomial case, which is relevant for identification, is considered.

Discrete orthogonal polynomials
Consider the set of polynomials {φ_j(ξ)}_{j=0}^k, where φ_j(ξ) ∈ C[ξ] and deg φ_j(ξ) = j, which is orthogonal with respect to a discrete inner product defined by the nodes ξ_1, ξ_2, . . . , ξ_m ∈ C and weights w_1, w_2, . . . , w_m ∈ C, i.e.,

⟨φ_p, φ_q⟩ = Σ_{i=1}^m |w_i|² φ_p(ξ_i) φ_q*(ξ_i) = c_p δ_pq,    (24)

with c_p ∈ R_+, and where δ_pq is the Kronecker delta function, and x* denotes the complex conjugate of x. In matrix form (24) is given by

(W Φ_k)^H W Φ_k = D,    (25)

with W as defined in (16), Φ_k as defined in (18) but with k columns instead of n, and where D = diag(c_0, c_1, . . . , c_k). Furthermore, the polynomials φ_j are normalized such that c_j = 1 ∀ c_j ≠ 0, j = 0, . . . , k.
Without loss of generality, it is assumed that w_i ≠ 0 ∀ i. Indeed, if w_i = 0, the corresponding node-weight pair can be removed from (24) without altering the inner product. Therefore, rank(W) = m and consequently rank(D) ≤ m, due to (25). This means there are at most m polynomials for which c_j ≠ 0. In fact, under the assumption that there are no duplicate nodes, the first m polynomials are the ones for which c_j ≠ 0. This follows from the fact that the square matrix Φ_{m−1}, containing these first m polynomials, is bijectively related to the square Vandermonde matrix defined by the nodes ξ_i, which has a nonzero determinant as long as all the nodes are distinct [40]. Therefore,

(W Φ_{m−1})^H W Φ_{m−1} = I,    (26)

meaning the matrix W Φ_{m−1} is unitary, and thus κ(W Φ_{m−1}) = 1, which is precisely the goal in numerically reliable identification as is described in Section 2.2. Furthermore, it follows from (25)-(26) that, for k > m−1, the last k−(m−1) columns of Φ_k are all equal to zero, i.e., the polynomials φ_j(ξ) for j > m − 1 have roots that coincide with all the nodes ξ_1, ξ_2, . . . , ξ_m. This property is used in the following section to relate the set of polynomials to an upper-Hessenberg matrix containing recurrence coefficients.
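A dense numerical sketch of this property follows, using a QR factorization as an illustrative O(m³) orthonormalization (not the efficient construction considered later in the paper) and hypothetical nodes and weights:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20
xi = np.exp(1j * np.linspace(0.1, 3.0, m))   # hypothetical nodes on the unit circle
w = rng.uniform(0.5, 2.0, m)                 # hypothetical positive weights
W = np.diag(w)

# Monomial basis: ill-conditioned weighted Vandermonde matrix W V.
V = np.vander(xi, m, increasing=True)
print("cond(W V)         =", f"{np.linalg.cond(W @ V):.2e}")

# Orthonormalizing the columns of W V yields evaluations of polynomials that
# are orthonormal w.r.t. the discrete inner product (up to phase factors,
# since np.linalg.qr does not enforce a positive-real normalization).
Q, R = np.linalg.qr(W @ V)
Phi = np.diag(1.0 / w) @ Q                   # Phi_{m-1}: polynomial evaluations
print("cond(W Phi_{m-1}) =", np.linalg.cond(W @ Phi))
```

Since W Φ_{m−1} equals the unitary factor Q, its condition number is 1 to machine precision, matching κ(W Φ_{m−1}) = 1 above.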

Hessenberg recurrence matrix
Due to the strict degree constraint deg φ_j(ξ) = j, the polynomials {φ_j(ξ)}_{j=0}^k are all linearly independent. As a consequence, {φ_j(ξ)}_{j=0}^k form a basis for the space of all polynomials of degree less than or equal to k [38, Chapter 12]. In turn, this implies that the polynomial ξφ_{k−1}(ξ) can be written as a linear combination of these polynomials, i.e.,

ξ φ_{k−1}(ξ) = Σ_{j=1}^{k+1} h_{j,k} φ_{j−1}(ξ).    (27)

Given the recurrence coefficients, h_{j,k}, and φ_0, this relation can be used to recursively compute the full set of polynomials by rearranging (27) as

φ_k(ξ) = ( ξ φ_{k−1}(ξ) − Σ_{j=1}^{k} h_{j,k} φ_{j−1}(ξ) ) / h_{k+1,k},    (28)

hence (27) and (28) are recurrence relations. Equation (27) can be rewritten in matrix notation as

ξ [φ_0(ξ) φ_1(ξ) ⋯ φ_{k−1}(ξ)] = [φ_0(ξ) φ_1(ξ) ⋯ φ_k(ξ)] H_{(k+1)×k},    (29)

with H_{(k+1)×k} the upper-Hessenberg matrix of recurrence coefficients, which, when evaluated at the nodes ξ_1, ξ_2, . . . , ξ_m that define the inner product (24), yields

X Φ_{m−1} = Φ_m H_{(m+1)×m},    (30)

with X = diag(ξ_1, ξ_2, . . . , ξ_m). For k = m, the last column of Φ_m is equal to zero, as shown in Section 3.1, meaning (30) can be rewritten as

X Φ_{m−1} = Φ_{m−1} H_{m×m}.    (31)

Here, H_{m×m} is an upper-Hessenberg matrix containing the recurrence coefficients which, in conjunction with φ_0, can be used to compute {φ_j(ξ)}_{j=1}^{m−1}.

Inverse eigenvalue problem
To relate the nodes and weights defining the inner product (24) to the Hessenberg recurrence matrix, (31) is premultiplied by W,

W X Φ = W Φ H,    (32)

where Φ = Φ_{m−1} and H = H_{m×m} are used to simplify notation. Note that W and X commute due to their diagonal structure, hence (32) equals

X W Φ = W Φ H.    (33)

Since it follows from (26) that Q = W Φ is unitary, it follows from (33) that

H = Q^H X Q,    (34)

revealing that H is unitarily similar to X. Also, Qe_1, the first column of Q = W Φ, is equal to

Q e_1 = α w,    (35)

with e_1 = [1 0 · · · 0]^T, |α| = 1, and w as defined in (11). The problem (34)-(35) is an inverse eigenvalue problem, where the desired Hessenberg recurrence matrix H has the nodes ξ_i as eigenvalues and the normalized weight vector w as the first eigenvector.
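The structure of this inverse eigenvalue problem can be verified numerically. The sketch below constructs Q and H by a QR factorization of a Krylov matrix, a dense O(m³) route used purely for illustration (not the paper's algorithm), with hypothetical nodes and weights, and checks that H is upper Hessenberg, has the nodes as eigenvalues, and has α w as the first column of Q:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 8
xi = np.exp(2j * np.pi * np.arange(m) / m)   # hypothetical nodes (8th roots of unity)
w = rng.uniform(0.5, 2.0, m)
w = w / np.linalg.norm(w)                    # normalized weight vector
X = np.diag(xi)

# QR of the Krylov matrix [w, Xw, X^2 w, ...] gives a unitary Q whose first
# column is w up to a unimodular factor, and H = Q^H X Q is upper Hessenberg.
K = np.column_stack([xi**j * w for j in range(m)])
Q, _ = np.linalg.qr(K)
H = Q.conj().T @ X @ Q

ev = np.linalg.eigvals(H)
print("Hessenberg residual  :", abs(np.tril(H, -2)).max())
print("|Q e_1| equals w     :", np.allclose(np.abs(Q[:, 0]), w))
print("eigenvalue/node error:", np.abs(ev[:, None] - xi[None, :]).min(axis=0).max())
```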
The inverse eigenvalue problem as given by (34)-(35) is not uniquely defined since each column of Q can be multiplied by a complex number of magnitude one without violating orthonormality. As an additional constraint, the sub-diagonal elements of H, as well as α in (35), are constrained to be positive real, thereby uniquely defining Q and H [38, Chapter 12]. To summarize, the inverse eigenvalue problem considered here is defined as follows.
When Problem 1 has been solved, the recurrence relation (28) can subsequently be used to compute the orthonormal polynomial basis {φ_j(ξ)}_{j=0}^n. The polynomial linear least-squares problem (10) is then solved by

θ̂ = (W Φ)^H W y = Q^H W y,

with Q = W Φ, and W, Φ, and y as given by (16)-(17).

Update algorithm: chasing down the diagonal
To solve the inverse eigenvalue problem of Problem 1, a unitary matrix Q has to be determined such that

Q^H w = ρ e_1  and  Q^H X Q = H,    (38)

with H upper Hessenberg and h_{k+1,k} ∈ R_{>0}.
An efficient method to solve this problem, as used in, e.g., [41], is to consider the update step, i.e., the problem of introducing a new node-weight pair to the problem which has been solved for the previously introduced node-weight pairs. This update step can be solved using Givens rotations, which are defined here as the 2×2 unitary matrices

G = [ c  s ; −s*  c* ],  such that  G [a  b]^T = [r  0]^T,    (39)

where |c|² + |s|² = 1, a, b, c, s ∈ C and r ∈ R_+ [42].
Here, r is constrained to be positive real, whereas conventionally Givens rotations are defined such that c ∈ R_+. This reflects the choice to constrain the sub-diagonal elements of the Hessenberg recurrence matrix H to be positive real, as will be seen in the remainder of this section. To simplify the notation in the remainder of this section, a Givens rotation applied to two rows and/or columns is denoted by the indices of the rows and/or columns on which it operates. When working with similarity transformations, as is the case in this paper, these transformations are always applied to both sides of the matrix, operating on the corresponding rows and columns.
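A minimal sketch of such a complex Givens rotation with r constrained to be positive real follows; this is one common construction, and the exact convention of (39) may differ in phase choices:

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with |c|^2 + |s|^2 = 1 such that
    [[c, s], [-conj(s), conj(c)]] @ [a, b] = [r, 0] with r real, r >= 0."""
    n = np.hypot(abs(a), abs(b))
    if n == 0.0:
        return 1.0 + 0j, 0.0 + 0j
    # Choosing c, s proportional to conj(a), conj(b) makes r = n positive real.
    return np.conj(a) / n, np.conj(b) / n

a, b = 1.0 + 2.0j, 3.0 - 1.0j
c, s = givens(a, b)
G = np.array([[c, s], [-np.conj(s), np.conj(c)]])
r = G @ np.array([a, b])
print("G [a, b]^T =", r)                       # [sqrt(|a|^2 + |b|^2), 0]
print("G unitary  :", np.allclose(G.conj().T @ G, np.eye(2)))
```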
To perform the update step, a new node-weight pair (ξ_i, w_i) is added to the subproblem for the previously introduced i−1 node-weight pairs, which has been brought to the desired form of (38) under unitary similarity. Appending the new node and weight to the previously computed [ρ_{i−1} e_1 | H_{o,i−1}] yields the augmented matrix (41). The matrix (41) is transformed to the form of (38) under unitary similarity, using Givens rotations. This is done by chasing the bulge element, i.e., the non-zero element which violates the desired matrix structure, down the diagonal using the following algorithm.
Algorithm 4 (Chasing down the diagonal) A single update step is performed by first adding a new node-weight pair to obtain the augmented matrix as in (41). This matrix is then transformed to the form [ρ_i e_1 | H_{o,i}] under unitary similarity by the following operations. First, the second element in the weight vector is zeroed to transform it to the desired form ρ_i e_1. This induces a non-zero bulge element as the (3, 1) element of H̃_i, violating the Hessenberg structure. This element is subsequently zeroed by performing another Givens rotation, which effectively moves the bulge element one step down the diagonal. This operation is repeated until the bulge element has been chased all the way down the diagonal, where a final zeroing operation no longer introduces a new bulge element. Note that this sequence of Givens rotations, with the Givens rotations defined as in (39) with r ∈ R_+, directly leads to ρ_i ∈ R_+ and h_{k+1,k} ∈ R_+. This update step needs to be performed for i = 1, 2, · · · , m, to add each node-weight pair and, in each update step, i−1 rotations are performed operating on both columns and rows with at most i nonzero elements. This means the total computational cost of solving the inverse eigenvalue problem is O(m³). In the case of polynomial least-squares problems where the orthonormal polynomial basis only needs to be built up to a limited polynomial degree n, the computational cost becomes O(mn²), since only the first n rows and columns of the Hessenberg recurrence matrix need to be computed [38, Section 12.5].

Special cases: matrix structure leading to an O(n) reduction in computational complexity
In the special case that all nodes ξ i lie on the real line, the imaginary axis or the unit circle, the computational complexity of the inverse eigenvalue problem, Problem 1, can be reduced such that an update can be computed in O(n) operations instead of O(n 2 ). This is a result of additional structural properties of the Hessenberg recurrence matrix H which enable the Givens chasing operations described in Section 3.4 to be computed efficiently and also lead to a simplified recurrence relation (27).

Nodes on the real-line and imaginary axis
For the classical situation where ξ_i ∈ R, the recurrence matrix H is Hermitian, since the diagonal node matrix satisfies X^H = X and therefore

H^H = (Q^H X Q)^H = Q^H X^H Q = Q^H X Q = H.    (47)

This Hermitian Hessenberg matrix is tridiagonal and can be fully described using O(n) parameters: the diagonal elements {a_i}_{i=0}^n and the sub-diagonal elements {b_i}_{i=1}^n. Furthermore, applying a Givens rotation to a tridiagonal matrix is an operation with a computational complexity of O(1), where it is O(n) for a full Hessenberg matrix, leading to the reduction of the computational complexity of the update algorithm described in Section 3.4 from O(n²) to O(n). Also, it is immediate from (47) that the n-term recurrence relation (27) reduces to a three-term recurrence relation,

ξ φ_{k−1}(ξ) = b_{k−1} φ_{k−2}(ξ) + a_{k−1} φ_{k−1}(ξ) + b_k φ_k(ξ).    (48)

For the case where all ξ_i lie on the imaginary axis a similar situation occurs, where the Hessenberg matrix is skew-Hermitian instead of Hermitian.
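This structure is easy to check numerically. The sketch below uses a dense QR-based construction of the recurrence matrix, purely for illustration, with hypothetical real nodes and weights; the resulting H is symmetric and tridiagonal to machine precision:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 8
xi = np.linspace(-1.0, 1.0, m)               # hypothetical real nodes
w = rng.uniform(0.5, 2.0, m)
w = w / np.linalg.norm(w)
X = np.diag(xi)

# Dense illustrative construction: QR of the Krylov matrix [w, Xw, X^2 w, ...]
# gives Q with H = Q^T X Q upper Hessenberg; for real nodes H is also symmetric.
K = np.column_stack([xi**j * w for j in range(m)])
Q, _ = np.linalg.qr(K)
H = Q.T @ X @ Q                              # real data, so Q^H = Q^T

print("symmetric          :", np.allclose(H, H.T))
print("tridiagonal residual:", max(abs(np.triu(H, 2)).max(), abs(np.tril(H, -2)).max()))
```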

Nodes on the unit circle
In the case that all nodes ξ_i lie on the unit circle, the complex conjugate of each node is equal to its inverse, since ξ_i^{−1} = ξ_i*/|ξ_i|², and |ξ_i| = 1 for nodes on the unit circle. This means that for the diagonal node matrix

X^H = X^{−1},    (49)

which leads to

H^H = (Q^H X Q)^H = Q^H X^{−1} Q = H^{−1},    (50)

meaning the Hessenberg recurrence matrix is unitary. This unitary Hessenberg matrix can again be fully described using O(n) parameters, {γ_i}_{i=0}^n and {σ_i}_{i=1}^n; however, this description is not sparse as in the tridiagonal case. The key property of unitary Hessenberg matrices that leads to a reduction of the computational complexity is that unitary Hessenberg matrices can be written as a product of elementary unitary factors,

H = G_1 G_2 ⋯ G_n,    (52)

where each factor G_k = I ⊕ [ γ_k  σ_k ; σ_k  −γ_k* ] ⊕ I, with |γ_k|² + σ_k² = 1, acts on rows/columns k and k+1, and where ⊕ denotes the direct sum. The representation (52) of unitary Hessenberg matrices is known as the Schur parametrization. As a result of this parametrization, the application of a single Givens rotation again has a computational complexity of O(1), due to the fact that only the factors G_{k−1}, G_k and G_{k+1} are affected by an elementary similarity transformation operating on rows/columns k and k + 1.
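This unitarity can also be checked numerically. The sketch below builds the recurrence matrix by a dense QR-based construction, purely for illustration, with hypothetical unit-circle nodes and weights; the resulting H is both upper Hessenberg and unitary:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 8
theta = 2 * np.pi * (np.arange(m) + 0.5) / m
xi = np.exp(1j * theta)                      # hypothetical nodes on the unit circle
w = rng.uniform(0.5, 2.0, m)
w = w / np.linalg.norm(w)
X = np.diag(xi)

# Dense illustrative construction via QR of the Krylov matrix of (X, w).
K = np.column_stack([xi**j * w for j in range(m)])
Q, _ = np.linalg.qr(K)
H = Q.conj().T @ X @ Q

print("Hessenberg residual:", abs(np.tril(H, -2)).max())
print("H unitary          :", np.allclose(H.conj().T @ H, np.eye(m)))
```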
The unitary Hessenberg structure also leads to simplified recurrence relations. Indeed, these can be written as a two-term recurrence relation involving a set of auxiliary polynomials, which is known as the Szegö recurrence relation. When γ_k ≠ 0 ∀ k, these relations can be rewritten as a three-term recurrence relation, further highlighting the similarity between the cases with nodes on the unit circle and with nodes on the real-line or imaginary axis as described in Section 3.5.1.

Towards a unified approach
In Sections 3.1-3.4, the theoretical framework of datadependent orthonormal polynomials is established. It is shown that by solving an inverse eigenvalue problem, a Hessenberg recurrence matrix is constructed which uniquely determines the polynomial basis that is orthonormal with respect to the data-dependent inner product.
In Section 3.5, it is shown that there are additional structural properties for the Hessenberg recurrence matrix when the nodes that define the data-dependent inner product all lie on either the real-line, the imaginary axis or the unit circle. These structural properties lead to efficient construction algorithms for the discrete orthogonal polynomials, enabling the formulation of efficient and numerically reliable system identification approaches in both the continuous time Laplace domain, with nodes on the imaginary axis, and the discrete time Z-domain, with nodes on the unit circle.
From a system theoretical perspective, it indeed makes sense that similar structural properties exist for the Hessenberg recurrence matrix both in problems expressed in the continuous time Laplace domain and problems expressed in the discrete time Z-domain, since these are closely related. Indeed, through the use of invertible bilinear transformations several discrete time problems can be tackled with tools from continuous time system theory and vice versa, see, e.g., [43].
The general form of these bilinear transformations, known as Möbius transformations, is given by

ξ̃ = (α ξ + β) / (µ ξ + η),    (60)

with α, β, µ, η ∈ C and αη − βµ ≠ 0. The full class of geometries that is related to the real-line and unit circle through Möbius transformations is the class of generalized circles, i.e., circles and lines, in the complex plane.
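As a concrete instance, the Cayley-type map with α = 1, β = −j, µ = 1, η = j sends the real line onto the unit circle; this is a standard example, not one specific to the paper:

```python
import numpy as np

def mobius(xi, alpha, beta, mu, eta):
    """Mobius transformation (alpha*xi + beta) / (mu*xi + eta)."""
    assert alpha * eta - beta * mu != 0       # invertibility condition
    return (alpha * xi + beta) / (mu * xi + eta)

# Cayley-type map: (xi - 1j) / (xi + 1j) sends the real line to the unit circle.
x = np.linspace(-5.0, 5.0, 11)
z = mobius(x, 1.0, -1j, 1.0, 1j)
print("all |z| = 1:", np.allclose(np.abs(z), 1.0))
```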
In the following section, the Möbius transformation is exploited to generalize the structural properties of the Hessenberg recurrence matrix that lead to a reduction in computational complexity to the case where all nodes lie on a generalized circle in the complex plane, including the real-line and unit-circle as special cases. This enables the fast construction of data-dependent orthonormal polynomials in the δ-domain without first transforming the problem to an equivalent Z-domain or s-domain formulation.

Orthogonal polynomials on generalized circles: a unified framework
In this section, the theoretical framework and algorithm are presented for the efficient construction of orthogonal polynomials with nodes on generalized circles in the complex plane. To extend the results of the real-line and unit circle cases, as described in Section 3.5, to generalized circles, their geometrical properties are investigated first in Section 4.1. Second, in Section 4.2 the structural properties of the Hessenberg recurrence matrix are investigated, resulting in a generalization of the structural properties encountered in the real-line and unit-circle cases. Next, in Section 4.3 it is shown how this generalized structure for the Hessenberg recurrence matrix again leads to an O(n) reduction in computational complexity, as formulated in Theorem 14; this constitutes the first main result of this paper. Lastly, in Section 4.4 the construction algorithm is formulated in Algorithm 5, constituting the second main result of this paper.

Geometrical properties of generalized circles
The geometric properties of generalized circles are first defined and investigated, with a particular emphasis on their relation to the known real-line and unit circle cases. First, generalized circles are defined.
Definition 2 (Generalized circles) A generalized circle is a set of points ξ in the complex plane that satisfies the equation

a ξ ξ* + b ξ + c ξ* + d = 0,    (61)

for a, d ∈ R and b, c ∈ C, such that c = b*, and ad < bc, [44, Section 1.1].
Generalized circles are circles and straight lines in the complex plane and are therefore alternatively called clines or circlines. The general relation between two generalized circles is the Möbius transformation, (60), as defined in Section 3.6. In this section, a simplified version of the Möbius transformation is considered where no inversion is used, i.e., where µ = 0, and η = 1. This simplified Möbius transformation can be considered as a composition of the following simple geometrical transformations: translation, homothety, and rotation [44]. In the following lemma it is shown that all generalized circles can be transformed to the real-line or unit circle using this simplified Möbius transformation.

Lemma 3 A set of nodes {ξ̃_i}_{i=1}^m lies on a generalized circle in the complex plane if and only if ∃ c_1 ∈ C \ {0}, and c_2 ∈ C, such that

ξ̃_i = c_1 ξ_i + c_2,    (62)

with ξ_i ∈ R ∀ i, or |ξ_i| = 1 ∀ i.

PROOF. Applying transformation (62) to (61) yields

a|c_1|² ξ ξ* + c_1(a c_2* + b) ξ + c_1*(a c_2 + c) ξ* + (a|c_2|² + b c_2 + c c_2* + d) = 0.

When a ≠ 0, taking c_2 = −b*/a and c_1 = √(bc − ad)/a yields

ξ ξ* = 1,

i.e., the unit circle. When a = 0, taking c_2 = −d/(2b) and c_1 = jc yields

jbc (ξ − ξ*) = 0,

which corresponds to ξ = ξ*, i.e., the real-line. The converse also holds since the transformation ξ → ξ̃ defined by (62) is invertible. □

Next, matrices that are unitarily similar to a diagonal node-matrix with nodes on a generalized circle are defined in the following as circline matrices.

Definition 4 (Circline matrices)
A matrix Ũ is referred to as a circline matrix if it is unitarily similar to a diagonal matrix X̃ = diag(ξ̃_1, ξ̃_2, . . . , ξ̃_m), i.e.,

Ũ = Q^H X̃ Q,    (68)

with Q a unitary matrix, where the nodes {ξ̃_i}_{i=1}^m lie on a generalized circle in the complex plane.
This class of matrices is important as it is the class of matrices encountered in the solution of the inverse eigenvalue problem (Problem 1) when the nodes lie on a generalized circle in the complex plane. The following lemma shows the relation between the class of circline matrices and the unitary or Hermitian matrices encountered in the solution of the inverse eigenvalue problem in the unit circle and real line cases.

Lemma 5 A matrix Ũ is a circline matrix if and only if there exist c_1 ∈ C \ {0} and c_2 ∈ C such that Ũ = c_1 U + c_2 I, where U is Hermitian or unitary.

PROOF. From Definition 4, Ũ = Q^H X̃ Q, and
it follows from Lemma 3 that ∃ c_1 ∈ C \ {0}, and c_2 ∈ C, such that

X̃ = c_1 X + c_2 I,    (69)

with X = diag(ξ_1, ξ_2, . . . , ξ_m) a diagonal node matrix with ξ_i ∈ R ∀ i, or |ξ_i| = 1 ∀ i. Substituting (69) into (68) yields

Ũ = Q^H (c_1 X + c_2 I) Q = c_1 Q^H X Q + c_2 I = c_1 U + c_2 I,    (70)

with U = Q^H X Q, as scalar-matrix multiplication is commutative and Q^H I Q = I. □

In Lemma 3 it is shown that a simple shifting and scaling operation (62) can be used to transform any generalized circle to either the real-line or the unit circle. Then, Lemma 5 essentially shows that this shifting and scaling operation commutes with unitary similarity transformations. As a result, any circline matrix, as defined in Definition 4 and as encountered while solving the inverse eigenvalue problem (Problem 1) in the generalized circle case, can be written as a shifted and scaled version of a unitary or Hermitian matrix, as encountered in the real-line or the unit circle cases. This relationship between the matrices encountered in the generalized circle case and the matrices encountered in the real-line and unit circle cases is used in the next section to show that the structural properties of the Hessenberg recurrence matrix that lead to a reduction in computational complexity in the real-line and unit circle cases can be extended to the generalized circle case.
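A numerical sketch of this shift-and-scale relation follows, with hypothetical values for c_1 and c_2: shifting and scaling a unitary matrix produces a matrix whose eigenvalues lie on the circle with center c_2 and radius |c_1|:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 6
c1 = 0.5 * np.exp(0.3j)                      # hypothetical scale (radius |c1|)
c2 = 2.0 - 1.0j                              # hypothetical shift (circle center)

# Random unitary U (Q-factor of a random complex matrix); its eigenvalues
# lie on the unit circle.
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
U, _ = np.linalg.qr(A)

# Circline matrix: shifted and scaled unitary matrix.
Utilde = c1 * U + c2 * np.eye(m)
ev = np.linalg.eigvals(Utilde)
print("distances to center c2:", np.round(np.abs(ev - c2), 12))
print("all equal to |c1|     :", np.allclose(np.abs(ev - c2), abs(c1)))
```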

Structural properties of the Hessenberg recurrence matrix
To generalize the structural properties of the Hessenberg recurrence matrix, a class of structured matrices is investigated which contains both the tridiagonal matrices encountered in the real-line and imaginary-axis cases and the unitary Hessenberg matrices encountered in the case with nodes on the unit circle. This class of matrices is called quasiseparable matrices [45], which is a type of rank-structured matrix. More specifically, the class of matrices investigated here is the class of Hessenberg-quasiseparable matrices with a separability rank of one, i.e., (H, 1)-quasiseparable matrices. Using the notation A(i_1 : i_2, j_1 : j_2) to denote the sub-matrix of A containing rows i_1 to i_2 and columns j_1 to j_2, these (H, 1)-quasiseparable matrices are defined as follows.
Definition 6 ((H, 1)-quasiseparable matrices) An upper Hessenberg matrix H ∈ C^{n×n} is (H, 1)-quasiseparable if rank H(1 : i, i + 1 : n) ≤ 1 for i = 1, . . . , n − 1. (71)

The value of the rank constraint in (71), for all the blocks above the main diagonal, is known as the upper separability rank. This is defined here, along with the lower separability rank, as follows.
Definition 7 (Separability rank) Let A ∈ C^{n×n}. Then A is quasiseparable with upper separability rank k_u and lower separability rank k_l if rank A(1 : i, i + 1 : n) ≤ k_u and rank A(i + 1 : n, 1 : i) ≤ k_l for i = 1, . . . , n − 1.

The tridiagonal matrices encountered in the real-line case, i.e., (47), are indeed (H, 1)-quasiseparable, as in this case each upper diagonal block H(1 : i, i + 1 : n) contains only a single nonzero element, in its bottom-left corner, and consequently its maximal rank is equal to one, as required in Definition 6. For the unitary Hessenberg matrices encountered in the unit-circle case, i.e., (51), the upper diagonal blocks can be written as the product of a column vector and a row vector whose entries involve the products σ_{(x:y)} = ∏_{i=x}^{y} σ_i. This shows that these upper diagonal blocks also have a maximal rank of one, so that a unitary Hessenberg matrix is (H, 1)-quasiseparable.
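The two special cases above can be verified numerically. The sketch below builds a random tridiagonal matrix and a unitary Hessenberg matrix, the latter as a product of Givens-type factors built from Schur-type parameters (a standard construction, assumed here for illustration), and checks that every block strictly above the diagonal has numerical rank at most one:

```python
import numpy as np

def upper_block_ranks(M, tol=1e-10):
    """Numerical ranks of all blocks strictly above the main diagonal."""
    n = M.shape[0]
    ranks = []
    for i in range(1, n):
        s = np.linalg.svd(M[:i, i:], compute_uv=False)
        ranks.append(int(np.sum(s > tol * max(1.0, s[0]))))
    return ranks

rng = np.random.default_rng(1)
n = 8

# Real-line case: a tridiagonal matrix; each upper block has one nonzero entry.
T = (np.diag(rng.standard_normal(n))
     + np.diag(rng.standard_normal(n - 1), 1)
     + np.diag(rng.standard_normal(n - 1), -1))

# Unit-circle case: a unitary Hessenberg matrix built as a product of Givens
# factors from Schur-type parameters gamma_k (|gamma_k| < 1, |gamma_n| = 1).
gamma = 0.9 * rng.uniform(0.2, 1.0, n) * np.exp(2j * np.pi * rng.random(n))
gamma[-1] /= abs(gamma[-1])
H = np.eye(n, dtype=complex)
for k in range(n - 1):
    G = np.eye(n, dtype=complex)
    s = np.sqrt(1.0 - abs(gamma[k]) ** 2)
    G[k:k + 2, k:k + 2] = [[-gamma[k], s], [s, np.conj(gamma[k])]]
    H = H @ G
H[:, -1] *= -gamma[-1]                       # trailing 1x1 factor

assert np.allclose(H.conj().T @ H, np.eye(n))   # H is unitary
assert np.allclose(np.tril(H, -2), 0)           # H is upper Hessenberg
assert max(upper_block_ranks(T)) <= 1           # tridiagonal: rank one
assert max(upper_block_ranks(H)) <= 1           # unitary Hessenberg: rank one
```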
Another property that is shared by both the unitary matrices encountered in the unit circle case and the Hermitian matrices encountered in the real-line case is that they are rank symmetric, as is shown in the following lemma.

Lemma 8 (Rank symmetry) Let U be a unitary or Hermitian n × n matrix. Then, if U is lower quasiseparable with separability rank k, it is also upper quasiseparable with the same rank.
For a proof see [46, Theorem 5]. In the following theorem it is shown that this rank symmetry property extends to all circline matrices.
Theorem 9 (Circline rank symmetry) Let Ũ ∈ C^{n×n} be a circline matrix. Then Ũ is rank symmetric, i.e., if Ũ is lower quasiseparable with separability rank k, it is also upper quasiseparable with the same rank.
PROOF. Lemma 5 shows that Ũ can be written as Ũ = c_1 U + c_2 I, where U = Q^H X Q is a unitary or Hermitian matrix. As the quasiseparable rank structure excludes the main diagonal, both the lower and the upper separability rank of Ũ are equal to those of c_1 U. Therefore, as U is rank symmetric as a result of Lemma 8, Ũ is rank symmetric. 2

Using this rank symmetry property, the following theorem shows that the class of Hessenberg recurrence matrices involved in the construction of a set of polynomials orthogonal with respect to a discrete inner product with support on a generalized circle in the complex plane is indeed (H, 1)-quasiseparable.
Theorem 10 Let H̃ ∈ C^{n×n} be a Hessenberg matrix that is also a circline matrix. Then H̃ is (H, 1)-quasiseparable.
PROOF. As H̃ is a Hessenberg matrix, its lower separability rank h_l = 1, as all blocks taken from below the main diagonal contain at most a single nonzero element. As H̃ is also a circline matrix, it is rank symmetric as a result of Theorem 9, and therefore its upper separability rank h_u = h_l = 1, meaning H̃ is an (H, 1)-quasiseparable matrix. 2

This key theoretical result shows that the Hessenberg recurrence matrices encountered in the generalized circle case indeed have the (H, 1)-quasiseparable structure. In the following section it is shown that this result, in conjunction with the rank symmetry result of Theorem 9, enables the efficient construction of orthogonal polynomials on generalized circles in the complex plane.
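Theorem 10 can be checked numerically: reducing a diagonal matrix of nodes on a generalized circle to Hessenberg form by an Arnoldi-type unitary similarity (the circle's center and radius below are illustrative choices) yields upper off-diagonal blocks of rank at most one:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 8

# Nodes on a generalized circle with illustrative center c and radius r:
c, r = 1.5 - 0.5j, 2.0
xi = c + r * np.exp(2j * np.pi * rng.random(m))
A = np.diag(xi)                              # diagonal node matrix

# Arnoldi with (repeated) modified Gram-Schmidt: builds unitary Q such that
# H = Q^H A Q is upper Hessenberg, i.e., a Hessenberg circline matrix.
q0 = rng.standard_normal(m) + 1j * rng.standard_normal(m)
Q = np.zeros((m, m), dtype=complex)
Q[:, 0] = q0 / np.linalg.norm(q0)
H = np.zeros((m, m), dtype=complex)
for k in range(m):
    v = A @ Q[:, k]
    for _ in range(2):                       # two passes for robust orthogonality
        for j in range(k + 1):
            h = np.vdot(Q[:, j], v)
            H[j, k] += h
            v -= h * Q[:, j]
    if k + 1 < m:
        H[k + 1, k] = np.linalg.norm(v)
        Q[:, k + 1] = v / H[k + 1, k]

# Theorem 10: every block strictly above the diagonal of H has rank at most one.
for i in range(1, m):
    s = np.linalg.svd(H[:i, i:], compute_uv=False)
    assert s.size < 2 or s[1] < 1e-8 * s[0]
```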

Efficient O(mn) construction of orthogonal polynomials on generalized circles
In the following, it is shown that the (H, 1)-quasiseparable matrix structure leads to an O(n) simplification of the construction algorithm for the orthonormal set of polynomials. For such an O(n) simplification to exist, the matrix must admit a representation using O(n) parameters, and the associated recurrence relations and update steps must be computable with a correspondingly reduced complexity. The following lemma shows that an (H, 1)-quasiseparable matrix can be represented using O(n) generators.
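A minimal sketch of such an O(n) generator representation, assuming the conventional (H, 1)-quasiseparable form with generators q_i, d_i, g_i, h_i, b_i (subdiagonal entries q_i, diagonal entries d_i, and upper entries g_i b_{i+1} ··· b_{j-1} h_j), is:

```python
import numpy as np

def qs_build(q, d, g, h, b):
    """Assemble an (H,1)-quasiseparable matrix from O(n) generators, assuming
    the conventional form: subdiagonal q_i, diagonal d_i, and upper part
    H[i, j] = g_i * b_{i+1} * ... * b_{j-1} * h_j (0-based indices here)."""
    n = len(d)
    H = np.zeros((n, n), dtype=complex)
    for i in range(n):
        H[i, i] = d[i]
        if i + 1 < n:
            H[i + 1, i] = q[i]
        for j in range(i + 1, n):
            H[i, j] = g[i] * np.prod(b[i + 1:j]) * h[j]
    return H

rng = np.random.default_rng(7)
n = 7
gen = [rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(5)]
H = qs_build(*gen)

# Storage is 5n numbers instead of n^2, and every block strictly above the
# main diagonal has rank at most one:
for i in range(1, n):
    s = np.linalg.svd(H[:i, i:], compute_uv=False)
    assert s.size < 2 or s[1] < 1e-10 * max(1.0, s[0])
```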
Furthermore, as shown in the following lemmas this generator description leads to the following simplified recurrence relations for the related polynomial systems.
Lemma 12 ((H, 1)-quasiseparable 2-term r.r.) Let H ∈ C^{n×n} be an (H, 1)-quasiseparable matrix described by a set of generators q_i, d_i, g_i, h_i, b_i, i = 1, . . . , n, as in (78). Then the set of polynomials {φ_k(ξ)}_{k=0}^n related to H through the n-term recurrence relation (29) can be obtained by a 2-term block recurrence relation, where the conversion from the quasiseparable generators to the recurrence relation coefficients is given in Table 1.

Lemma 13 ((H, 1)-quasiseparable 3-term r.r.)
Let H ∈ C^{n×n} be an (H, 1)-quasiseparable matrix described by a set of generators q_i, d_i, g_i, h_i, b_i, i = 1, . . . , n, as in (78), with h_i ≠ 0 ∀ i. Then the set of polynomials {φ_k(ξ)}_{k=0}^n related to H through the n-term recurrence relation (29) can be obtained by a 3-term recurrence relation, where the conversion from the quasiseparable generators to the recurrence relation coefficients is given in Table 1.
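In the real-line special case the recurrence matrix is tridiagonal and the three-term recurrence takes its classical form. The following self-contained sketch, with illustrative nodes and weights, constructs the recurrence coefficients by a Lanczos-type procedure and verifies the discrete orthonormality of the resulting polynomials:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 6
xi = np.sort(rng.uniform(-1.0, 1.0, m))   # illustrative real-line nodes
w = rng.uniform(0.5, 1.5, m)              # illustrative positive weights

# Lanczos on X = diag(xi) with start vector w/||w|| yields the tridiagonal
# recurrence coefficients alpha_k, beta_k of the data-dependent orthonormal
# polynomials.
alpha = np.zeros(n)
beta = np.zeros(n)
Q = np.zeros((m, n + 1))
Q[:, 0] = w / np.linalg.norm(w)
for k in range(n):
    r = xi * Q[:, k]
    alpha[k] = Q[:, k] @ r
    r -= Q[:, :k + 1] @ (Q[:, :k + 1].T @ r)   # full orthogonalization
    beta[k] = np.linalg.norm(r)
    Q[:, k + 1] = r / beta[k]

# Evaluate phi_0, ..., phi_n at the nodes via the three-term recurrence
#   beta_k phi_k(xi) = (xi - alpha_k) phi_{k-1}(xi) - beta_{k-1} phi_{k-2}(xi).
phi = np.zeros((m, n + 1))
phi[:, 0] = 1.0 / np.linalg.norm(w)
for k in range(n):
    nxt = (xi - alpha[k]) * phi[:, k]
    if k > 0:
        nxt -= beta[k - 1] * phi[:, k - 1]
    phi[:, k + 1] = nxt / beta[k]

# Discrete orthonormality with respect to the data-dependent inner product:
G = (phi * (w ** 2)[:, None]).T @ phi
assert np.allclose(G, np.eye(n + 1), atol=1e-8)
```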
Finally, the following theorem shows that the inverse eigenvalue problem can be solved with a reduced computational complexity when all nodes lie on a generalized circle. This theorem is the main result of this paper as it shows that the data-dependent orthonormal polynomials can be efficiently constructed in the generalized circle case.
Theorem 14 Let the nodes {ξ_i}_{i=1}^m all lie on a generalized circle in the complex plane. Then the inverse eigenvalue problem as defined by Problem 1 can be efficiently solved with a computational complexity of O(mn).

Table 1. Conversion formulas from quasiseparable generators to the two-term block recurrence coefficients and the three-term recurrence coefficients.

PROOF. All operations performed in Algorithm 4 are unitary similarity transformations. Therefore, in all steps of the algorithm the matrix being operated on is a circline matrix as defined in Definition 4, since H̃ = Q^H X̃ Q with Q a unitary matrix and X̃ = diag(ξ_i), where the nodes {ξ_i}_{i=1}^m all lie on a generalized circle in the complex plane. As a result of Theorem 9, the matrix H̃ is rank symmetric. When performing an update step, the problem matrix is Hessenberg apart from the one bulge element on the second subdiagonal that is being chased down. This bulge element does not increase the lower separability rank of the matrix, as any block taken from below the main diagonal has one of three possible structures, each of rank one. Therefore, as a result of rank symmetry, the upper separability rank of H̃ is also equal to one in all steps of the chasing algorithm. Performing a single chasing operation on the problem matrix thus means performing a Givens rotation on a matrix whose part above the diagonal is rank structured, i.e., any block taken from this part of the matrix has rank one. Due to this rank structure, performing the Givens rotation on the affected 2 × n rows and n × 2 columns of the matrix can be done with a computational complexity of O(1) instead of the general complexity O(n). As a result, Problem 1 can be solved with an overall computational complexity of O(mn), instead of O(mn^2), for the case where all nodes lie on a generalized circle in the complex plane. 2

In the following section, an efficient chasing-down-the-diagonal update algorithm using the (H, 1)-quasiseparable matrix structure is detailed.
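The O(1)-per-rotation argument in the proof can be illustrated in isolation: when two rows are scalar multiples of a common vector (separability rank one), a Givens rotation acting on them reduces to a rotation of the two scalars:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# Two rows of the rank-structured part share a common row vector v:
# row_i = g_i * v (separability rank one).
v = rng.standard_normal(n)
g = np.array([0.8, -0.3])
rows = np.outer(g, v)

# A Givens rotation acting on these two full rows costs O(n)...
c, s = 0.6, 0.8                      # c^2 + s^2 = 1
G = np.array([[c, s], [-s, c]])
dense = G @ rows

# ...but on the rank-one representation only the two scalars rotate: O(1).
g_new = G @ g
assert np.allclose(dense, np.outer(g_new, v))
```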

Update algorithm: chasing down the diagonal for (H, 1)-quasiseparable matrices

For notational simplicity, the so-called well-free case is considered here, where h_i ≠ 0 ∀ i, in which case it is possible to take h_i = 1 ∀ i without loss of generality, see, e.g., [49]. This property generally holds in the context of system identification due to the stochastic nature of the identification data. In the following algorithm it is detailed how a single chasing operation of the chasing-down-the-diagonal algorithm in Section 3.4 is performed for an (H, 1)-quasiseparable matrix with a computational complexity of O(1).

Algorithm 5 (Chasing for (H, 1)-quasiseparable)
A single chasing operation of the update algorithm of Section 3.4 is shown below, acting on an (H, 1)-quasiseparable matrix H̃ using the generator description of Lemma 11, see (86). The chasing operation performed on the middle part of H̃, enclosed by the green dotted line in (86), leads to new values of the generators, denoted by the subscript n. A further factorization, (88), leads to additional new values of the generators, among them (91). Finally, for the right part, enclosed by the blue dash-dotted line in (86), the factorization (92) can be made, leading to the remaining new generator values, with the pair (b_{i+2,n}, g_{i+2,n}) obtained through multiplication by the matrix [1/g_{i+1,n}, 0; 0, 1]. After the replacements of the old generators with the new generators, i in (86) is increased by one and the same operation is performed again to chase the bulge one more step down the diagonal.
Here (91) is obtained from the fact that the factor b_i/g_i, which appears in the column vector of the factorization in (88), must remain constant. Furthermore, the (2, 4) element of the matrix in (86), which is not operated on, should also remain constant. This leads to the following additional condition that should be checked for Algorithm 5,

g_{i,n} b_{(i+1:i+2),n} = g_i b_{(i+1:i+2)}. (94)

For the case of nodes on a generalized circle, this condition is guaranteed to hold due to the rank symmetry of circline matrices.
In the following section, this algorithm is used to obtain a unified framework for numerically reliable identification in the Laplace domain, the Z-domain and the δ-domain.

Identification approach and implementation aspects
In this section the results from Section 4 are applied to obtain a unified framework for numerically reliable system identification.

Unified identification approach
The unified identification approach as proposed in this paper is summarized in Figure 2. In particular, it is shown how, starting from FRF data of a given system, a system model is identified using a class of well-known identification algorithms in combination with numerically optimal polynomial basis functions expressed in either the Laplace domain, the Z-domain, or the δ-domain.

Remark 15 Determining the appropriate model order is part of the general identification problem. The proposed identification approach requires the user to select the maximum degree of the approximant in the polynomial least-squares problem. However, as the data-dependent orthonormal polynomial approach recursively builds up the orthonormal polynomial basis, it becomes trivial to also obtain all lower-order approximations, see, e.g., [10]. This enables the subsequent determination of the appropriate model order by, e.g., applying a statistical information criterion or by plotting and analyzing the stabilization diagram.

Remark 16
When the computed matrix Q = W Φ is not sufficiently unitary, as determined by some user-defined metric, e.g., if κ(Q) > 1 + ε_tol for some tolerance ε_tol, then an additional re-orthogonalization step can be performed by explicitly solving the over-determined system of equations (15) using a standard solver, e.g., a QR solver, instead of using (36)-(37).
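A sketch of this fallback logic, with an illustrative tolerance and a generic least-squares solver standing in for the paper's specific equations (15) and (36)-(37), is:

```python
import numpy as np

def solve_with_reorthogonalization(Phi, W, y, eps_tol=1e-3):
    """Sketch of Remark 16: use the cheap solution when Q = W @ Phi is
    sufficiently unitary, otherwise fall back to an explicit least-squares
    (QR-based) solve of the over-determined system. eps_tol is illustrative."""
    Q = W @ Phi
    if np.linalg.cond(Q) > 1.0 + eps_tol:
        # Re-orthogonalization step: solve the over-determined system directly.
        theta, *_ = np.linalg.lstsq(Q, W @ y, rcond=None)
    else:
        # Q has (numerically) orthonormal columns: the least-squares solution
        # is obtained by a single projection.
        theta = Q.conj().T @ (W @ y)
    return theta

# Demonstration with an orthonormal basis matrix (illustrative data):
rng = np.random.default_rng(5)
m, n = 50, 5
Q0, _ = np.linalg.qr(rng.standard_normal((m, n)))
W = np.eye(m)
y = rng.standard_normal(m)
theta = solve_with_reorthogonalization(Q0, W, y)
ref, *_ = np.linalg.lstsq(Q0, y, rcond=None)
assert np.allclose(theta, ref)
```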
A requirement for the implementation of the proposed approach for system identification is that the obtained solution polynomials are real-valued. How to ensure that real-valued polynomials are obtained is considered next.

Real-valued polynomials
To ensure that the obtained polynomials are real-valued, the discrete inner product needs to be defined such that all node-weight pairs are added in conjunction with their complex conjugates. To this end, the weight vector w and node matrix X are extended to include all complex conjugate nodes and weights, as in (95)-(96). When adding the node-weight pairs as in (95)-(96), after every second update step of the algorithm in Section 4.4, i.e., when a complex conjugate pair has been added, the (H, 1)-quasiseparable generators are all real apart from possible numerical errors. Therefore, after every second update step, all imaginary parts of the generators can be set to zero. Even though this approach does not fully utilize the conjugate symmetry inherent in the problem to maximally reduce the computational complexity, it is sufficiently efficient and numerically reliable for the purposes of this paper.
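The effect of the conjugate augmentation can be illustrated as follows: with conjugate-symmetric node-weight data, the Gram matrix of the monomials with respect to the discrete inner product is real, which is what makes real-coefficient polynomials attainable. The node and weight data below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
m = 6
xi = rng.standard_normal(m) + 1j * rng.uniform(0.1, 1.0, m)   # upper-half-plane nodes
w = rng.standard_normal(m) + 1j * rng.standard_normal(m)

# Extend with the complex conjugate node-weight pairs, as in (95)-(96):
xi_full = np.concatenate([xi, xi.conj()])
w_full = np.concatenate([w, w.conj()])

# Gram matrix of the monomials 1, xi, ..., xi^4 with respect to the discrete
# inner product <p, q> = sum_i |w_i|^2 p(xi_i) conj(q(xi_i)):
V = np.vander(xi_full, 5, increasing=True)
G = (V.conj().T * (np.abs(w_full) ** 2)) @ V

# Conjugate symmetry of the data makes every Gram entry real:
assert np.allclose(G.imag, 0, atol=1e-8)
```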
An effective approach to utilize the conjugate symmetry inherent in the problem is the chasing-two-down-the-diagonal approach, as described in, e.g., [39]. In this approach, the node-weight pairs are again added as in (95)-(96), after which a transformation to real coefficients is performed. After this transformation, all calculations can be performed on real coefficients, effectively leading to a factor-two reduction in computational complexity. However, in this chasing-two-down approach, a block of two node-weight pairs, each other's complex conjugates, has to be added and chased down the diagonal in conjunction. This leads to a larger non-trivial block on which to perform the calculations, since simultaneously adding two bulge elements below the first sub-diagonal of the Hessenberg matrix does increase the rank of the corresponding lower diagonal blocks, which, due to rank symmetry, also causes the upper diagonal blocks to become rank 2. Therefore, the chasing-two-down approach is not used in this paper and its implementation for the quasiseparable Hessenberg matrix case is considered future work.

Example
In this section the numerical advantages of the proposed approach are shown using a simulation example.

System description
In this section, an Active Vibration Isolation System (AVIS) is considered, as shown in Figure 3. The system is modelled as a rigid isolated payload connected to a rigid base frame by four isolator modules, as detailed in [51]. The example considered in this paper is based on the SISO transfer function between a moment M_x, applied to the payload in the θ_x direction, and the resulting rotation in the θ_y direction. The Bode diagram of this transfer function is shown in Figure 4. This is a relevant case study in the context of the present paper, as the sampling frequency used for such a system is often much higher than the frequencies of its dominant dynamics.

Methods
The true system G_0(ξ) is given by the rational transfer function G_0(ξ) = n_0(ξ)/d_0(ξ). In this simulation example, the denominator polynomial d_0(ξ) is estimated from simulated FRF data of the system, G̃(ξ_i), and it is assumed that the true numerator polynomial n_0(ξ) is known. This is done to reduce the identification problem to a scalar polynomial least-squares problem, as is considered in this paper. The vector polynomial case that is encountered when jointly estimating d_0(ξ) and n_0(ξ) with optimal numerical conditioning, as in [16], is not considered further here, but follows along the same lines, as explained in Section 7.
The simulated FRF data G̃(ω_i) is given as G̃(ω_i) = G_0(ξ_i) + ν_i, where ν_i is randomly generated, circularly complex normally distributed noise with zero mean and a variance which is constant for all i, with a value of 0.1 times the median absolute value of G_0(ξ_i). In this example, 10^3 frequency points ω_i are considered, which are logarithmically spaced between 0.5 and 10 Hz. To identify the denominator polynomial from this data, ten iterations of the SK-algorithm, Algorithm 2, are used, where n̂(ξ_i, θ) is replaced with n_0(ξ_i) and where d̂(ξ_i, θ_0) = 1. This identification procedure is performed using three different approaches. First, in the Z-domain using the Schur parametrization of Section 3.5.2 to represent the Hessenberg recurrence matrix, implemented as a rotation-chasing algorithm similar to [52]. Second, in the Z-domain using the presented approach of Section 5, where the computations are performed on the quasiseparable generators that represent the Hessenberg recurrence matrix. Third, in the δ-domain, also using the proposed approach of Section 5.
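The data generation described above can be sketched as follows; the second-order system used here is an illustrative stand-in, not the AVIS model of the example:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1000
freq = np.logspace(np.log10(0.5), np.log10(10.0), N)   # 0.5 to 10 Hz
omega = 2 * np.pi * freq

# Illustrative stand-in for the true system: a lightly damped second-order
# resonance (natural frequency 2 Hz, damping ratio 0.01).
wn, zeta = 2 * np.pi * 2.0, 0.01
G0 = wn ** 2 / ((1j * omega) ** 2 + 2 * zeta * wn * (1j * omega) + wn ** 2)

# Circularly complex normal noise: zero mean, variance 0.1 * median |G0|.
var = 0.1 * np.median(np.abs(G0))
nu = np.sqrt(var / 2.0) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
G_tilde = G0 + nu
assert G_tilde.shape == (N,)
```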
These three identification approaches are all implemented and performed for increasing sampling frequencies in single-precision floating-point arithmetic. To compare the performance of the approaches, two metrics are used. The first is the conditioning of the matrix Q = W Φ, which should be equal to 1 when the orthogonalization procedure is successful. To quantify the orthogonalization performance, the geometric mean of κ(Q) − 1 over all iterations, µ_g{κ(Q) − 1}, is considered. As the second metric, the converged cost function V_conv is considered, which is determined here as the arithmetic mean of the cost function values V_l of the last three iterations. Whenever an invalid number, i.e., ±∞ or NaN, is encountered, it is replaced by the maximum conditioning number for determining µ_g{κ(Q) − 1}, and for determining V_conv the last three valid values of V_l are used.

Fig. 5. Conditioning results µ_g{κ(Q) − 1} for different sampling frequencies without re-orthogonalization. Shows that the conditioning numbers for the Z-domain approaches with the Schur parametrization (red dash-dotted) and the quasiseparable parametrization (yellow dotted) increase with increasing sampling frequency and are much higher than those for the δ-domain approach proposed in this paper (blue solid), where the conditioning numbers remain close to the optimal κ = 1 for all sampling frequencies.
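The two comparison metrics, including the handling of invalid numbers, can be sketched as follows (the maximum conditioning number used for replacement is an illustrative choice):

```python
import numpy as np

def metrics(kappas, costs, kappa_max=float(np.finfo(np.float32).max)):
    """Sketch of the two comparison metrics: the geometric mean of
    kappa(Q) - 1 over all iterations (invalid values replaced by a maximum
    conditioning number, an illustrative choice here) and the arithmetic
    mean of the last three valid cost function values V_l."""
    k = np.asarray(kappas, dtype=float)
    k = np.where(np.isfinite(k), k, kappa_max)   # replace +-inf / NaN
    mu_g = float(np.exp(np.mean(np.log(k - 1.0))))
    c = np.asarray(costs, dtype=float)
    c = c[np.isfinite(c)]                        # keep only valid cost values
    v_conv = float(np.mean(c[-3:]))
    return mu_g, v_conv

mu_g, v_conv = metrics([2.0, 2.0, 2.0], [1.0, 2.0, np.nan, 3.0, 4.0])
assert abs(mu_g - 1.0) < 1e-12 and abs(v_conv - 3.0) < 1e-12
```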

Results
In this section, the results are presented for the AVIS example. First, the results related to the conditioning of the matrix Q = W Φ are presented. Next, the results for the converged cost function values are presented.

Conditioning results
In Figure 5, the conditioning results are shown for different sampling frequencies. These results can be interpreted as a measure of the numerical loss of significance in the orthogonalization procedure. From this figure it is clear that the numerical performance of the δ-domain approach is superior to that of both Z-domain approaches. For the δ-domain approach, the numerical performance is independent of the sampling frequency, while for both Z-domain approaches the numerical performance deteriorates with increasing sampling frequency.
Closer inspection reveals that for the Z-domain approach with the Schur parametrization the numerical loss of significance initially scales quadratically with the sampling frequency, while for the Z-domain approach with the quasiseparable parametrization this scaling is linear. After this initial scaling, a sharp increase in the numerical loss of significance can be observed once κ(Q) − 1 > 1, which can be viewed as a threshold after which orthogonality is lost. The observed difference in initial scaling for the two Z-domain approaches can be explained by the fact that in the quasiseparable parametrization the main diagonal is linearly parametrized as d_i, while the main diagonal in the Schur parametrization is parametrized quadratically as γ*_{i−1} γ_i.

Fig. 6. Converged cost results V_conv for different sampling frequencies without re-orthogonalization. Shows that for high sampling frequencies the Z-domain approaches with the Schur parametrization (red dash-dotted) and the quasiseparable parametrization (yellow dotted) are no longer able to reach a low cost function value, while the δ-domain approach proposed in this paper (blue solid) reaches the same low cost function value for all sampling frequencies.

Converged cost results
The converged cost function values are depicted in Figures 6 and 7, which respectively show the results without re-orthogonalization, as described in Remark 16, and the results where re-orthogonalization is used. In these figures it can be seen that for the lowest sampling frequencies all the considered approaches in the Z-domain and δ-domain lead to the same cost function values. This should be the case since, apart from the numerical properties, the considered approaches are equivalent and should yield the same results. For the case without re-orthogonalization, as depicted in Figure 6, the cost function values start to diverge at relatively low values of the sampling frequency. The performance of the Z-domain approach using the Schur parametrization starts to deteriorate first, around a sampling frequency of 300 Hz. The Z-domain approach using the quasiseparable parametrization starts to show performance degradation from a sampling frequency of 1 kHz. The δ-domain approach proposed in this paper shows no degradation in performance as the sampling frequency increases.
From Figure 7, it is observed that re-orthogonalization can help to increase the range of sampling frequencies for which the performance of both Z-domain approaches does not deteriorate. However, this does not solve the underlying numerical issues, but merely mitigates their effects. For the purpose of obtaining a numerically reliable identification approach, this re-orthogonalization step therefore does not add much. It is clear from these results that the proposed δ-domain approach does solve the underlying numerical challenges that exist for the identification of fast-sampled systems in discrete time.

Fig. 7. Converged cost results for different sampling frequencies with re-orthogonalization. Shows that re-orthogonalization can be used to extend the range of sampling frequencies for which the Z-domain approaches with the Schur parametrization (red dash-dotted) and the quasiseparable parametrization (yellow dotted) are able to reach a low cost function value, but that these approaches still break down for higher sampling frequencies, while the δ-domain approach proposed in this paper (blue solid) reaches the same low cost function value for all sampling frequencies.

Conclusions and outlook
This paper has brought together the aspects of orthonormal polynomials with respect to data-dependent inner products and δ-domain formulations in systems and control, thereby providing a new unified view on the numerically reliable implementation of algorithms in system identification and control. This view is applied to develop a unified approach for numerically reliable system identification in which identification in the Laplace domain, the Z-domain, and the δ-domain are all incorporated as special cases. In Section 6, an example based on an Active Vibration Isolation System is presented, the results of which show that the developed δ-domain approach is numerically superior to the alternative Z-domain approaches.
Extending the presented unified numerically reliable identification approach from the scalar polynomial case to the vector polynomial case, for the identification of rational and MIMO system models, is the subject of ongoing research. This will likely involve extending the scalar quasiseparable generator description to a block-generator description, as in, e.g., [53, Chapter 12], and adjusting the chasing operations accordingly to accommodate the block-quasiseparable structure.