Covariance-analytic performance criteria, Hardy-Schatten norms and Wick-like ordering of cascaded systems

This paper is concerned with linear stochastic systems whose output is a stationary Gaussian random process related by an integral operator to a standard Wiener process at the input. We consider a performance criterion which involves the trace of an analytic function of the spectral density of the output process. This class of"covariance-analytic"cost functionals includes the usual mean square and risk-sensitive criteria as particular cases. Due to the presence of the"cost-shaping"analytic function, the performance criterion is related to higher-order Hardy-Schatten norms of the system transfer function. These norms have links with the asymptotic properties of cumulants of finite-horizon quadratic functionals of the system output and satisfy variational inequalities pertaining to system robustness to statistically uncertain inputs. In the case of strictly proper finite-dimensional systems, governed in state space by linear stochastic differential equations, we develop a method for recursively computing the Hardy-Schatten norms through a recently proposed technique of rearranging cascaded linear systems, which resembles the Wick ordering of annihilation and creation operators in quantum mechanics. The resulting computational procedure involves a recurrence sequence of solutions to algebraic Lyapunov equations and represents the covariance-analytic cost as the squared $\mathcal{H}_2$-norm of an auxiliary cascaded system. These results are also compared with an alternative approach which uses higher-order derivatives of stabilising solutions of parameter-dependent algebraic Riccati equations.


Introduction
Multivariable linear continuous-time invariant stochastic systems, which can be represented as Toeplitz integral operators acting on an input disturbance in the form of a multidimensional Wiener process or more complicated random processes, provide an important class of tractable models of stochastic dynamics. These models result from linearised description of physical systems such as RLC circuits with thermal noise, vehicle suspension subject to rough terrain, or flexible structures interacting with turbulent flows.
The statistical properties (including the spectral density or pertaining to higher order moments) of the output of a linear system are related to those of the input. In the frequency domain, the input-output characteristics of the system are captured in its transfer function, through which the input and output spectral densities are related to each other. In the case of a standard Wiener process at the input, the output spectral density, as a function of the frequency, quantifies the contribution from different modes to the variance of the output process, which coincides with the squared H 2 -norm of the transfer function. This mean square functional describes the infinitehorizon growth rate for the integral of the squared Euclidean norm of the system output over a bounded time interval. This integral yields a random variable, which depends on the output in a quadratic fashion and can be interpreted as the output energy of the system. In turn, the asymptotic behaviour of fluctuations in the output energy, quantified in terms of its variance, is closely related to the H 4 -norm of the transfer function in an appropriate Hardy space. The H 4 -norm uses the Schatten 4-norm of matrices [9] (involving the fourth power of the singular values), and its discrete-time version appeared in [22,Lemma 2] in the context of anisotropy-based robust control.
The H 2 and H 4 -norms underlie the linear quadro-quartic Gaussian (LQQG) approach [25], which extends LQG control towards minimising not only the first, but also the second-order cumulants of the output energy in the framework of the disturbance attenuation paradigm. These norms are the first two elements in a sequence of the Hardy-Schatten H 2k -norms, k = 1, 2, 3, . . ., in the corresponding spaces [20] of matrix-valued transfer functions, holomorphic in the right half-plane and endowed with the L 2k -norm of their singular values (or, equivalently, the L k norm for the eigenvalues of the output spectral density).
The Hardy-Schatten norms of all orders participate (though implicitly) in the performance criteria of the risk-sensitive and minimum entropy control theories [14]. These theories use the cumulant-generating function of the output energy scaled by a risk sensitivity parameter. The resulting infinite-horizon risk-sensitive performance index is organised as a power series with respect to this parameter, with the coefficients of the series being the growth rates of the output energy cumulants. The cumulant rates are related to the corresponding powers of the Hardy-Schatten norms of the transfer function, which makes the risk-sensitive criterion a specific linear combination of appropriately powered H 2k -norms, with the coefficients being specified by the risk sensitivity parameter.
As suggested in [25], "reassembling" the terms of the risk-sensitive cost with different weights (not necessarily governed by a single parameter) gives rise to a wide class of performance criteria (in the form of linear combinations of powers of the Hardy-Schatten norms) and the corresponding output energy cumulant (OEC) control problems for linear stochastic systems. This class contains the LQG, LQQG and risk-sensitive approaches as special cases which explore this freedom to a certain extent. At the same time, the risk-sensitive criteria play a distinctive role in the robust performance characteristics of the system in the form of the worst-case LQG costs in the presence of statistical uncertainty about the input noise with a Kullback-Leibler relative entropy description [4,16]. It is also relevant to equip the extended OEC approach with its own variational principle for a rational weighting of the Hardy-Schatten norms in regard to appropriately modified robustness properties of the system.
The practical implementation of the OEC approach requires, as its ingredient, the development of state-space methods for computing the Hardy-Schatten norms and related cost functionals, which is the main purpose of the present paper. We are concerned with linear time-invariant stochastic systems, which act as an integral operator on a standard Wiener process at the input and produce a stationary Gaussian random process at the output. Starting from frequency-domain formulations, we propose a performance criterion which involves the trace of an analytic function evaluated at the output spectral density. This class of "covariance-analytic" cost functionals includes the standard mean square and risk-sensitive criteria as particular cases. Due to the presence of the "cost-shaping" analytic function in its definition, this performance criterion admits a series expansion involving all the H 2k -norms of the system. In combination with the Legendre transformation, applied to trace functions of matrices [13,24], the structure of the covariance-analytic cost gives rise to a variational inequality for this functional concerning statistical uncertainty when the system input is a more complicated Gaussian process (with correlated increments unlike the Wiener process).
For finite-dimensional systems, governed in state space by linear stochastic differential equations (SDEs), we develop a method for recursively computing the Hardy-Schatten norms through a recently proposed approach [27] to rearranging cascaded linear systems which arise from powers of a rational spectral density. This "system transposition" technique, oriented to a special spectral factorisation problem, resembles the Wick ordering of noncommuting annihilation and creation operators in quantum mechanics [11,28]. The resulting computational procedure involves a recurrence sequence of solutions to algebraic Lyapunov equations (ALEs) and represents the covariance-analytic cost as the squared H 2 -norm of an auxiliary cascaded system. These results are compared with an alternative approach which uses another set of ALEs for computing the higher-order derivatives of stabilising solutions of parameter-dependent algebraic Riccati equations.
We also mention that a related yet different class of nonquadratic integral performance criteria, involving power series over noncommuting variables, was discussed in a quantum feedback control framework, for example, in [17]. Also, the discretetime counterparts of the Hardy-Schatten norms of all orders play an important role [23] in the anisotropy-constrained versions of the 2 -induced operator norms of systems.
The paper is organised as follows. Section 2 specifies the class of linear stochastic systems and covariance-analytic costs under consideration. Section 3 describes the Hardy-Schatten norms and discusses their links with the growth rates of the output energy cumulants. Section 4 provides a variational inequality for the covarianceanalytic functional and related robustness properties. Section 5 specifies a class of strictly proper finite-dimensional systems in state space. Section 6 obtains a spectral factorisation for a particular cascade of systems. Section 7 applies this factorisation to the state-space computation of the Hardy-Schatten norms and covarianceanalytic costs. Section 8 discusses an alternative approach to their computation using a parameter-dependent Riccati equation. Section 9 illustrates both procedures for computing the Hardy-Schatten norms by a numerical example. Section 10 provides concluding remarks.

Linear stochastic systems and covariance-analytic costs
Let Z := (Z(t)) t∈R be an R p -valued stationary Gaussian random process [10] produced as the output of a linear causal time-invariant system from an R m -valued standard Wiener process W := (W (t)) t∈R at the input 1 : The impulse response f : R + → R p×m to the incremented input (with R + := [0, +∞)) is assumed to be square integrable, f ∈ L 2 (R + , R p×m ), thus securing the existence of the Ito integral in (1), along with the isometry for the variance of the process Z. Here, E(·) is expectation, and M F := M, M F is the Frobenius norm [9] generated by the inner product M, N F := Tr(M * N) of real or complex matrices, with (·) * := ((·)) T the complex conjugate transpose. We will also write M Q := √ QM F = Tr(M * QM) for the weighted Frobenius (semi-) norm associated with a positive (semi-) definite Hermitian matrix Q. The process Z in (1) has zero mean and a continuous covariance function K ∈ C(R, R p×p ), which specifies the two-point covariance matrix and its particular case, the one-point covariance matrix whose trace is given by (2). Here, S : R → H + p (with H + p the set of positive semidefinite matrices in the subspace H p of complex Hermitian matrices of order p) is the spectral density of the process Z: It is expressed in terms of the Fourier transform of the impulse response of the system given by which is the boundary value of the transfer function holomorphic in the open right half-plane {s ∈ C : Res > 0}. Since the impulse response f is square integrable, (7) where the Plancherel identity is combined with (2)-(6), and F(ω) 2 F = TrS(ω) is related to the output spectral density (5).
If the transfer function F pertains to a plant-controller interconnection, and the process Z consists of the dynamic variables whose small values are a preferred behaviour of the closed-loop system, the corresponding performance criteria usually employ cost functionals to be minimised over the controller parameters. Such functionals include the mean square cost F 2 2 from (8) and the quadratic-exponential functional used in the risk-sensitive and minimum entropy control theories [14]. Here, for a finite time horizon T > 0, the random variable can be interpreted as an "output energy" of the system over the time interval [0, T ], and θ > 0 is a risk sensitivity parameter constrained by with the H ∞ -norm of the transfer function F. Also, · is the operator matrix norm, and λ max (·) is the largest eigenvalue of a matrix with a real spectrum. The corresponding "energy rate" along with the mean square convergence The latter holds in view of the relations from [25, Lemma 1], provided the output spectral density S in (5) is square integrable (and hence, so also is the covariance function K in (3)), in which case the transfer function F of the system has a finite H 4 -norm 2 The integrand S(ω) 2 F = Tr(( F(ω) F(ω) * ) 2 ) is the fourth power of the Schatten 4-norm [9, p. 441] of the matrix F(ω); see also [20]. In (13), use is made of the Plancherel identity, which is applied to (5) in combination with (3) and the invariance of the Frobenius norm under the matrix transpose and complex conjugation, whereby K(t) F = K(−t) F for any t ∈ R. The transfer functions F with F 4 < +∞ form a normed space H p×m 4 . The quadratic cost F 2 2 , its "quartic" counterpart F 4 4 , and the risk-sensitive cost (9) are particular cases of a "covariance-analytic" functional specified by a function ϕ of a complex variable, which is analytic in a neighbourhood of the interval [0, F 2 ∞ ] in the complex plane, takes real values on this interval, satisfies ϕ(0) = 0 (15) and is evaluated [7] at the matrix (5). In the case of absolutely integrable impulse responses f , the condition (15) is necessary for the convergence of the integral in (14) since, in that case, lim ω→∞ S(ω) = 0 in view of the Riemann-Lebesgue Lemma applied to (6), so that lim ω→∞ ϕ(S(ω)) = ϕ(0)I p . For example, the risk-sensitive cost (9) corresponds to (14) with the function which satisfies (15) and is analytic in the open disc {z ∈ C : |z| < 1 θ } containing the interval [0, F 2 ∞ ] in view of (11). This correspondence follows from the identity ln det M = Tr ln M for nonsingular matrices, applied to the integrand in (9). Alternatively, J ϕ θ (F) can be evaluated by applying (14) to a rescaled transfer function as where ϕ 1 is given by (16) with θ = 1. For a wider class of spectral densities S with a finite (and not necessarily zero) limit S(∞) := lim ω→∞ S(ω), a necessary condition for convergence of the integral in (14), similar to (15), is provided by It is also possible to incorporate a frequency-dependent weight u ∈ L 1 (R, R + ) in (14) by considering the quantity which extends such functionals to transfer functions F ∈ H p×m ∞ with a bounded spectral density S, not necessarily vanishing at infinity, and makes the condition (17) redundant. However, for simplicity, we leave this extension aside and will only discuss the costs (14).

Hardy-Schatten norms and output energy cumulants
In accordance with (16), the Taylor series expansion of the risk-sensitive cost (9) can be represented as so that (with N the set of positive integers). Here, use is made of the Hardy-Schatten norms of the transfer function F defined in terms of the output spectral density S from (5) by on the corresponding spaces H p×m 2k . The integrand in (20) is the 2kth power of the Schatten 2k-norm [9, p. 441] 2k Tr(S(ω) k ) = 2k Tr(( F(ω) F(ω) * ) k ) of the matrix F(ω). The H 2 and H 4 -norms of the system in (8), (13) are particular cases of (20) with k = 1, 2. The representation (18) can also be viewed as a series over the powers F 2k 2k of the Hardy-Schatten norms, with the coefficients 1 2k θ k specified by a single risk sensitivity parameter θ .
Returning from the risk-sensitive example to the general case, the Taylor series expansion of the function ϕ in (14) allows the covariance-analytic cost to be represented as an infinite 3 linear combination of the appropriately powered Hardy-Schatten norms (20), with the coefficients ϕ k ∈ R specified by ϕ as a "cost-shaping" function.
We will now discuss several properties of the H 2k -norms in regard to their probabilistic origin from the output energy (10) of the system. In what follows, the trivial case of identically zero transfer functions F (with F ∞ = 0) is excluded from consideration.

Lemma 1
The Hardy-Schatten norms (20) are related to the H 2 -and H ∞ -norms in (8), (12) by the inequality Proof. At any frequency ω ∈ R, the output spectral density (5) is a positive semidefinite Hermitian matrix, which satisfies S(ω) F 2 ∞ I p almost everywhere in view of (12), and hence, for any k ∈ N. By integrating both sides of (23) over ω and using (20), it follows that which, by induction on k, leads to By (22), the finiteness of the H 2 and H ∞ -norms ensures that all the Hardy-Schatten norms (20) are also finite, and hence, Another corollary of (22) is that Similarly to (9), these norms govern the infinite-horizon asymptotic behaviour of the cumulants of the output energy E T in (10) as discussed below. These cumulants are related to the corresponding moments as through k-variate polynomials (with real coefficients π k; 1 ,..., k , independent of the probability distribution of E T ), which are homogeneous in the sense that as illustrated by the first three polynomials: where Π 1 , Π 2 yield the mean and variance, respectively. The following theorem, whose results were presented in a slightly different form in [25] and a quantummechanical version was obtained in [26,Theorem 4], is closely related to the Szegő limit theorems for the spectra of Toeplitz operators [6] and is given here for completeness.
Theorem 1 Suppose the system (1) has finite H 2 -and H ∞ -norms (8), (12), so that its transfer function (7) satisfies Then the cumulants (25) of the output energy (10) have the following asymptotic growth rates in terms of the Hardy-Schatten norms (20).
Proof. In view of the inclusion (24), the fulfillment of the condition (28) also ensures that Now, for a fixed but otherwise arbitrary T > 0, consider a compact positive semidefinite self-adjoint operator K T with the continuous covariance kernel (3) acting on a function ψ ∈ L 2 ([0, T ], C p ) as In view of the Fredholm determinant formula [20, Theorem 3.10 on p. 36] (see also [5] and references therein), ln Ee for any where λ j,T 0 are the eigenvalues of the operator K T , with r(K T ) = max j∈N λ j,T its spectral radius, and I is the identity operator on L 2 ([0, T ], C p ). Note that (11) implies (33) in view of the upper bound [6] max j∈N λ j, for the induced operator norm of K T on the Hilbert space due to the Toeplitz structure of the covariance kernel [2,18] and in accordance with (8), thus securing the convergence of the series in (32). Moreover, for any k ∈ N, the trace of the k-fold iterate of the operator K T in (31) is given by a convolution-like integral whose asymptotic behaviour is described by in view of (5), (20), (30) and [26, Lemma 6 in Appendix C]. On the other hand, the left-hand side of (32) is the cumulant-generating function of E T evaluated at θ 2 and, therefore, related to the cumulants (25) as ln Ee whereby (the latter is obtained by comparing the right-hand sides of (32), (37)). A combination of (36) with (38) establishes (29).
The relation (29) clarifies the probabilistic meaning of the Hardy-Schatten norms of the transfer function in the context of the asymptotic behaviour of the output energy cumulants. As shown below, the covariance-analytic cost (14) plays a similar role for the following functional of the covariance operator (31): where the last equality uses (38). Such "trace-analytic" functionals arise in quantum mechanics [13] and are also used in a control theoretic context (in application to matrices rather than integral operators); see, for example, [21] and [24, Section X]. An equivalent representation of (39) in terms of the resolvent of the operator K T is given by where the integral is over any counterclockwise oriented closed contour in the analyticity domain of ϕ around the interval [0, F 2 ∞ ] containing the spectrum of K T . In view of (26), (27), the quantity (39) is a complicated function of the output energy moments: Theorem 2 Suppose the assumptions of Theorem 1 are satisfied. Then for any function ϕ, analytic in a neighbourhood of [0, F 2 ∞ ] and satisfying (15), the growth rate of the functional (39) coincides with (14): Proof. Since the function ϕ is analytic in a neighbourhood of [0, F 2 ∞ ], then the radius of convergence of its Taylor series at the origin is strictly greater than F 2 ∞ , and hence, Similarly to (23), the inequality (34) and the positive semi-definiteness of the operator K T lead to Tr(K k+1 T ) = Tr(K k/2 for any k ∈ N. Therefore, by induction on k, combined with (35), it follows from (42) that (43) A combination of (36), (41), (43) along with a dominated convergence argument (using the upper bound |ϕ k Tr as T → +∞, thus establishing (40) in view of (21).

A variational inequality with covariance-analytic functionals
Suppose the input W of the system (67) is a Gaussian process in R m with stationary increments (not necessarily independent as in the standard Wiener process case), whose covariance structure is specified by where D is a positive semi-definite self-adjoint covariance operator on the Hilbert space L 2 (R, R m ). The operator D is shift-invariant in the sense of the commutativity DT τ = T τ D with the translation operator T τ acting on a function ψ : R → R m as (T τ ψ)(t) = ψ(t + τ) for any t, τ ∈ R. Also, suppose the covariance operator D has a spectral density Σ : R → H + m which relates (44) to the Fourier transforms g, h of the functions g, h as For what follows, it is assumed that the input spectral density Σ admits the factorisation where Γ ∈ H m×m ∞ is the transfer function of a shaping filter which produces W from a standard Wiener process V := (V (t)) t∈R in R m according to an appropriately modified form of (1); see has the identity covariance operator D = I with the constant spectral density Σ (ω) = I m , in which case the shaping filter is an all-pass system, with the matrix Γ (iω) being unitary for almost all ω ∈ R. Returning to the more general case in (44), (45), the output spectral density (5) is modified as thus allowing the variance of the stationary output process Z in (1) to be represented in this case as where Λ : R → H + m is an auxiliary function associated with the Fourier transform (6) by Up to |p − m| zero eigenvalues, the matrix Λ (ω) is isospectral to F(ω) F(ω) * on the right-hand side of (5), so that the H 2k -norms in (20) can be expressed in terms of (48) as In particular, application of the Cauchy-Bunyakovsky-Schwarz inequality to the right-hand side of (47) leads to an upper bound for the root mean square value of the output process: where is the H 4 -norm of the transfer function Γ of the input shaping filter, in accordance with (13), (46). The inequality (49) also yields an upper bound on the induced norm of the system as a linear operator acting from H m×m which illustrates the role of the H 4 -norms for the operator theoretic properties of linear systems.
We will now discuss a variational inequality involving the covariance-analytic functionals (14). To this end, consider the Legendre transformation [19] of a traceanalytic function on Hermitian matrices whose eigenvalues are contained by an interval ∆ := (a, b) (with possibly infinite endpoints a < b): where ψ is an analytic function in a neighbourhood of ∆ in the complex plane with real values on this interval. The following lemma establishes invariance of a class of such functions under the Legendre transformation.
Lemma 2 Suppose the function ψ, described in the context of (50), has a positive second derivative: Then the Legendre transformation of Trψ yields a trace-analytic function, representable as for any matrix L ∈ H m with eigenvalues in the interval ψ (∆ ) = (ψ (a+), ψ (b−)) whose endpoints are the corresponding one-sided limits of the first derivative ψ . Here, is the Legendre transformation of ψ as a function of a real variable, with v : ψ (∆ ) → ∆ the functional inverse of ψ : v := (ψ ) −1 .
The condition (51) implies that ψ is strictly increasing on ∆ and, therefore, has a functional inverse v : By substituting the right-hand side of (59) for M, the Legendre transformation in (50) is represented as which establishes (52). The functions ψ : ∆ → R and ψ * : ψ (∆ ) → R in Lemma 2 are strictly convex and form a convex conjugate pair (related by the Legendre transformation for functions of a real variable). Now, suppose the input spectral density Σ in (46) is known imprecisely, and the statistical uncertainty is described by Here, ψ is a given function, which takes nonnegative values on the positive half-line (0, +∞) and is analytic in its neighbourhood in the complex plane. The corresponding covariance-analytic functional J ψ is evaluated at the transfer function Γ of the input shaping filter as in accordance with (14), (46), for which it is assumed that detΓ (iω) = 0 (that is, Σ (ω) 0) at almost all frequencies ω ∈ R. While ψ reflects the general structure of the uncertainty class G ψ,d in (60), its "size" is measured by a parameter d 0. The following theorem discusses the worst-case output variance in the framework of this uncertainty description.
Theorem 3 Suppose the statistical uncertainty about the input W is described by (60), where the function ψ, with nonnegative values on ∆ := (0, +∞) and analytic in its neighbourhood in the complex plane, satisfies (51). Then for any d 0, the worst-case value of the output variance for the system F in (47) admits a guaranteed upper bound where ψ * is the convex conjugate of ψ, given by (53) which is extended to other Hermitian matrices L, M by letting its right-hand side equal +∞. Multiplication of L by a scalar σ > 0 and division of both sides of (62) by σ yields Since the left-hand side of (63) does not depend on σ , this inequality can be tightened as Application of (62) to the matrices L := Λ (ω) and M := Σ (ω) in (48), (46), integration over the frequencies ω ∈ R in accordance with (47) and the scaling argument as in (64) lead to Here, the covariance-analytic functionals J ψ , J ψ * , associated with the function ψ and its convex conjugate ψ * in (53), are evaluated at the transfer functions Γ and √ σ F of the input shaping filter and the rescaled system, respectively; see Fig. 1. Since the right-hand side of (65) depends on J ψ (Γ ) in a monotonic fashion, and J ψ (Γ ) d for any Γ ∈ G ψ,d in the uncertainty class (60), then which establishes (61). For any given σ > 0, the right-hand side of (61) depends monotonically on the quantity where the function ϕ(z) := ψ * (σ z) is related to ψ * in (53). In a robust control setting, where the transfer function F pertains to the closed-loop system, this monotonicity justifies the minimisation of the covariance-analytic cost (14) as a way to guarantee an improved bound on the worstcase output variance in (61). This approach is similar to minimax LQG control [16], except that here the relative entropy description of uncertainty is replaced with (60), while the risk-sensitive cost (9) is replaced with (14). The uncertainty description (60) and the corresponding cost (14), specified by (66), employ the covariance-analytic functionals J ψ , J ψ * . Their link through the Legendre transformation (53) resembles (and, in some respects, generalises) the duality relation [3] which underlies the role of the risk-sensitive control for robustness properties of systems [4] in minimax LQG settings.
Although (60) is concerned only with the input spectral densities irrespective of entropy constructs, we note that the function ϕ θ in (16), associated with the risksensitive cost Ξ (θ ) in (9), leads to in (66), considered with the scaling θ = 2σ for convenience. This function ψ * is the convex conjugate of which is a strictly convex nonnegative function (admitting an analytic continuation to a neighbourhood of (0, +∞) in the complex plane), with The corresponding uncertainty class G ψ,d in (60) takes into account the deviation of the input spectral densities Σ in (46) from the identity matrix I m , which is the only element of G ψ,0 .

Strictly proper finite-dimensional systems
We will now proceed to the state-space computation of the covariance-analytic functionals (14) (which, in essence, reduces to that of the Hardy-Schatten norms (20)) for finite-dimensional systems. More precisely, consider a class of systems (1) with an R n -valued stationary Gaussian state process X := (X(t)) t∈R governed by an Ito SDE as dX = AXdt + BdW, where A ∈ R n×n , B ∈ R n×m , C ∈ R p×n are given matrices, with A Hurwitz, thus making the impulse response square integrable. This system has a strictly proper real-rational transfer function (7): where is an auxiliary C n×n -valued transfer function from the increments of the process BW to the internal state X of the system in (67). The state-space realisations of F, E are denoted by so that the corresponding system conjugates are represented as Since the matrix A is Hurwitz, the transfer function F in (69) satisfies (28) along with its corollary (30), whereby it has finite Hardy-Schatten norms F 2k of any order in (20).
In the finite-dimensional case being considered, the spectral density S and its powers in (20) are rational functions which can be identified with transfer functions of such systems. The state-space realisations of the systems F, E and their conjugates F ∼ , E ∼ in (71)-(74) allow the spectral density S in (5) (with a slight abuse of notation, as a function of iω rather than the frequency ω itself) to be realised as the transfer function of the system where In view of the identity I n 0 α I n I n 0 β I n = I n 0 α + β I n , α, β ∈ C n×n , the dynamics matrix of the state-space realisation (75) is block diagonalised by the similarity transformation which employs the covariance matrix of the stationary state process X in (67), related by to the transfer function E in (70) and the matrix from (76), and satisfying the ALE so that P coincides with the controllability Gramian of the pair (A, B). The righthand side of (78) involves a linear operator L A on C n×n , associated with the Hurwitz matrix A as Due to the commutativity between L A and the matrix transpose (L A (V T ) = L A (V ) T for any V ∈ R n×n ), the subspace S n of real symmetric matrices of order n is invariant under L A : Also, since A is Hurwitz, the inclusion (81) is complemented by the invariance of the set S + n of real positive semi-definite symmetric matrices of order n under the operator L A : Therefore, in view of (77), the system (75) admits an equivalent state-space realisation: Fig. 2 An infinite cascade of identical linear systems with the common state-space realization of S in (75), (83) and an external input υ. The sum of their outputs, weighted by the coefficients ϕ k of the Taylor series expansion of the analytic function ϕ from (14) in accordance with (21), forms the output ζ of the system ϕ(S).

Infinite cascade spectral factorization
With the spectral density S in (5) being represented as the transfer function of a finitedimensional system with the strictly proper state-space realisation (75) (or (83)), the matrix ϕ(S) in (14) corresponds to the transfer function for an infinite cascade of such systems shown in Fig. 2. The resulting system has an infinite dimensional state and the strictly proper state-space realisation where a, b, c are the state-space matrices of S in (83). In accordance with its cascade structure, the system (84) has a block two-diagonal lower triangular dynamics matrix. As established below, this system admits an inner-outer factorization (see, for example, [29]) with a similar infinite cascade structure. This result is based on the following two lemmas and a theorem, which adapt a "system transposition" technique from [27].
Lemma 3 Suppose the matrix A ∈ R n×n is Hurwitz, and U = U T ∈ R n×n is a positive semi-definite matrix such that the pair (A, √ U) is controllable. Then the transfer function E in (70) and its system conjugate E ∼ in (73) satisfy for any s ∈ C beyond the spectra of ±A, where V is the unique solution of the ALE Proof. The relation (85) is a corollary of [27, Lemma 2]. The condition U 0 and the controllability of (A, √ U) ensure that the solution of (86) satisfies V 0 (and is, therefore, nonsingular).
In application of Lemma 3 to the matrix U := from (76), the ALE (86) coincides with (79) and yields the covariance matrix V := P in (78). Assuming that the pair (A, B) is controllable (and hence, P 0), the relation (85) allows the factors E and E ∼ in (75) to be rearranged as thus leading to so that the factor E is moved to the right, while its dual E ∼ is moved to the left. This rearrangement resembles the Wick ordering for mixed products of noncommuting annihilation and creation operators in quantum mechanics (see [28] and [11, pp. 209-210]). Theorem 4 below uses (87) in order to extend (88) to arbitrary positive integer powers of S. Its formulation employs three sequences of matrices α j , β j , γ j ∈ R n×n computed recursively as (where δ jk is the Kronecket delta, and the operator L A is given by (80)), with the initial conditions Here, use is made of an auxiliary matrix so that Ω δ j1 = Ω if j = 1 I n otherwise . It is convenient to extend (89) to j = 0 as α 1 = γ 0 β 0 by letting in accordance with the first equality from (92). In order for the recurrence equations (89), (90) to be valid for all j, it is assumed that the matrices γ j in (91), (92) are nonsingular: In particular, letting j := 1 in (91) yields in view of (92), and, since PΩ P = PC T (PC T ) T due to (93), the condition det γ 1 = 0 for the matrix (96) is equivalent to the controllability of (A, PC T ). The following lemma provides relevant properties of these matrices which will be used in the proof of Theorem 4.
Proof. Similar to the proof of [27, Theorem 1], a nested induction will be used over k ∈ N (which numbers the outer layer of induction) and j = 1, . . . , k − 1 (which numbers the inner layer, becoming inactive at k = 1). The validity of (103) at k = 1, that is, with the matrices α 1 , β 1 given by (92), follows from (88) in view of the symmetry and positive definiteness of the covariance matrix P in (78) due to the controllability of (A, B). In order to demonstrate the structure of the subsequent induction steps, consider the left-hand side of (103) at k = 2: where use is made of (89)-(91) for j = 1 along with (92), (93), (96), (104). The last two equalities in (105) involve repeated application of (85) from Lemma 3. This allows the rightmost E ∼ factor to be "pulled" through the product of the E factors (and constant matrices between them) until the E ∼ factor is to the left of all the E factors. The last equality in (105) also uses the symmetry of the matrices β j , γ j in (97) of Lemma 4, whereby Therefore, (105) establishes (103) for k = 2. Now, suppose the representation (103) is already proved for some k 2. Then the next power of the matrix S takes the form where the rightmost E ∼ factor is the only E ∼ factor which is to the right of the E factors. We will now use the pulling procedure, demonstrated in (105), and prove that by induction over r = 2, . . . , k. The fulfillment of (108) for r = 2 is verified by applying (87) and using (92), (96): Now, suppose (108) is already proved for some r = 2, . . . , k − 1. Then its validity for the next value r + 1 is established by using (85) of Lemma 3 in combination with (89)-(91) as Therefore, (108) holds for any r = 2, . . . , k, thus completing the inner layer of induction. In particular, at r = k, this relation takes the form where (85) of Lemma 3 and (89)-(91) are used again along with (106). Now, substitution of (109) into (107) leads to which completes the outer layer of induction, thus proving (103) for any k ∈ N.
For any σ 1 , σ 2 , σ 3 , . . . ∈ R \ {0}, the factorisation (103) is invariant under the transformation which involves the square root representation of the matrices β k in (101) and can be used for balancing the state-space realisations (to improve their numerical computation).

Hardy-Schatten norms and covariance-analytic functionals in state space
In view of the system conjugacy the relation (103) can be represented as in terms of the strictly proper stable systems which are assembled into provided the condition (102) is satisfied. The system G has infinite-dimensional real state-space matrices (its output matrix is an infinite block-diagonal matrix involving the square roots from (101)), an R p -valued input and an R ∞ -valued output. This system is organised as an infinite cascade of systems shown in Fig. 3. The significance of the systems G k in (111) for computing the norms F 2k is clarified by the following theorem. Its formulation employs matrices P jk ∈ R n×n satisfying a recurrence equation which uses the operator (80) and the matrices (89), (93) along with the initial condition P 11 := γ 1 (114) given by (96).
Proof. Integration of the transfer functions on both sides of (110) over the imaginary axis and a comparison with (20), (8) (with the H 2 -norm being evaluated at G k ) yield for any k ∈ N, which establishes the first equality in (115). Now, the H 2 -norm of the system G k can be computed as in terms of the matrices β k , ρ k from (90), (101) and the blocks P jk = P T k j ∈ R n×n of the controllability Gramian of (A N , B N ). Here, A N ∈ R nN×nN , B N ∈ R nN×p , C N ∈ R (m+p(N−1))×nN are the statespace matrices of the system obtained by truncating the system G in (112), so that with A N inheriting the Hurwitz property from A. In (118), use is made of the property that the matrix P N is a submatrix of P N+1 for any N ∈ N due to the cascade structure of the system G (see Fig. 3). Also, (117) makes advantage of the block-diagonal structure of the matrix C N in (120), whereby the kth diagonal block of the matrix C N P N C T N reduces to (C N P N C T N ) kk = ρ T k P kk ρ k . The block lower triangular structure of A N and the sparsity of B N in (120) allow the ( j, k)th block of the ALE in (118) to be represented as where the convention P 0k = 0, P k0 = 0 is used along with (92), (93). It follows from (121) that P jk = L A (α j P j−1,k + P j,k−1 α T k + δ j1 δ k1 PΩ P).
In combination with (116), the representation (21) of the covariance-analytic functional (14) leads to thus extending the state-space calculations from monomials to analytic functions of the spectral density S. In a particular yet practically important case, where all the derivatives of ϕ in (21) are nonnegative which holds, for example, in the risk-sensitive setting 4 (16), the functional (123) is related by to the H 2 -norms of the systems whose state-space realisation is obtained from (112). The representation (125) of the covariance-analytic cost for the finite-dimensional system F in (69) in terms of the LQG cost of an auxiliary system H, associated with F by (126), can be regarded as a system theoretic counterpart of the sum-of-squares (SOS) structures from polynomial optimization [12,15]. The practical computation of the series (123) (or (125) under the condition (124)) can be carried out by truncating it to the first N terms, with N being moderately large in the case of fast decaying coefficients ϕ k . As discussed in the proof of Theorem 5, in addition to solving N ALEs of order n for finding the auxiliary matrices α 1 , . . . , α N , β 1 , . . . , β N in (89)-(92), this involves one ALE of order nN (or 1 2 N(N + 1) ALEs of order n in (113), (114)) for computing the controllability Gramian P N in (118) for the system G N in (119).

An alternative computation with differentiating a Riccati equation
For comparison with the results of Section 7, we will now discuss a different approach to the state-space computation of the covariance-analytic functional and related Hardy-Schatten norms. To this end, the following lemma rephrases [14, Proposition 6.3.1, pp. 66-68] (see also [1]) in application to the risk-sensitive cost (9) and is given here for completeness.
Lemma 5 Suppose the risk sensitivity parameter θ > 0 satisfies (11) with the stable transfer function (69). Then the risk-sensitive cost (9) for the stationary Gaussian process Z in (67) can be computed as Here, Ψ (θ ) ∈ S n is the unique stabilising solution of the algebraic Riccati equation (ARE) in the sense that the matrix is Hurwitz, with the matrices , Ω given by (76), (93).
Since the matrix A is Hurwitz, the stabilising solution Ψ (θ ) of the θ -dependent ARE (128) satisfies Also, Ψ (θ ) is infinitely differentiable in θ over the interval (11). In view of (19), the Hardy-Schatten norms can be recovered from the state-space representation (127) of the risk-sensitive cost as By substituting (131) into (21), the covariance-analytic functional in (14) takes the form This alternative way for computing the cost J ϕ (F) and the H 2k -norms in state space involves the derivatives Ψ (k) of the stabilising solution of (128), which can be recursively found as follows.
Theorem 6 Under the conditions of Lemma 5, the derivatives Ψ k of the stabilising solution Ψ of the ARE (128) in (131) satisfy a bilinear recurrence equation , where the operator (80) is used along with the matrices , Ω from (76), (93) and the binomial coefficients.
Proof. Since the matrices A, , Ω are constant, the differentiation of both sides of the ARE (128) in θ yields and hence, where the operator (80) is used along with the Hurwitz matrix ϒ from (129), and the argument θ of Ψ (θ ), ϒ (θ ) is omitted for brevity. Now, extending (134), suppose it is already proved for some k ∈ N that for any θ satisfying (11) (note that in the case of k = 1, the sum vanishes: ∑ 0 j=1 = 0). Since Θ k in (135) and ϒ in (129) inherit smoothness from Ψ , with ϒ = Ψ , then by differentiating the ALE ϒ T Ψ (k) +Ψ (k) ϒ +Θ k = 0 with respect to θ , it follows that Substitution of the matrix Θ k from (135) into the second equality in (136) leads to where the last equality uses the Pascal triangle identity for the binomial coefficients.
Together with the first equality in (136), the relation (137) provides the induction step which proves (135) for any k ∈ N. The recurrence equation (133) can now be obtained by considering (135) at θ = 0 and using the property which follows from (129), (130) and leads to Ω k = Θ k (0). It now remains to note that, according to (133), for any k > 1, the matrix Ψ k depends on Ψ 1 , . . . ,Ψ k−1 in a bilinear fashion.
Since the matrix Ω 1 in (133) with k = 1 reduces to Ω in (93), the matrix Ψ 1 takes the form and, in view of (80), coincides with the observability Gramian Q of the pair (A,C) satisfying the ALE which is dual to the ALE (79). Accordingly, the dual form of the relations (127), (128) of Lemma 5 and (133) of Theorem 6 is obtained by replacing the matrix triple (A, B,C) with (A T ,C T , B T ) and is given by where Φ(θ ) ∈ S n is the unique stabilising (with A + Φ(θ )Ω Hurwitz) solution of the ARE AΦ(θ ) + Φ(θ )A T + θ + Φ(θ )Ω Φ(θ ) = 0, whose derivatives satisfy the recurrence equation and give rise to the corresponding dual representations for the Hardy-Schatten norms in (131), and the covariance analytic functional in (132): By a similar reasoning, the matrix 1 in (138) at k = 1 reduces to in (76), so that the matrix Φ 1 takes the form and, in view of (78), coincides with the controllability Gramian P of the pair (A, B). Therefore, the role of the matrices Φ k , Ψ k in computing the higher-order H 2k -norm F 2k (with k > 1) is similar to that of the Gramians P = Φ 1 , Q = Ψ 1 for the mean square cost F 2 2 = Ω , P F = , Q F . For example, in accordance with (131), (139), the H 4 -norm is related to Φ 2 , Ψ 2 as (cf. [25, p. 2386]), so that 1 2 Φ 2 , 1 2 Ψ 2 are the controllability and observability Gramians for a subsidiary system A PC T B T Q 0 , with 1 2 Φ 2 = γ 1 in view of (96). By analogy with the Gramians P, Q, the matrices Φ k , Ψ k are referred to as the controllability and observability Schattenians [25] of order k for the triple (A, B,C), respectively.
In combination with (131), Theorem 6 (or the dual version of (133) in (138)) allows the first N Hardy-Schatten norms F 2 , F 4 , . . . , F 2N to be computed by solving N ALEs of order n, which is simpler than the approach of Section 7 based on the adaptation of the system transposition technique in Section 6. However, the latter approach contributes a special class of spectral factorizations, which overcomes the noncommutativity of transfer matrices in cascaded linear systems and is of theoretical interest in its own right.

A numerical example of computing the Hardy-Schatten norms
As an illustration, consider a system (71) with input, state and output dimensions m = 2, n = 4, p = 3 and the state-space realization , whose matrices A, B, C are generated randomly, with A Hurwitz (its spectrum is {−0.5409 ± 1.2631i, −1.9875, −0.6748}). Such a model can represent a mechanical system with two degrees of freedom, resulting from an internally stable interconnection of a plant and a controller, each organised as a mass-spring-damper system with one degree of freedom and subject to an external random force. The results of the state-space computation of the first ten Hardy-Schatten norms for this system by using Theorem 5 are shown in Fig. 4. In Fig. 5, these results are compared (in terms of a relative gap) with the alternative method of computing the norms through the combination of (131) and Theorem 6. This comparison provides an experimental cross-verification of both approaches.

Conclusion
For linear stochastic systems with a stationary Gaussian output process, we have considered a class of covariance-analytic performance criteria, which involve the trace of Fig. 4 The first ten Hardy-Schatten norms F 2k , with k = 1, . . . , 10, for the model system computed using Theorem 5.

Fig. 5
The relative gap (on the logarithmic scale) between the norms F 2k , with k = 1, . . . , 10, of the model system computed in two ways: using Theorem 5 and the alternative approach of (131) together with Theorem 6. a cost-shaping function of the output spectral density and contain the standard mean square and risk-sensitive costs as particular cases. With such a cost being closely related to the Hardy-Schatten norms of the system, we have discussed their links with the asymptotic growth rates of the output energy cumulants. We have also considered the worst-case mean square values of the system output in the case of statistical uncertainty about the input noise using covariance-analytic functionals with convex conjugate pairs of cost-shaping functions. For a class of strictly proper finite-dimensional systems, with state-space dynamics governed by linear SDEs, we have developed a method for recursively computing the Hardy-Schatten norms through a modification of a recently proposed technique of rearranging cascaded linear systems, reminiscent of the Wick ordering of quantum mechanical annihilation and creation operators. This computational procedure involves a recurrence sequence of solutions to ALEs and represents the covariance-analytic functional as the squared H 2 -norm of an auxiliary cascaded system. These results have been compared with a different approach using higher-order derivatives of stabilising solutions of a parameter-dependent ARE and illustrated by a numerical example.