Multivariate prediction and matrix Szegö theory

Following the recent survey by the same author of Szegö's theorem and orthogonal polynomials on the unit circle (OPUC) in the scalar case, we survey the corresponding multivariate prediction theory and matrix OPUC (MOPUC).

Prediction theory for discrete-time stationary stochastic processes, a topic in time series, has long been studied and has benefited greatly from the recent work on orthogonal polynomials on the unit circle (OPUC); see e.g. Simon's books [Si1], [Si2], [Si3] for background and references. Here the partial autocorrelation function (PACF) plays an important role, as it provides an unrestricted parametrization of the relevant spectral measure µ; see e.g. the papers by Inoue [In1], [In2], [In3], Inoue and Kasahara [InKa1], [InKa2] and the survey by Bingham [Bi] for background and references. The PACF is essentially the sequence of Verblunsky coefficients, and the unrestricted parametrization is Verblunsky's theorem, in the language of OPUC; see [Si1], Ch. 1, 2.
While this is very satisfactory for univariate time series, one often encounters multivariate time series, particularly in areas such as mathematical finance, where the dimensionality ℓ is the number of risky assets held in a portfolio; ℓ reflects the need to diversify one's portfolio (in the style of Markowitz), and may be large (see e.g. [BiFrKi] for a case in point). Multivariate time series form an important area (see e.g. the books by Hannan [Ha] and Reinsel [Re]). Likewise, multivariate prediction goes back to work by Wiener and Masani [WiMa1], [WiMa2], Helson and Lowdenslager [HeLo] of 1957-61 (also Masani [Mas1]-[Mas5]). Just as in the univariate case with OPUC, in the multivariate case the matrix theory of OPUC (MOPUC for short below) is crucially relevant. Following the great stimulus to OPUC provided by Simon's books, MOPUC has recently been extensively developed. Our purpose here is to survey these developments with a view to multivariate prediction theory, providing a matrix sequel to [Bi].

§1. The Kolmogorov Isomorphism Theorem
Stone's theorem ([Sto], [RiNa] §137, extended to semigroups in [HiPh] XXII) tells us that a group $U = (U^t)$ of unitary transformations has a spectral representation of the form
$$U^t = \int e^{it\theta}\, dE(\theta),$$
where $E(\cdot)$ is a projection-valued random measure. For a stationary stochastic process $X = (X_t)$ (here time $t$ is discrete and $X_t$ is a random complex ℓ-vector), write $U$ for the shift $t \mapsto t + 1$. Since we can write $X_t = U^t X_0$, this gives us a spectral representation for $X = (X_t)$:
$$X_t = \int e^{it\theta}\, dE(\theta)\, X_0.$$
Multiplying these expressions for $X_s$, $X_{s+t}$ and using stationarity, we get on taking expectations the spectral representation of the correlation matrix (writing † for the conjugate transpose):
$$\Gamma(t) = \mathbb{E}[X_{s+t} X_s^{\dagger}] = \int e^{it\theta}\, d\mu(\theta),$$
where µ is the spectral measure. This is Herglotz's theorem; see e.g. [Ro2], 19-20.

§2. Verblunsky's theorem
In the univariate case, the Verblunsky coefficients $a = (a_n)_{n=1}^{\infty}$ satisfy $|a_n| < 1$, and the bijection $a \leftrightarrow \mu$ of Verblunsky's theorem provides the unrestricted parametrization so useful in statistics and prediction theory; see e.g. Pourahmadi [Pou1], [Pou2], [Bi] §2 for background here.
In the ℓ-dimensional case, where one uses MOPUC rather than OPUC, the spectral measure µ is now ℓ × ℓ matrix-valued, and corresponds to the ℓ × ℓ covariance matrix by Herglotz's theorem. The Verblunsky coefficients have been studied by Damanik, Pushnitski and Simon [DaPuSi], §3. They show (3.10) that the Verblunsky coefficients of a matrix-valued measure on the unit circle are now ℓ × ℓ matrices satisfying $\|a_n\| < 1$, that any sequence $a$ of such matrices can arise in this way, and that the map $a \leftrightarrow \mu$ is again a bijection (Verblunsky's theorem for MOPUC; see [DaPuSi]). They showed that the $a_n$ have singular values $\alpha_n$ with $|\alpha_n| \le 1$. The Szegö recursion that leads to OPUC is known in the time-series literature as the Levinson-Durbin algorithm. The Levinson-Durbin algorithm was extended to the multivariate case by Whittle [Wh] in 1963, and by Wiggins and Robinson [WiRo] in 1965. In [MoViKa], the authors use the results of [Wh] and [WiRo], which they call the LWR algorithm, and give a normalized version of it. They show that unless the Levinson-Durbin algorithm stops (corresponding to some correlation matrix failing to be positive-definite), $|\alpha_n| < 1$ for all $n$.

§3. Szegö's theorem
Derevyagin, Holtz, Khrushchev and Tyaglov [DeHoKhTy] continue this study, again using Bernstein-Szegö approximation (§§5, 6). They show (§7) that (using † for the adjoint (conjugate transpose), and µ′ for the density of the absolutely continuous component of µ, denoted $w$ below)
$$\log \prod_{n=1}^{\infty} \det(1 - a_n^{\dagger} a_n) = \int \mathrm{tr} \log \mu'\, d\theta/2\pi = \int \mathrm{tr} \log w\, d\theta/2\pi$$
for any non-trivial (i.e. of infinite support) matrix-valued probability measure on the unit circle. Call those for which the integral on the right is $> -\infty$ Szegö measures, and the finiteness of the integral Szegö's condition, (Sz). They deduce that µ is a Szegö measure iff $\sum_{n=1}^{\infty} \|a_n^{\dagger} a_n\| < \infty$ ("$a \in \ell^2$"). This is Szegö's theorem for MOPUC.
They continue (§8) with the matrix version of the Helson-Lowdenslager theorem, in which the infimum is taken over all matrices $A$ of determinant 1 and all trigonometric polynomials $P(e^{i\theta}) = \sum_{k>0} A_k e^{ik\theta}$. The Wold decomposition extends from the scalar to the vector case; see e.g. Masani [Mas1], [Mas4], Hannan [Ha], and in the context of operator theory, Sz.-Nagy and Foias [SzNF], Ch. 1, Nikolskii [Nik] Ch. 1. As in the scalar case, (Sz) is the condition that the deterministic component in the Wold decomposition should vanish, leaving only the moving-average component. This is the condition of non-determinism, (ND) ("(Sz) = (ND)").
As in the scalar case, it is often useful to strengthen the Szegö condition by requiring explicitly that the singular component $\mu_s$ of the spectral measure µ should vanish. This is called the condition of pure non-determinism, (PND): that is, (PND) holds when (Sz) holds and $\mu_s = 0$.
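
The Levinson-Durbin recursion of §2 and the product formula of Szegö's theorem can be checked numerically in the scalar case ℓ = 1. The following is a minimal sketch under our own assumptions, not an implementation from any of the cited papers: the process is a real MA(1) process X_t = e_t + b e_{t-1} with unit innovation variance, `levinson_durbin` is a hypothetical helper name, and the sign convention relating the PACF to the Verblunsky coefficients is glossed over.

```python
def levinson_durbin(gamma):
    """Scalar Levinson-Durbin recursion for a real autocovariance sequence.

    gamma[k] is the autocovariance at lag k, for k = 0..n.
    Returns (pacf, errors): pacf[m-1] is the lag-m partial autocorrelation,
    and errors[m] is the mean-square error of the best linear predictor
    based on the previous m values (errors[0] = gamma[0]).
    """
    n = len(gamma) - 1
    phi = []                # coefficients of the current order-(m-1) predictor
    pacf = []
    errors = [gamma[0]]
    for m in range(1, n + 1):
        # Part of gamma[m] not explained by the order-(m-1) predictor.
        acc = gamma[m] - sum(phi[j] * gamma[m - 1 - j] for j in range(m - 1))
        k = acc / errors[-1]
        # Order update of the predictor coefficients.
        phi = [phi[j] - k * phi[m - 2 - j] for j in range(m - 1)] + [k]
        pacf.append(k)
        errors.append(errors[-1] * (1.0 - k * k))
    return pacf, errors


# Assumed example: MA(1) with b = 0.5, so gamma(0) = 1 + b^2, gamma(1) = b.
b = 0.5
gamma = [1 + b * b, b] + [0.0] * 58
pacf, errors = levinson_durbin(gamma)

# Partial product of (1 - a_n^2) for the normalised (probability) measure.
szego_product = errors[-1] / gamma[0]
print(max(abs(k) for k in pacf), szego_product)
```

For this example the spectral density is w(θ) = |1 + b e^{iθ}|², whose logarithm has mean zero over the circle, so after normalising µ to a probability measure the product should approach 1/(1 + b²) = 0.8, and every coefficient should have modulus less than 1.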

§4. Matrix spectral factorizations and matrix Szegö functions
Factorizations are already present in the scalar case. For an analytic function in the Hardy space on the disc, identify the boundary values of the function on the unit circle with the function itself, as usual; then the spectral density $w$ and the Szegö function $h$ are related by
$$w(\theta) = |h(e^{i\theta})|^2.$$
Here $h$ is in the Hardy space $H^2$, and is an outer function (see e.g. [Bi] for references and details); one can think of $h$ as the 'analytic square root' of $w$.
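
As a simple illustration of the scalar factorization (our own worked example, not taken from [Bi]), consider the MA(1) density with a real parameter b ≥ 0; both the parameter and the normalisation h(0) > 0 are assumptions made only for this example.

```latex
% Scalar example: w is the spectral density, h its outer (Szegö) factor.
\[
  w(\theta) = |1 + b e^{i\theta}|^2 = 1 + b^2 + 2b\cos\theta ,
  \qquad
  h(z) =
  \begin{cases}
    1 + b z, & 0 \le b \le 1, \\
    b + z,   & b > 1 .
  \end{cases}
\]
% In both cases |h(e^{i\theta})|^2 = w(\theta), and h has no zeros in |z| < 1.
% For b > 1 the polynomial 1 + bz has the same modulus on the circle but
% vanishes at z = -1/b inside the disc, so it is not outer.
```

The example shows that the modulus on the circle does not determine the analytic factor; it is the outer property that singles out h.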
In the matrix case, one speaks of the spectral factorization problem. The two main traditional approaches to multivariate prediction theory are those of Helson and Lowdenslager [HeLo] [Wh]); (v) convergence of the finite-past predictor to the infinite-past predictor (§13; cf. Baxter's inequality, §6 below).
To these, we also add (vi) Whittle's multivariate extension of the Levinson-Durbin algorithm, mentioned in §2. See also [Mas4] for extensive commentary on Wiener's work in this area.
In the matrix case, one needs to discriminate between the full-rank and degenerate-rank cases, where the rank of the spectral density matrix W is full (ℓ) or degenerate (m < ℓ). The degenerate-rank case is considered in [Ro2], [Mas3], §§11, 12, [WiMa3] for ℓ = 2, and Matveev [Mat] in the general case; we refer there for details. The generic, and easier, case is the full-rank case, where Γ is positive-definite (the contrast between the full- and degenerate-rank cases is similar to that arising in, e.g., regression, where one encounters multi-collinearity; see e.g. [BiFr], §7.4).
This interesting work dates from the 1960s, before the work of Fefferman and Stein on BMO and of Sarason on VMO. Armed with these, Peller [Pel1] in 1990 considered matrix spectral factorizations. The simplest, and principal, case is that of a purely non-deterministic process of full rank. See e.g. several papers by Ephremidze, Janashia and Lagvilava [EpJaLa], [EpLa], [JaLaEp]. In particular, a (matrix) function in a Hardy space is outer iff its (scalar) determinant is outer.
More general than the matrix-valued case is the operator-valued case. This is important in operator theory, in non-commutative probability theory and in non-commutative Hardy-space theory.
The strong Szegö theorem, as presented in e.g. [Si1] Ch. 6, [Bi] §5, extends in full to the matrix case. For a short proof, see Böttcher [Bo1]; cf. [Bo2], [Bo3], Basor and Widom [BasWi]. A different approach has been given more recently by Chanzy [Cha1], [Cha2].

§6. Baxter's inequality and Baxter's theorem

Baxter used OPUC in a series of probabilistic papers of 1961-63 ([Bax1]-[Bax3]). His results concern, among other things, the weak and strong forms of Szegö's limit theorem (for Toeplitz determinants), finite and infinite Wiener-Hopf equations (in discrete time n = 0, 1, 2, ...: finite with $\sum_{k=0}^{n}$, infinite with $\sum_{k=0}^{\infty}$), and the convergence of finite-predictor coefficients (in which one is given a finite section of the past of length n) to the corresponding infinite-predictor coefficients. This last depends on Baxter's inequality [Bax3]. Baxter's inequality was used by Simon [Si1], Ch. 5, in his proof that the Verblunsky coefficients $a \in \ell^1$ iff the correlation function $\gamma \in \ell^1$, the spectral measure µ is absolutely continuous, and its density µ′ = w is continuous and positive. Simon calls this result Baxter's theorem (though Baxter did not formulate the result in this form, and 'the Baxter-Simon theorem' might be better here, but we will follow [Si1]). Perhaps because of this rather involved history, and the fact that Simon's book [Si1] is still comparatively recent, there is as yet no matrix extension of Baxter's theorem. We raise here the question of obtaining one, and turn now to what is known.
Baxter's inequality and convergence of finite predictors in the matrix case were considered by Masani in 1966 ([Mas3] §13) and by Cheng and Pourahmadi [ChPo] in 1993. Theoretical progress in the area since then has been extensive, and the question arises of weakening the conditions that they impose. For more recent developments in the scalar case, see [InKa2].
Results on approximation by such finite-section operators have been given in great generality by Seidel and Silbermann [SeSi] (see §2.5.4), using Banach-algebra techniques (as did Baxter and Simon).
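
The convergence of finite-predictor coefficients to infinite-predictor coefficients can be seen numerically in a scalar example. The sketch below is illustrative only and rests on our own assumptions: a real MA(1) process X_t = e_t + b e_{t-1} with |b| < 1, a direct Gaussian-elimination solve of the finite-section (Toeplitz) normal equations, and hypothetical helper names `solve_linear` and `finite_predictor`. It shows the convergence itself, not Baxter's inequality.

```python
def solve_linear(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def finite_predictor(gamma, n):
    """Coefficients phi[0..n-1] of the best linear predictor of X_{t+1}
    from X_t, ..., X_{t-n+1}: solve sum_j phi[j] gamma(|i-j|) = gamma(i+1)."""
    toeplitz = [[gamma(abs(i - j)) for j in range(n)] for i in range(n)]
    rhs = [gamma(i + 1) for i in range(n)]
    return solve_linear(toeplitz, rhs)


b = 0.5  # assumed MA(1) parameter


def gamma_ma1(k):
    """Autocovariance of X_t = e_t + b e_{t-1} with unit innovation variance."""
    return {0: 1 + b * b, 1: b}.get(k, 0.0)


# Infinite-past predictor: X^_{t+1} = b e_t = sum_{k>=0} b (-b)^k X_{t-k}.
infinite = [b * (-b) ** k for k in range(5)]

gaps = []
for n in (5, 10, 20, 40):
    finite = finite_predictor(gamma_ma1, n)
    gaps.append(max(abs(finite[k] - infinite[k]) for k in range(5)))
    print(n, gaps[-1])
```

In this example the gap between the first five finite-section coefficients and their infinite-past limits shrinks geometrically as the length n of the observed past grows.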

§7. Nehari sequences and the Levinson-McKean condition
Nehari's theorem of 1957 states that a Hankel operator is a bounded map from $\ell^2$ on the natural numbers to itself iff the sequence generating it is the sequence of negative Fourier coefficients of a bounded function. See e.g. [Si1] Th. 6.2.17 ("The modern theory of Hankel operators started with the following result of Nehari"), Peller [Pel3]. Finding such a generating sequence is thus a type of moment problem, and as with other moment problems there may be no solution, a unique solution or more than one solution; the moment problem is then called insoluble, determinate or indeterminate. The indeterminate case is particularly important; the generating sequence is then called a Nehari sequence. This Nehari moment (or interpolation) problem was considered by Adamjan, Arov and Krein [AdArKr] in 1968; they described the solution set in terms of Sarason's concept of rigidity [Sa1]. Rigidity has previously been studied in this area in connection with the concept of complete non-determinism (CND); see Bloomfield, Jewell and Hayashi [BlJeHa]. It turns out that (CND) is equivalent to the intersection of past and future property (IPF) [InKa2]. It has very recently been shown that both are equivalent to the Levinson-McKean property (the name comes from work of Levinson and McKean, [LeMcK] p. 105, in continuous time; the name is given by analogy in the discrete-time case). Phase functions (§4) play a crucial role here; see Kasahara and Bingham [KaBi] for details.
The question arises of matrix extensions of these results (work in progress). The matrix Nehari problem has been considered in detail by Arov. Related to this Nehari problem is the Schur (interpolation) problem, the matrix case of which is considered at book length in Dubovoj, Fritzsche and Kirstein [DuFrKi]; cf. [ArDy], §7.6.

§8. Pure minimality
In the scalar case, pure minimality is characterized by ($\mu_s = 0$ and) Kolmogorov's condition $1/w \in L^1$, using $w$ for the density of the spectral measure µ (now absolutely continuous). This result extends to the multivariate case; see Makagon and Weron [MaWe], [Pou3], Th. 8.10. The spectral density is now a matrix $W$, and the condition is that its inverse $W^{-1}$ be integrable. It is thus natural that the condition $W^{-1} \in L^1$ should be imposed in studying processes subject to stronger regularity conditions than pure minimality. We turn below to two such conditions: positive angle (§9) and complete regularity (§10). We regard the four conditions in ...

Treil and Volberg [TrVo1] show that the following matrix Muckenhoupt condition is necessary and sufficient for the positive-angle condition (PA) in the multivariate case:
$$\sup_I \Big\| \Big( \frac{1}{|I|} \int_I W \Big)^{1/2} \Big( \frac{1}{|I|} \int_I W^{-1} \Big)^{1/2} \Big\| < \infty, \qquad (A_2)$$
where the supremum is taken over all intervals $I$ of the unit circle. Here the condition that $W$ be invertible a.e. (which corresponds to the 'pure' in pure minimality, see §8 above) has to be imposed explicitly, as noted by Peller in his review of [TrVo2] (MR1428818 (99k:42073)). As shown in [HeSz], [HeSa], the condition (PA) (and so $(A_2)$ by above) is equivalent to a condition on the sequence ρ(n) of regularity coefficients of the form ρ(.) < 1.
We note that in his review of earlier work on this problem by Makagon, Miamee and Schröder [MaMiSc], Pourahmadi says "Attempts to obtain a similar result for q-variate stationary sequences have been unyielding" (MR1443841 (98g:60074)).
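
For orientation, in the scalar case ℓ = 1 the matrix Muckenhoupt condition of Treil and Volberg (a uniform bound, over intervals I, on the norm of the product of the square roots of the averages of W and of its inverse) reduces to the classical Muckenhoupt condition. The sketch below records this reduction with a standard power-weight example; the weight $w_\alpha$ and the exponent α are our own illustrative notation, not taken from [TrVo1].

```latex
% Scalar case (\ell = 1): W = w is a non-negative function, and the norm of
% the product of square roots is the square root of the product of averages,
% so the condition is equivalent to
\[
  \sup_I \Big( \frac{1}{|I|} \int_I w \, d\theta \Big)
         \Big( \frac{1}{|I|} \int_I \frac{1}{w} \, d\theta \Big) < \infty .
\]
% Illustrative power weight on (-\pi, \pi]:
\[
  w_\alpha(\theta) = |\theta|^{\alpha}
  \quad \text{satisfies this condition iff} \quad -1 < \alpha < 1 .
\]
```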

§10. Complete regularity
We turn now to a strengthening of the conditions of §9 above. The process is said to be completely regular if ρ(n) → 0 as n → ∞; see [IbRo], Ch. 4, 5. It was shown by Treil and Volberg [TrVo2] that complete regularity is equivalent to the following strengthening of the Muckenhoupt condition $(A_2)$:
$$\limsup_{|I| \to 0} \Big\| \Big( \frac{1}{|I|} \int_I W \Big)^{1/2} \Big( \frac{1}{|I|} \int_I W^{-1} \Big)^{1/2} \Big\| = 1$$
(cf. Peller [Pel1], [Pel3]); here as before $W^{-1} \in L^1$ needs to be assumed explicitly. Note the form ("ρ(.) → 0, lim sup ... = 1") of the strengthenings here of the conditions ("ρ(.) < 1, sup ... < ∞") of §9 above.

§11. Hankel operators
Prediction theory has always involved Toeplitz operators (as in the book [GrSz] by Grenander and Szegö), and Toeplitz and Hankel operators have many links in operator theory. So it is natural that Hankel operators are useful in prediction theory. A monograph treatment of Hankel operators is given by Peller ([Pel3]; see also the review by Dym cited above in §7). Connections of Hankel operators with the matrix Muckenhoupt condition and with the matricial Nehari problem are considered by Arov and Dym in [ArDy3], Ch. 10, 11.

§12. Open questions
We mention two.
Question 1. Find the matrix version of Baxter's theorem. As mentioned in §6, the matrix version of Baxter's inequality provides a good starting-point.
Question 2. Find the matrix version of [KaBi]. This hinges on solution of the matrix Nehari problem (the step Γ → H).
We hope to return to this elsewhere.