Auto-regressive moving-average discrete-time dynamical systems and autocorrelation functions on real-valued Riemannian matrix manifolds

The present research paper proposes an extension of the classical scalar Auto-Regressive Moving-Average (ARMA) model to real-valued Riemannian matrix manifolds. The resulting ARMA model on matrix manifolds is expressed as a non-linear discrete-time dynamical system in state-space form, whose state evolves on the tangent bundle associated with the underlying manifold. A number of examples are discussed within the present contribution that aim at illustrating the numerical behavior of the proposed ARMA model. In order to measure the degree of temporal dependency between the state values of the ARMA model, an extension of the classical autocorrelation function for scalar sequences is suggested, on the basis of the geometrical features of the underlying real-valued matrix manifold.


1. Introduction. The literature on the subject of random matrix generation is rich in ad-hoc solutions. The paper [3] deals with the generation of random orthogonal matrices according to a Haar measure and discusses methods based on the Gram-Schmidt orthogonalization, the eigenvalue decomposition, the singular value decomposition, the Cayley transform (which turns a skew-symmetric matrix into an orthogonal matrix), the Householder transformation and the Givens rotations. Direct methods to construct random orthogonal matrices are explained in the paper [21], with an application to randomizing integration methods over spherically-symmetric integration regions. The problem of generating correlation matrices (a special case of symmetric, positive-semidefinite matrices) is discussed in the paper [11], which makes use of the Bendel-Mickey algorithm to transform a symmetric positive-semidefinite matrix into a correlation matrix. Likewise, the problem of generating random points on the surface of a hypersphere has long been investigated, as illustrated, for example, in the paper [23].
Several contributions aimed at generalizing the notion of random structured matrix generation. For example, the paper [12] presents a 'subgroup' algorithm for generating uniformly-distributed random matrices on compact groups, with application to matrix generation over the orthogonal group, the unitary group, the symplectic group and the general linear group over a finite field. An account of the problem of generating random compact-group elements was also presented in [30]. It is important to note that not all the manifolds of interest in the literature qualify as algebraic groups. This is the case, for example, of the compact Stiefel manifold of 'tall-skinny' orthogonal matrices and of the Graßmann manifold. Sampling on shape spaces is a further example of generation of random elements on a general smooth manifold [28]. For these manifolds, a setting more general than that of algebraic groups is needed.
Concerning the modeling of discrete-time sequences by non-conventional Auto-Regressive Moving-Average (ARMA) discrete-time dynamical systems, models arising from specific applications are known from the scientific literature. The paper [43] suggests the use of an ARMA model on the tangent bundle of 'shape space', with application to matching video shape sequences in human movement analysis. The contribution [22] considers a mixed flat/curved parameter space in a model that explains atmospheric sulfur dioxide pollution incidents. The contribution [31] presents an ARMA predictor for widely linear systems.
The present paper aims at providing a general method for generating random paths on real-valued matrix Riemannian manifolds, on the basis of their differential-geometric properties. The presented method is meant to be sufficiently general, practical and easy to customize to specific cases of interest. It is based on the generalization of the classical ARMA generative model to matrix Riemannian manifolds. The method for generating random paths on real-valued Riemannian matrix manifolds is presented in two versions: one based on parallel translation, the other based on a numerically-convenient approximation of parallel translation known as 'vector transport'.
In order to measure quantitatively the statistical features of a matrix sequence generated by the devised ARMA model, an extension of the classical autocorrelation function is suggested. Such notion also allows recovering the standard notions of uncorrelatedness of a sequence on a real-valued matrix Riemannian manifold.
The paper is organized as follows. Section 2 explains the general devised method to accomplish the generation of matrix-valued random paths on real-valued Riemannian manifolds as well as the proposed notion of auto-correlation function on Riemannian matrix manifolds; Section 2 also suggests possible applications of the devised model and mentions a possible extension of the notion of partial autocorrelation function to manifold-valued temporal sequences. Section 3 examines specific cases of matrix manifolds of interest in the scientific literature. Section 4 illustrates the devised theory through several numerical examples. Section 5 concludes the paper.
2. Random paths on matrix Riemannian manifolds and their autocorrelation function. Subsection 2.1 recalls notions of differential geometry that are instrumental in the development of a geometrically-sound method to generate random paths on Riemannian manifolds. Subsection 2.2 explains the devised method and Subsection 2.3 presents the suggested extension of the classical autocorrelation function to matrix Riemannian manifolds. Subsection 2.4 mentions an extension of the notion of partial auto-correlation function. Subsection 2.5 mentions possible applications of the devised ARMA model.

2.1. Notions of differential geometry. Let M denote a p-dimensional Riemannian manifold, with p finite. Associated with each point x in a p-dimensional differentiable manifold M is a tangent space, denoted by T_x M. This is a p-dimensional vector space whose elements can be thought of as equivalence classes of curves passing through the point x.
A Riemannian manifold M is endowed, at each point x ∈ M, with an inner product ⟨·, ·⟩_x : T_x M × T_x M → R. An inner product is a positive-definite, smooth, symmetric, bilinear map that assigns a real number to pairs of tangent vectors at each tangent space of the manifold. An inner product induces a local norm, namely ‖v‖_x := √⟨v, v⟩_x. A fundamental notion in differential geometry is parallel translation, which allows, e.g., comparing vectors belonging to different tangent spaces. On a smooth Riemannian manifold M, fix a smooth curve γ : [−a, a] → M (a > 0) and a vector field F ∈ X(M). The parallel translation map Γ_s^t(γ) : T_{γ(s)} M → T_{γ(t)} M associated with the curve γ is a linear isomorphism for every s, t ∈ [−a, a]. The parallel translation map depends smoothly on its arguments and is such that Γ_s^s(γ) is the identity of T_{γ(s)} M and Γ_u^t(γ) ∘ Γ_s^u(γ) = Γ_s^t(γ) for every s, t, u ∈ [−a, a]. The notion of geodesic arc generalizes the notion of straight line of Euclidean spaces. A distinguishing feature of a straight line of a Euclidean space is that it translates parallel to itself, namely, it is self-parallel. The notion of 'straight line' on a curved space inherits such a distinguishing feature: a geodesic arc on a manifold M is a curve γ such that γ̇ is parallel translated along γ itself, namely, for every s, t ∈ [−a, a], it holds that Γ_s^t(γ) γ̇(s) = γ̇(t). A geodesic arc may be parametrized in terms of two parameters, namely the initial values γ(0) = x ∈ M and γ̇(0) = v ∈ T_x M, in which case the geodesic arc will be denoted as γ_{x,v}(t). A geodesic arc γ_{x,v} : [0, 1] → M is associated with the exponential map exp : T M → M defined as exp_x(v) := γ_{x,v}(1). Given two points x, y ∈ M and a geodesic arc γ : [0, 1] → M that joins them, namely, such that γ(0) = x and γ(1) = y, the parallel translation operator Γ_0^1(γ) may be simply denoted as Γ_x^y. For further references on differential geometry, readers may want to consult, e.g., the series of books [36] and the more recent book [9].
An important problem related to the implementation on a computation platform is that parallel translation may be difficult to express in closed form. In such cases, it may be replaced by the notion of vector transport, which appears as an attractive numerical alternative. In order to define the notion of vector transport, the finite-dimensional manifold M is supposed to be embedded into a larger linear space E of appropriate dimension, whose tangent bundle T E contains the bundle T M. Upon endowing the tangent bundle T E with a Euclidean inner product, it is possible to define an orthogonal projection operator P_z : T_z E → T_z M at every point z ∈ M. In practice, instead of parallel-translating the tangent vector u ∈ T_x M along a geodesic arc emanating from the point x ∈ M in the direction v ∈ T_x M as Γ_0^1(γ_{x,v})u, vector transport moves rigidly the vector u within T E up to the point y = exp_x(v) and then orthogonally projects the vector u over the vector space T_y M. With a slight abuse of terminology, one may refer to the vector transport of a vector u ∈ T_x M to a tangent space T_y M as P_y(u), for some y ∈ M. An example of interest in the present paper concerns finite-dimensional matrix manifolds. On a Riemannian manifold M of real n × p matrices (naturally embedded into the space R^{n×p}), endowed with the inner product ⟨·, ·⟩, the orthogonal projection operator P_Y : R^{n×p} → T_Y M at a matrix-point Y ∈ M is defined by the two conditions: P_Y(U) ∈ T_Y M and ⟨U − P_Y(U), V⟩ = 0 for every V ∈ T_Y M. It is worth remarking that vector transport should be taken as a computationally-convenient shortcut that does not enjoy the structural properties of parallel translation. For instance, while parallel translation preserves the angles between translated vectors and the length of tangent vectors (namely, it represents an isometry), vector transport does not preserve such intrinsic quantities, in general (see Subsection 3.1 for a specific example).
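For illustration purposes, the projection-based vector transport described above may be sketched on the unit sphere S^{n−1} embedded in R^n. The following minimal Python sketch is not part of the paper's formal development; the function names and numerical values are illustrative:

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on S^{n-1}: follow the great circle from x with velocity v."""
    nv = np.linalg.norm(v)
    if nv < 1e-15:
        return x.copy()
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def sphere_project(y, u):
    """Orthogonal projection of an ambient vector u onto T_y S^{n-1}."""
    return u - (y @ u) * y

# Vector transport: move u rigidly in the ambient space R^n, then project
# it onto the tangent space at y = exp_x(v).
x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.5, 0.0])      # v in T_x S^2 (x^T v = 0)
u = np.array([0.0, 0.0, 0.3])      # u in T_x S^2
y = sphere_exp(x, v)
u_transported = sphere_project(y, u)
```

The transported vector is tangent at the new point by construction, while its length is, in general, not preserved (the projection can only shrink it).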

2.2. Generation of a random path on a matrix Riemannian manifold. The fields of differential geometry and of ARMA discrete-time dynamical system theory have touched each other in several research endeavors. A noteworthy example is offered by the paper [1], which studies the geometry of the manifold of a parametric family of ARMA(r, q) systems on R, parametrized by their coefficients, where the geometric structure is induced by the structural properties of the systems, such as invertibility. A similar approach, related to a specific implementation of bounded-input/bounded-output ARMA discrete-time filters, was described in the contribution [16]. In the paper [40], ARMA modeling on the space R^p of video sequences is proposed; it is assumed that the parameters of the ARMA model lie on differentiable manifolds, hence an inference algorithm is designed to estimate the parameters of the model on a Graßmann manifold. More recently, a study of the differential-geometric structure of the parameter space of linear models of multiple neural spike train sequences was proposed in [35]. The aim of the present contribution is to devise an ARMA model that produces a pseudo-random temporal sequence of matrices that form a path on a matrix-type differentiable manifold. Hence, not only are the parameters of the ARMA model specially structured, but the input and the output of the ARMA model are structured matrix sequences that belong to a differentiable manifold M rather than to the space R (or R^p).
An ARMA model on a manifold is described in terms of a pair of sequences (x_k, v_k) ∈ T M, where x_k denotes the output signal of the model, v_k denotes the state of the model and k ∈ Z denotes a discrete-time index. In the conventional case M = R^p, choosing the standard identification T_x R^p ≅ R^p for any x ∈ R^p, an ARMA(r, q) model on the tangent bundle T R^p is made of a state-evolution equation and of a state-to-output transform equation. The equation that governs the evolution of the internal state v_k ∈ T_{x_k} R^p reads:

v_k = Σ_{i=1}^{r} A_i v_{k−i} + Σ_{i=1}^{q} B_i w_{k−i} + w_k,   (3)

where the transition matrices A_i ∈ R^{p×p} and B_i ∈ R^{p×p} denote the parameters of the auto-regressive part and of the moving-average part, respectively, and the sequence w_k ∈ T_{x_k} R^p is a white-noise sequence (also termed random innovation term). The equation that describes how the internal state transforms into the output signal x_k ∈ M reads:

x_{k+1} = x_k + v_k + c,   (4)

with c ∈ R^p being a constant bias (or drift term). It is worth remarking that, since the tangent bundle T R^p is trivial (in fact, T R^p is isomorphic to R^p × R^p), the equation (3) is based on vector addition. On a general Riemannian manifold M, however, vector addition in the tangent bundle T M (that is, in the AR part and in the MA part) is not defined without the concept of parallel translation. Likewise, equation (4) is based on vector addition on R^p, which is not defined on a general (curved) Riemannian manifold without the concept of exponential map.
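The classical ARMA(r, q) recursion on R^p described by equations (3) and (4) may be sketched as follows; the coefficient matrices, noise scale and bias below are illustrative choices, not values prescribed by the model:

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, q, K = 2, 2, 1, 200
A = [0.4 * np.eye(p), 0.2 * np.eye(p)]   # AR transition matrices A_1, A_2
B = [0.5 * np.eye(p)]                    # MA transition matrix B_1
c = np.array([0.01, 0.0])                # constant bias (drift term)

v_hist = [np.zeros(p) for _ in range(r)]   # zero-padding of past states
w_hist = [np.zeros(p) for _ in range(q)]   # zero-padding of past innovations
x = np.zeros(p)
xs = []
for k in range(K):
    w = 0.1 * rng.standard_normal(p)       # white-noise innovation w_k
    v = sum(A[i] @ v_hist[i] for i in range(r)) \
        + sum(B[i] @ w_hist[i] for i in range(q)) + w   # state equation (3)
    x = x + v + c                          # state-to-output equation (4)
    xs.append(x.copy())
    v_hist = [v] + v_hist[:-1]
    w_hist = [w] + w_hist[:-1]
xs = np.array(xs)
```

Since the tangent bundle of R^p is trivial, the sums above are plain vector additions; the manifold version must replace them by parallel translations, as explained next.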
The proposed extension of the above ARMA model to a Riemannian manifold M is based on the following operations:
1. At each step k, it is necessary to compute a linear combination of several tangent vectors in T_{X_k} M via transition operators A_i : (x, v) ∈ T M → w ∈ T_x M for the auto-regressive part and B_i : (x, v) ∈ T M → w ∈ T_x M for the moving-average part, where the operators A_i and B_i are linear in the second argument. Note that A_i, B_i ∈ X(M), namely, they are vector fields depending on a parameter.
2. In order to compute the linear combination of previous and current values of the state variable, at each step k, it is necessary to parallel translate each tangent vector v_{k−1}, ..., v_{k−r}, w_{k−1}, ..., w_{k−q} to the tangent space T_{x_k} M from the tangent spaces that they belong to.
3. In order to propagate the constant bias, at each step k, it is necessary to parallel translate the constant term to the tangent space T_{x_k} M.
On the basis of the above considerations, an ARMA(r, q) model on the tangent bundle T M of the real-valued finite-dimensional matrix manifold M may be written as:

V_k = Σ_{i=1}^{r} A_i(X_k, Γ_{X_{k−i}}^{X_k}(V_{k−i})) + Σ_{i=1}^{q} B_i(X_k, Γ_{X_{k−i}}^{X_k}(W_{k−i})) + W_k,
X_{k+1} = exp_{X_k}(V_k + Γ_{X_0}^{X_k}(C)),   (5)

for k = 0, 1, 2, .... To start the generation of a random path, namely, for k = 0, it is necessary to select the initial seed X_0 ∈ M as well as the constant bias C ∈ T_{X_0} M, and to put into effect an extension of standard zero-padding:

V_{−1} = V_{−2} = · · · = V_{−β} = 0,  W_{−1} = W_{−2} = · · · = W_{−β} = 0,   (6)

with β = max{r, q}. The tangent vectors V_k are computed as the output of an auto-regressive moving-average model on the tangent bundle T M. In an ARMA(r, q) model with state variable V_k ∈ T_{X_k} M and input variable W_k ∈ T_{X_k} M, the current state-variable value V_k depends linearly on the past state-variable values V_{k−1}, ..., V_{k−r} and on the current and past input-variable values W_k, W_{k−1}, ..., W_{k−q}. At each step k, it is necessary to generate only a new random matrix W_k ∈ T_{X_k} M. The additive term Γ_{X_0}^{X_k}(C) in the second equation of the system (5) represents a drift term that slightly moves the current state even when the scalar velocity ‖V_k‖_{X_k} is close to zero.

Algorithm 1 Pseudocode to implement the ARMA(r, q) random structured matrix process generation rules (5) and (7) on manifold M with K time-steps.
1: Choose a metric for the manifold M and compute the exponential map exp : T M → M and the parallel-translation operator Γ_X^Y : T_X M → T_Y M or the vector-transport operator P_X, with X, Y ∈ M (see Section 3 for examples on specific manifolds)
2: Choose the initial seed X_0 ∈ M and the constant bias C ∈ T_{X_0} M (see Section 4 for examples on specific manifolds) and apply the zero-padding rule (6)
3: for k = 0 to K − 1 do
4:   Generate a random tangent matrix W_k ∈ T_{X_k} M by a random generator on a linear space (see Section 3 for examples on specific manifolds)
5:   Parallel translate (or vector transport) the tangent matrices V_{k−1}, ..., V_{k−r}, W_{k−1}, ..., W_{k−q} and the bias C to the tangent space T_{X_k} M
6:   Compute the state V_k and the drift term D_k by the first equation of the system (5) (or (7))
7:   Compute the next sample X_{k+1} = exp_{X_k}(V_k + D_k)
8: end for
The random path generation rule (5), with parallel translation replaced by vector transport, reads:

V_k = Σ_{i=1}^{r} A_i(X_k, P_{X_k}(V_{k−i})) + Σ_{i=1}^{q} B_i(X_k, P_{X_k}(W_{k−i})) + W_k,
X_{k+1} = exp_{X_k}(V_k + P_{X_k}(C)),   (7)

for k = 0, 1, 2, .... In some circumstances, even if a parallel-translation formula is available, vector transport is to be preferred due to its easier implementation and lighter computational burden. The steps to implement the ARMA(r, q) random process generation rules (5) and (7) are explained in Algorithm 1.
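A minimal numerical sketch of the vector-transport rule (7), specialized to M = S² with r = 1, q = 0 and the transition operator A_1(x, v) taken as a scalar multiple of the tangent projection, may read as follows (the coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere_exp(x, v):
    """Exponential map on S^2: follow the great circle from x with velocity v."""
    nv = np.linalg.norm(v)
    return x.copy() if nv < 1e-14 else np.cos(nv) * x + np.sin(nv) * v / nv

def project(x, u):
    """Orthogonal projection onto T_x S^2, used here as the vector transport P_x."""
    return u - (x @ u) * x

K, a = 500, 0.6                  # path length and AR(1) coefficient (illustrative)
x = np.array([0.0, 0.0, 1.0])    # initial seed X_0
c = 0.01 * project(x, rng.standard_normal(3))   # constant bias C in T_{X_0}S^2
v = np.zeros(3)                  # zero-padded initial state
path = [x]
for k in range(K):
    w = 0.05 * project(x, rng.standard_normal(3))   # random innovation W_k
    v = a * project(x, v) + w    # AR part of rule (7) with r = 1, q = 0
    x = sphere_exp(x, v + project(x, c))            # output update with drift
    path.append(x)
path = np.array(path)
```

Every generated sample remains on the sphere, since the argument of the exponential map is tangent at the current point by construction.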
By adding extra terms to the right-hand side of the first equation of the system (5) or (7), the discussed ARMA model on a manifold may be extended to an ARMAX model (ARMA with exogenous inputs). Such an extension is not pursued in the present paper.

2.3. An auto-correlation function of a matrix-sequence on a real-valued Riemannian manifold. Autocorrelation refers to the correlation of a time series with its own past and future values [6]. A large positive autocorrelation indicates 'persistence', the tendency of a system to remain in the same state from one observation to the next. The set of autocorrelation coefficients arranged as a function of separation in time (or lag) is the sample autocorrelation function.
In the conventional case of the manifold M = R^p (endowed with the Euclidean geometry), the sample autocorrelation function R_v associated with a sequence v_k ∈ R^p, k = 0, ..., K, is defined as:

R_v(m, n) := (1/(K − L + 1)) Σ_{k=0}^{K−L} v_{k+m}^T v_{k+n},   (8)

where m and n denote the lags at which the correlation is computed. The constant integers K and L denote, respectively, the number of samples of the sequence v_k and the maximum considered lag.
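The Euclidean sample autocorrelation function may be sketched numerically as follows; the white sequence used as input is an illustrative test signal:

```python
import numpy as np

def sample_acf(v, L):
    """Two-index sample autocorrelation R_v(m, n) of a sequence v_k in R^p."""
    N = v.shape[0]                       # number of available samples
    R = np.zeros((L + 1, L + 1))
    for m in range(L + 1):
        for n in range(L + 1):
            R[m, n] = np.mean([v[k + m] @ v[k + n] for k in range(N - L)])
    return R

rng = np.random.default_rng(2)
v = rng.standard_normal((1000, 3))       # a white sequence in R^3
R = sample_acf(v, L=5)
```

For a white sequence, the diagonal entries estimate the variance while the off-diagonal entries are close to zero, as expected from the absence of temporal persistence.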
In the case that the manifold M is a matrix Riemannian manifold endowed with the scalar product ⟨·, ·⟩, the above definition of sample autocorrelation function may be extended to a sequence (X_k, V_k) ∈ T M, k = 0, ..., K, as:

R_V(m, n) := (1/(K − L + 1)) Σ_{k=0}^{K−L} ⟨Γ_{X_{k+m}}^{X_k}(V_{k+m}), Γ_{X_{k+n}}^{X_k}(V_{k+n})⟩_{X_k},   (9)

where m, n = 0, ..., L. The function R_V is symmetric in its integer arguments, namely R_V(m, n) = R_V(n, m), because the inner product is a symmetric operator. The above definition of autocorrelation function for a discrete-time sequence on a tangent bundle T M of a real-valued Riemannian manifold inherits several properties of the autocorrelation functions on the space R^p. If the correlation coefficients R_V(m, n) depend only on the difference m − n (namely, the matrix with entries R_V(m, n) is 'banded'), the sequence (X_k, V_k) is termed second-order stationary. In such a case, the only independent entries are those on the row R_V(m, 0) (or, equivalently, on the column R_V(0, n)), hence the correlation function may be denoted as R_V(m). The one-parameter autocorrelation function is defined, explicitly, as:

R_V(m) := (1/(K − L + 1)) Σ_{k=0}^{K−L} ⟨Γ_{X_{k+m}}^{X_k}(V_{k+m}), V_k⟩_{X_k},   (10)

where m = 0, ..., L. The diagonal entries R_V(m, m) of an autocorrelation function possess a particular meaning. Such terms may be written explicitly as:

R_V(m, m) = (1/(K − L + 1)) Σ_{k=0}^{K−L} ‖Γ_{X_{k+m}}^{X_k}(V_{k+m})‖²_{X_k}.   (11)

Since parallel translation along a geodesic line is an isometry with respect to its own metric, it holds that ‖Γ_{X_{k+m}}^{X_k}(V_{k+m})‖²_{X_k} = ‖V_{k+m}‖²_{X_{k+m}}, therefore:

R_V(m, m) = (1/(K − L + 1)) Σ_{k=0}^{K−L} ‖V_{k+m}‖²_{X_{k+m}}.   (12)

Under the hypothesis of stationarity, in particular, all the coefficients R_V(m, m) coincide with the one-parameter autocorrelation function at zero lag, namely, with:

R_V(0) = (1/(K − L + 1)) Σ_{k=0}^{K−L} ‖V_k‖²_{X_k},   (13)

which coincides with the Riemannian variance of a random distribution on the tangent bundle T M. Clearly R_V(0) ≥ 0. By using the Cauchy-Schwarz inequality for inner products and the fact that parallel translation is an isometry, it follows that |R_V(m)| ≤ R_V(0) for every lag m.
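The manifold autocorrelation function may be sketched on the sphere S², where a closed-form parallel translation operator is available (see Subsection 3.1). The synthetic path and tangent samples below are illustrative:

```python
import numpy as np

def transport(x, y, u):
    """Parallel translation of u in T_x S^{n-1} to T_y S^{n-1} (x, y not antipodal)."""
    return u - (y @ u) / (1.0 + x @ y) * (x + y)

def manifold_acf(X, V, L):
    """Sample R_V(m, n): tangent samples are translated back to T_{X_k} before pairing."""
    N = X.shape[0]
    R = np.zeros((L + 1, L + 1))
    for m in range(L + 1):
        for n in range(L + 1):
            R[m, n] = np.mean([transport(X[k + m], X[k], V[k + m])
                               @ transport(X[k + n], X[k], V[k + n])
                               for k in range(N - L)])
    return R

rng = np.random.default_rng(3)
t = 0.1 * np.arange(50)                          # a smooth synthetic path on S^2
X = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
V = rng.standard_normal((50, 3))
V -= np.sum(V * X, axis=1, keepdims=True) * X    # project onto the tangent spaces
R = manifold_acf(X, V, L=3)
```

The symmetry of the resulting matrix and the isometry of the translation operator can both be verified numerically.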
The above definition of autocorrelation function may be extended to include the drift term that appears in the second equation of the system (5) as follows:

R̃_V(m, n) := (1/(K − L + 1)) Σ_{k=0}^{K−L} ⟨Γ_{X_{k+m}}^{X_k}(V_{k+m} + D_{k+m}), Γ_{X_{k+n}}^{X_k}(V_{k+n} + D_{k+n})⟩_{X_k},   (14)

with D_k := Γ_{X_0}^{X_k}(C). The presence of the drift term in the definition (14) may be shown to cause, approximately, a lift of the autocorrelation function (10) for values of the lag m small compared to the length K of the sequence. In fact, by the properties of the parallel translation operator Γ_·^· and by the bilinearity of the inner product ⟨·, ·⟩ it follows that:

R̃_V(m, n) = R_V(m, n) + (1/(K − L + 1)) Σ_{k=0}^{K−L} [⟨Γ_{X_{k+m}}^{X_k}(V_{k+m}), Γ_{X_{k+n}}^{X_k}(D_{k+n})⟩_{X_k} + ⟨Γ_{X_{k+m}}^{X_k}(D_{k+m}), Γ_{X_{k+n}}^{X_k}(V_{k+n})⟩_{X_k} + ⟨Γ_{X_{k+m}}^{X_k}(D_{k+m}), Γ_{X_{k+n}}^{X_k}(D_{k+n})⟩_{X_k}].

In the hypothesis that the transported drift terms Γ_{X_{k+m}}^{X_k}(D_{k+m}) stay essentially aligned and uncorrelated with the state values, it is convenient to define the quantity (each transported drift has norm ‖C‖_{X_0}, since parallel translation is an isometry):

ρ := ‖C‖²_{X_0}.

For m ≪ K, the autocorrelation function (14) may be approximated as:

R̃_V(m) ≈ R_V(m) + ρ.

It is worth underlining that, whenever vector transport is used instead of parallel translation in the discrete-time dynamical system that implements the suggested ARMA model on manifolds, the only geometrical notion retained from the general theory of Riemannian manifolds is that of inner product, which is necessary to define the autocorrelation function (10). In this case, the autocorrelation function associated with the sequence (X_k, V_k) ∈ T M, k = 0, ..., K, reads:

R_V(m, n) := (1/(K − L + 1)) Σ_{k=0}^{K−L} ⟨P_{X_k}(V_{k+m}), P_{X_k}(V_{k+n})⟩,

where m, n = 0, ..., L.

2.4. Partial auto-correlation function. In a similar manner, it would be possible to construct a partial autocorrelation function, which is typically used to infer the order of an autoregressive model [6] (interested readers might also want to consult the classical paper by F.L. Ramsey on the characterization of the partial autocorrelation function [34]).
The Partial Auto-Correlation Function is an important tool, often used in association with the Auto-Correlation Function, in the theory of scalar (linear) ARMA models. An important application is the estimation of the nature of the model underlying a given temporal series (namely, AR, MA or mixed) and the estimation of the order of the model, thanks to its prominent cutoff property. The Partial Auto-Correlation Function at a given lag represents the correlation between a time series and its lagged version once the correlation between these and the intermediate-lagged sequences has been removed. In other terms, the value of the Partial Auto-Correlation Function at a given lag accounts for the degree of autocorrelation not explained by the values of the Partial Auto-Correlation Function corresponding to intervening lags.
The definition of Partial Auto-Correlation Function and its properties are essentially related with the linear structure of the model underlying the data. For non-linear models, the properties of the Partial Auto-Correlation Functions differ (see, for instance, the study [24], which shows that, for non-linear ARMA or VARMA models, it is necessary to introduce other quantities, such as the 'R' function based on mutual entropy, to detect the order of a non-linear model). Since the models (5) and (7) are inherently non-linear because they are formulated on curved spaces, it is unknown, at this stage, whether it will be possible to recover the useful properties of the Partial Auto-Correlation Functions of signals generated/explained by linear models.
2.5. Impact on practical applications. Although the present research topic is still at the seminal stage, some possible applications of the general ARMA(r, q) model on Riemannian manifolds may be suggested on the basis of existing works, published in the scientific literature, that make use of simple models tailored to specific spaces.
An example of a possible application is offered by the contribution [38], which deals with robust state estimation algorithms on manifolds. With a growing geometric awareness among state estimation practitioners, geometrically sound algorithms tailored for particular applications are emerging. A common application is the estimation of the orientations of physical objects, which involves the manifold of three-dimensional rotations of the ordinary space, denoted as SO(3). Such an application utilizes an ARMA(1, 0) system to model a sequence of points on the manifold SO(3) and employs a particle filtering technique to estimate the state of such a model. A random path generator on a manifold might, therefore, be used to produce temporal sequences of data, generated by a known model, to be used in the testing/tuning stage of such state estimation algorithms.
A further example of a possible application is offered by the paper [4]. Visual ego-motion estimation refers to the process of estimating the three-dimensional camera pose based on bi-dimensional image sequences captured by a camera. Visual ego-motion estimation plays an important role in computer vision and robotics applications such as visual simultaneous localization and mapping and augmented reality. A camera pose can be represented by a matrix belonging to the special Euclidean group SE(3). The state equation utilized in the contribution [4] to represent the camera pose during motion is an ARMA(1, 0) model on the manifold SE(3). A possible application of the general ARMA model (5) is to provide a more accurate modeling of the camera pose during motion by making use of an ARMA model with a wider memory span.
A potentially far-reaching application is suggested again by the manuscript [4], which introduces a Particle Swarm Optimization (PSO) method suitable for differentiable manifolds. A close examination of the equations that constitute the PSO algorithm on a manifold reveals that such an algorithm is formulated as a distributed system of optimization sub-algorithms which, locally, are ARMA systems on the same manifold. The structural properties of such an optimization algorithm, such as its convergence, are likely to emerge from the study of the structural properties of ARMA systems on manifolds. Therefore, an application of the general theory being developed is the study of the structural properties of centralized recursive algorithms and distributed recursive sub-algorithms.
The actual realization of the above-mentioned applications still needs some further theoretical advancements in the understanding of ARMA models on manifolds, such as the study of structural properties (e.g., asymptotic stability) and the study of the deep relationships between the order and type (AR, MA or mixed) of a model and its auto-correlation/partial auto-correlation functions. Such topics are currently under investigation.
3. Matrix manifolds of interest. The present Section recalls the differential-geometric features of some matrix manifolds of interest in applications, in view of performing numerical simulations in Section 4. In particular, the present Section examines the cases of the compact Stiefel manifold and of the group of symmetric positive-definite matrices, as well as related manifolds of interest in the scientific literature.
3.1. Compact Stiefel manifold and related manifolds. The compact Stiefel manifold is defined as:

St(n, p) := {X ∈ R^{n×p} | X^T X = I_p},   (18)

where the superscript T denotes matrix transposition and the symbol I_p denotes a p × p identity matrix. The tangent space to the manifold St(n, p) at a point X ∈ St(n, p) has structure:

T_X St(n, p) = {V ∈ R^{n×p} | X^T V + V^T X = 0_p},   (19)

where the symbol 0_p denotes an all-zero p × p matrix. Exemplary applications where the orthogonality constraint plays a prominent role are blind source separation upon signal pre-whitening and independent component analysis [7], non-negative matrix factorization [47], best basis search/selection [2], electronic structure computation within the local density approximation [13] and factor analysis in psychometrics [14]. A metric that the Stiefel manifold may be endowed with is the Euclidean metric:

⟨V, W⟩ := tr(V^T W),   (20)

where (X, V), (X, W) ∈ T St(n, p) and the symbol tr(·) denotes matrix trace. The metric defined in (20) may be said to be Euclidean because it does not depend on the point X (namely, it is uniform). However, this does not mean that the space St(n, p) is a Euclidean (flat) manifold. In fact, St(n, p) is a curved Riemannian manifold and parallel translation of tangent vectors (namely, of matrices of the type (19)) along geodesic lines is not trivial. The structure of a geodesic arc in the Euclidean metric reads [13]:

γ_{X,V}(t) = [X V] Exp( t [ A  −V^T V ; I_p  A ] ) I_{2p,p} Exp(−t A),  A := X^T V,   (21)

where the symbol Exp(·) denotes the matrix exponential and the symbol I_{r,s} denotes an r × s pseudo-identity matrix. The exponential of a matrix X ∈ R^{p×p} is defined by the series:

Exp(X) := Σ_{k=0}^{∞} X^k / k!.   (22)

In numerical linear algebra, there is a large body of numerical recipes to compute the exponential of a matrix (see, for example, [25]). The orthogonal projection operator P_Y : R^{n×p} → T_Y St(n, p), for Y ∈ St(n, p), associated with the Euclidean metric on the compact Stiefel manifold, reads:

P_Y(U) = U − Y (Y^T U + U^T Y)/2,   (23)

with U ∈ R^{n×p}. It can be used to compute vector transport in the equations (7). The compact Stiefel manifold may be endowed with a second kind of metric, termed canonical metric.
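The Euclidean-metric geodesic formula of [13] and the associated projection operator may be sketched numerically; the point, tangent direction and the truncated-series matrix exponential below are illustrative (for small-norm arguments a short power series suffices):

```python
import numpy as np

def expm(M):
    """Matrix exponential via a truncated power series (adequate for small ||M||)."""
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for i in range(1, 30):
        term = term @ M / i
        E = E + term
    return E

def stiefel_geodesic(X, V, t):
    """Euclidean-metric geodesic on St(n, p) emanating from X with velocity V."""
    p = X.shape[1]
    A = X.T @ V                       # skew-symmetric for a tangent V
    S = V.T @ V
    M = np.block([[A, -S], [np.eye(p), A]])
    I2pp = np.vstack([np.eye(p), np.zeros((p, p))])   # pseudo-identity I_{2p,p}
    return np.hstack([X, V]) @ expm(t * M) @ I2pp @ expm(-t * A)

def projection(Y, U):
    """Orthogonal projection of an ambient U onto T_Y St(n, p), Euclidean metric."""
    return U - Y @ (Y.T @ U + U.T @ Y) / 2

X = np.vstack([np.eye(2), np.zeros((2, 2))])          # a point of St(4, 2)
V = np.array([[0.0, 0.3], [-0.3, 0.0], [0.2, 0.0], [0.0, 0.1]])  # tangent at X
Y = stiefel_geodesic(X, V, 1.0)
W = projection(Y, np.ones((4, 2)))                    # vector transport of a test matrix
```

The endpoint of the geodesic stays on the manifold and the projected matrix satisfies the tangency condition, which can both be checked numerically.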
The associated inner product reads:

⟨V, W⟩_X := tr(V^T (I_n − (1/2) X X^T) W),   (24)

which, unlike the Euclidean metric (20), is not uniform over the Stiefel manifold. The structure of a geodesic arc in the canonical metric reads [13]:

γ_{X,V}(t) = [X Q] Exp( t [ A  −R^T ; R  0_p ] ) I_{2p,p},  A := X^T V,   (25)

where (X, V) ∈ T St(n, p) and Q and R denote the factors of the compact QR decomposition of the matrix (I_n − XX^T)V. The orthogonal projection operator P_Y : R^{n×p} → T_Y St(n, p), for Y ∈ St(n, p), associated with the canonical metric on the compact Stiefel manifold, reads:

P_Y(U) = U − Y U^T Y,   (26)

with U ∈ R^{n×p}. For the convenience of the readers, a derivation of the expression (26) is reported in the Appendix A, along with a short discussion on the differences between parallel translation and vector transport based on the projector (26).
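The canonical-metric geodesic and the projector (26) may likewise be sketched; the point, tangent direction and series-based matrix exponential below are illustrative:

```python
import numpy as np

def expm(M):
    """Matrix exponential via a truncated power series (adequate for small ||M||)."""
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for i in range(1, 30):
        term = term @ M / i
        E = E + term
    return E

def canonical_geodesic(X, V, t):
    """Canonical-metric geodesic on St(n, p) via the compact QR decomposition."""
    n, p = X.shape
    A = X.T @ V
    Q, R = np.linalg.qr((np.eye(n) - X @ X.T) @ V)   # compact QR factors
    M = np.block([[A, -R.T], [R, np.zeros((p, p))]])
    I2pp = np.vstack([np.eye(p), np.zeros((p, p))])
    return np.hstack([X, Q]) @ expm(t * M) @ I2pp

def canonical_projection(Y, U):
    """Projection onto T_Y St(n, p) that is orthogonal w.r.t. the canonical metric."""
    return U - Y @ U.T @ Y

X = np.vstack([np.eye(2), np.zeros((2, 2))])
V = np.array([[0.0, 0.3], [-0.3, 0.0], [0.2, 0.0], [0.0, 0.1]])
Y1 = canonical_geodesic(X, V, 1.0)
W = canonical_projection(X, np.ones((4, 2)))
```

The block matrix in the exponent is skew-symmetric, so the geodesic endpoint retains orthonormal columns, and the projected matrix satisfies the tangency condition of (19).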
The unit hypersphere S^{n−1} := {x ∈ R^n | x^T x = 1} coincides with the Stiefel manifold St(n, p) with p = 1. Moreover, on the Stiefel manifold St(n, 1), the Euclidean metric and the canonical metric coincide, namely, they collapse into the uniform inner product:

⟨u, v⟩ := u^T v,  u, v ∈ T_x S^{n−1}.   (27)

There is a number of applications that deal with the manifold S^{n−1} as, for instance, blind deconvolution algorithms [15, 17], algorithms for data classification by linear discrimination based on non-Gaussianity discovery [33] and algorithms for adaptive pattern recognition [48]. Many important algorithms developed in robotics and related areas require sampling over spheres [45]. For example, the paradigm of sampling-based motion planning has led to algorithms that can solve challenging problems by combining collision detection, search algorithms and sampling strategies over the configuration space. General sampling over spheres arises in many forms of planning and optimization in which some number of directions are locally explored. The norm associated with the inner product (27) is the standard vector 2-norm ‖·‖ and the associated geodesic γ_{x,v}(t), with (x, v) ∈ T S^{n−1} and t ∈ [0, 1], may be given a closed-form expression, for v ≠ 0, that is:

γ_{x,v}(t) = x cos(‖v‖t) + (v/‖v‖) sin(‖v‖t),   (28)

which represents a great circle on the hypersphere, while γ_{x,0}(t) = x for any value of the parameter t. Given a geodesic arc γ_{x,v} : [0, 1] → S^{n−1} with (x, v) ∈ T S^{n−1} and a tangent vector u ∈ T_x S^{n−1}, the expression of the parallel-translation operator that translates the vector u along the geodesic arc γ_{x,v}(t) toward the endpoint t = 1 is known:

Γ_0^1(γ_{x,v})(u) = u + (v^T u/‖v‖²)[(cos‖v‖ − 1) v − ‖v‖ sin(‖v‖) x].   (29)

For a reference, readers might want to consult [19]. Starting from this, some mathematical work leads to the following result: given two points x, y ∈ S^{n−1} and a vector u ∈ T_x S^{n−1}, the parallel translation operator Γ_x^y : T_x S^{n−1} → T_y S^{n−1} has the following structure:

Γ_x^y(u) = u − (y^T u/(1 + x^T y)) (x + y),   (31)

provided that x^T y ≠ −1 (namely, the points x and y are not antipodal to one another).
Note that setting y = x in equation (31) yields the expected result Γ_x^x(u) = u for any u ∈ T_x S^{n−1}, since x^T u = 0. Moreover, it is straightforward to show that Γ_x^y(u) ∈ T_y S^{n−1}, because y^T Γ_x^y(u) = 0 for any x, y ∈ S^{n−1} and u ∈ T_x S^{n−1}.
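The isometry of the parallel translation operator (31), as opposed to the norm shrinkage caused by the projection-based vector transport, can be checked numerically; the points and tangent vector below are illustrative:

```python
import numpy as np

def ptransport(x, y, u):
    """Closed-form parallel translation on S^{n-1} from T_x to T_y (x, y not antipodal)."""
    return u - (y @ u) / (1.0 + x @ y) * (x + y)

def vtransport(y, u):
    """Vector transport: orthogonal projection of u onto T_y S^{n-1}."""
    return u - (y @ u) * y

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
u = np.array([0.0, 0.5, 0.5])    # a tangent vector at x (x^T u = 0)
pt = ptransport(x, y, u)
vt = vtransport(y, u)
```

Parallel translation preserves the length of u exactly, while the projection-based transport returns a strictly shorter vector in this configuration, illustrating the remark of Subsection 2.1 on the intrinsic quantities that vector transport fails to preserve.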
A manifold related to the hypersphere is the oblique manifold [39], defined as:

OB(n) := {X ∈ R^{n×n} | diag(X^T X) = I_n},

where diag(·) returns the zero matrix except for the main diagonal, which is copied from the main diagonal of its argument. The following identification holds true:

OB(n) ≅ S^{n−1} × · · · × S^{n−1} (n times),

hence, each column of an OB(n)-matrix may be treated as an S^{n−1}-vector. The hyper-ellipsoid L^{n−1} is defined as:

L^{n−1} := {x ∈ R^n | x^T R D R^T x = 1},

with D ∈ R^{n×n} being diagonal and positive-definite and R ∈ R^{n×n} being such that R^T R = I_n and det(R) = 1. The hyper-ellipsoid L^{n−1} is related to the hypersphere S^{n−1} and its geometry may be studied on the basis of the geometry of the hypersphere. The hyper-ellipsoid L^{n−1} is used, e.g., in the calibration of magnetometers [42]. A generalization of the Stiefel manifold, recently introduced in the context of principal subspace tracking in the presence of colored noise, has been studied in the contribution [46]. Such a generalized Stiefel manifold is defined as:

St_B(n, p) := {X ∈ R^{n×p} | X^T B X = I_p},

where B denotes any symmetric, positive-definite n × n matrix. The contribution [46] studied the structure of the tangent bundle of the generalized Stiefel manifold and suggests a computationally-convenient retraction map on it (that is, a numerically-convenient approximation of the exponential map [26]).

3.2. Group of symmetric positive-definite matrices and related manifolds. Symmetric positive-definite matrices find a wide range of applications. For instance, symmetric positive-definite matrices are applied in automatic and intelligent control [8], in computational biology [10], in cognitive computation [18] as well as in psychophysics [27]. The manifold of symmetric positive-definite matrices is defined as:

S^+(n) := {X ∈ R^{n×n} | X = X^T, X ≻ 0}.

The tangent spaces have structure:

T_X S^+(n) = {V ∈ R^{n×n} | V = V^T},

namely, every tangent space T_X S^+(n) coincides with the set of n × n symmetric matrices. The canonical metric adopted for the manifold of symmetric positive-definite matrices is:

⟨V, W⟩_X := tr(X^{−1} V X^{−1} W),

whose associated geodesic curve reads:

γ_{X,V}(t) = √X Exp(t √X^{−1} V √X^{−1}) √X,

where the symbol √X denotes the symmetric square root. The above expression requires the evaluation of the square root of a symmetric positive-definite matrix, which always exists. By recalling that X^{−1} Exp(V) X = Exp(X^{−1} V X), the above expression simplifies into:

γ_{X,V}(t) = X Exp(t X^{−1} V).

Given a geodesic arc γ_{X,V} : [0, 1] → S^+(n) and a tangent vector U ∈ T_X S^+(n), the expression of the parallel-translation operator that translates the tangent vector U along the geodesic arc γ_{X,V}(t) toward the endpoint t = 1 reads:

Γ_0^1(γ_{X,V})(U) = E U E^T,  E := √X Exp((1/2) √X^{−1} V √X^{−1}) √X^{−1}.

Some mathematical work leads to the following result: given two points X, Y ∈ S^+(n) and a tangent vector U ∈ T_X S^+(n), the parallel translation operator Γ_X^Y : T_X S^+(n) → T_Y S^+(n) has the following structure:

Γ_X^Y(U) = (Y X^{−1})^{1/2} U ((Y X^{−1})^{1/2})^T.   (42)

Note that setting Y = X in equation (42) immediately yields the expected result Γ_X^X(U) = U, for any U ∈ T_X S^+(n). Moreover, it may be verified that Γ_X^Y(U) ∈ T_Y S^+(n) for any X, Y ∈ S^+(n) and U ∈ T_X S^+(n).
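The geodesic and the parallel translation operator (42) on S^+(n) may be sketched through eigendecompositions of symmetric matrices; the point and tangent matrix below are illustrative:

```python
import numpy as np

def sym_funcm(S, f):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(f(w)) @ Q.T

def spd_geodesic(X, V, t):
    """Geodesic on S^+(n): sqrt(X) Exp(t sqrt(X)^{-1} V sqrt(X)^{-1}) sqrt(X)."""
    Xh = sym_funcm(X, np.sqrt)
    Xmh = np.linalg.inv(Xh)
    return Xh @ sym_funcm(t * (Xmh @ V @ Xmh), np.exp) @ Xh

def spd_transport(X, Y, U):
    """Parallel translation (42): E U E^T with E = (Y X^{-1})^{1/2}."""
    Xh = sym_funcm(X, np.sqrt)
    Xmh = np.linalg.inv(Xh)
    E = Xh @ sym_funcm(Xmh @ Y @ Xmh, np.sqrt) @ Xmh   # a square root of Y X^{-1}
    return E @ U @ E.T

def canonical_inner(X, U, W):
    """Canonical inner product tr(X^{-1} U X^{-1} W) at the point X."""
    Xi = np.linalg.inv(X)
    return np.trace(Xi @ U @ Xi @ W)

X = np.array([[2.0, 0.3], [0.3, 1.0]])
V = np.array([[0.1, 0.2], [0.2, -0.1]])   # a symmetric tangent matrix
Y = spd_geodesic(X, V, 1.0)
W = spd_transport(X, Y, V)
```

The geodesic endpoint remains symmetric positive-definite, the transport reduces to the identity for Y = X, and the canonical inner product is preserved by the translation.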
A related problem arises with regard to the manifold S^+(n, p) of symmetric fixed-rank positive-semidefinite matrices, which may be defined as S^+(n, p) := {X ∈ R^{n×n} | X = X^T, X positive-semidefinite, rank(X) = p}. The interest in symmetric fixed-rank positive-semidefinite matrices stems from the observation that, with the growing use of low-rank approximations of matrices as a way to retain tractability in large-scale applications, there is a need to extend the calculus of positive-definite matrices to their low-rank counterparts. The manifold of symmetric fixed-rank positive-semidefinite matrices admits the quotient representation S^+(n, p) ≅ (St(n, p) × S^+(p))/O(p), which expresses the fact that any matrix X ∈ S^+(n, p) may be represented as X = U R^2 U^T, with (U, R^2) ∈ St(n, p) × S^+(p), up to the equivalence relation (U, R^2) ∼ (U P, P^T R^2 P) for any P ∈ O(p). On the basis of the quotient representation (44), the paper [29] proposes a Riemannian structure for the space S^+(n, p), based on a metric defined as a linear combination of the Euclidean metric in St(n, p) and the canonical metric in S^+(p). The computation of geodesic arcs corresponding to such a metric is still an open question, although geodesic-like curves in the structure space St(n, p) × S^+(p) may be computed in closed form along with their lengths (which do not correspond to Riemannian distances, though).

4.1. Generation of random paths on the unit hypersphere. In order to generate a random vector v ∈ T_x S^{n−1}, given a point x ∈ S^{n−1}, the following scheme may be put into effect:
• Generate a random vector ṽ ∈ R^n by a random scalar generator;
• Project the vector ṽ onto the tangent space T_x S^{n−1} by the rule v = ṽ − x(ṽ^T x).
The generation of a random path over the manifold S^{n−1} requires choosing the parameters that take part in the ARMA(r, q) model (5). In particular, a way to select the transition operators A_i : (x, v) ∈ T S^{n−1} → w ∈ T_x S^{n−1} and B_i : (x, v) ∈ T S^{n−1} → w ∈ T_x S^{n−1} is based on matrices A_i, B_i ∈ R^{n×n} that may be chosen arbitrarily.
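The two-step tangent-generation scheme above admits a short sketch (the function name is illustrative, not from the paper): draw an ambient random vector and remove its component along the base point x.

```python
import numpy as np

def random_tangent_sphere(x, seed=None):
    """Project a random ambient vector onto T_x S^{n-1} via v = v~ - x (v~^T x)."""
    rng = np.random.default_rng(seed)
    v_tilde = rng.standard_normal(x.shape)
    return v_tilde - x * (v_tilde @ x)

x = np.array([0.0, 0.0, 1.0])   # a point on S^2
v = random_tangent_sphere(x, seed=2)
assert abs(v @ x) < 1e-12       # tangency condition: v^T x = 0
```

The projection works because subtracting x(ṽ^T x) removes exactly the normal component of ṽ at x.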
In the present numerical example, the values n = 3, r = 2 and q = 4 were chosen. Moreover, the implementation based on parallel translation was adopted. The resulting random-path generation rule ARMA(2, 4) on the manifold S^2 may be particularized from the general expression (5), where the exponential map on the hypersphere is implemented as exp_x(v) = cos(||v||) x + sin(||v||) v/||v||, with x_k ∈ S^2, w_k, v_k ∈ T_{x_k} S^2 for k = 0, 1, 2, . . . and c ∈ T_{x_0} S^2 generated randomly by projection onto the tangent space T_{x_0} S^2. An example of random path generated over the bundle T S^2 by an ARMA(2, 4) model is illustrated in Figure 1. The initial seed was chosen as a fixed vector, and a total of K = 5,000 temporal samples were generated.

Figure 1. Example of random path generated over the sphere S^2 by an ARMA(2, 4) model (5). The initial seed is denoted by a diamond and the generated points are denoted by dots. For clarity, the temporal succession of two samples is denoted by the geodesic arc that links them.
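The great-circle exponential map used above can be sketched directly (the function name is illustrative, not from the paper); the small-norm guard returns the base point when the tangent vector vanishes.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on S^{n-1}: exp_x(v) = cos(||v||) x + sin(||v||) v/||v||."""
    nv = np.linalg.norm(v)
    if nv < 1e-15:
        return x.copy()          # zero tangent vector: stay at x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, np.pi / 2, 0.0])      # tangent at x, norm pi/2
y = sphere_exp(x, v)
assert np.allclose(np.linalg.norm(y), 1.0)   # the image stays on the sphere
assert np.allclose(y, [0.0, 1.0, 0.0])       # a quarter of a great circle
```

Since exp_x(v) traces the great circle through x with initial velocity v, a tangent vector of norm π/2 carries x to an orthogonal point, as the check confirms.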
Figure 2 illustrates the autocorrelation function R_v(m, n) associated with the random path (x_k, v_k) ∈ T S^2. The maximum lag was chosen as L = 10. The autocorrelation function R_v(m, n) is clearly symmetric and banded. Such numerical results confirm that, for small lags m, n, the informative part of the autocorrelation function R_v(m, n) may be retained as R_v(m, 0).
The sample autocorrelation function R_v(m) associated with the sequence (x_k, v_k) ∈ T S^2, k = 0, . . . , K, is obtained by averaging, over the index k, the inner products between the tangent vector v_k and the tangent vector v_{k+m} transported into the tangent space T_{x_k} S^2. Figure 3 illustrates the autocorrelation function R_v(m) associated with the random path (x_k, v_k) ∈ T S^2 and the autocorrelation function R_w(m) associated with the random path (x_k, w_k) ∈ T S^2. The maximum lag was chosen as L = 20.
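A hedged sketch of this sample autocorrelation follows. Tangent vectors living at different base points are compared after transporting v_{k+m} back to T_{x_k}; the `transport` argument is an assumed user-supplied map (e.g. parallel translation), and the identity default is used only to keep the toy check below runnable at a common base point, where transport is trivially the identity. Names are illustrative, not from the paper.

```python
import numpy as np

def sample_autocorr(xs, vs, m, transport=lambda x_from, x_to, u: u):
    """R_v(m): average inner product between v_k and the transported v_{k+m}."""
    K = len(vs)
    return sum(vs[k] @ transport(xs[k + m], xs[k], vs[k + m])
               for k in range(K - m)) / (K - m)

# Toy check: all tangent vectors at one fixed point of S^2.
rng = np.random.default_rng(3)
x = np.array([0.0, 0.0, 1.0])
vs = [w - x * (w @ x) for w in rng.standard_normal((500, 3))]  # tangent projections
r0 = sample_autocorr([x] * 500, vs, 0)   # zero lag: mean squared norm, positive
r5 = sample_autocorr([x] * 500, vs, 5)   # independent draws: much smaller magnitude
assert r0 > 0
assert abs(r5) < r0
```

For an uncorrelated tangent sequence the zero-lag value dominates, mirroring the decay shapes reported in the figures.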

4.2.
Generation of random paths on the manifold of symmetric, positive-definite matrices. To generate a random tangent matrix V ∈ T_X S^+(n), given a point X ∈ S^+(n), the following scheme may be put into effect:
• Generate a random matrix Ṽ ∈ R^{n×n} by a random scalar generator;
• Project the matrix Ṽ onto the tangent space T_X S^+(n) by the rule V = (1/2)(Ṽ + Ṽ^T).
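Since every tangent space of S^+(n) coincides with the set of symmetric matrices, the projection step above is a plain symmetrization, independent of the base point. A minimal sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def random_tangent_spd(n, seed=None):
    """Symmetrize a random matrix: V = (V~ + V~^T)/2 lies in T_X S^+(n) for any X."""
    rng = np.random.default_rng(seed)
    V_tilde = rng.standard_normal((n, n))
    return (V_tilde + V_tilde.T) / 2

V = random_tangent_spd(2, seed=4)
assert np.allclose(V, V.T)   # tangency on S^+(n) is just symmetry
```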
The generation of a random path over the manifold S^+(n) requires choosing the parameters that take part in the ARMA model (5). In particular, a suitable way to select the transition operators A_i : (X, V) ∈ T S^+(n) → W ∈ T_X S^+(n) and B_i : (X, V) ∈ T S^+(n) → W ∈ T_X S^+(n) is based on real-valued constants a_i and b_i chosen arbitrarily. In the present numerical example, the dimension of the manifold and the depths of the ARMA(r, q) discrete-time system were chosen as n = 2, r = 2 and q = 4. The implementation based on parallel translation was adopted. An example of random process generated over the manifold S^+(2) by an ARMA(2, 4) model is illustrated in Figure 4. A total of K = 5,000 temporal samples were generated. The initial seed was chosen as X_0 = I_2. The sample autocorrelation function associated with the sequence (X_k, V_k) ∈ T S^+(2), k = 0, . . . , K, is defined analogously to the hypersphere case. Figure 5 illustrates the autocorrelation function R_V(m) associated with the random path (X_k, V_k) ∈ T S^+(2) and the autocorrelation function R_W(m) associated with the random path (X_k, W_k) ∈ T S^+(2). The maximum lag was chosen as L = 20. The autocorrelation function R_V(m) shows the typical decaying shape, while the shape of the autocorrelation function R_W(m) suggests that the sequence W_k is uncorrelated.

4.3.
Generation of random paths on the compact Stiefel manifold. An inspection of the equations that express geodesic arcs and vector transport corresponding to the Euclidean metric and the canonical metric for the Stiefel manifold, reported in Subsection 3.1, reveals that the quantities corresponding to the canonical metric are easier to implement on a computational platform. Hence, the canonical metric is the metric of choice for the numerical examples in the present subsection.
With the aim of generating a random tangent matrix V ∈ T_X St(n, p), given a point X ∈ St(n, p), the following scheme may be employed:
• Generate a random matrix Ṽ ∈ R^{n×p} by a random scalar generator;

Figure 4. Example of random path generated over the manifold S^+(2) by an ARMA(2, 4) model (5). The initial seed is denoted by a diamond and the generated points are denoted by dots. The temporal succession of two samples is denoted by the geodesic arc that links them.
• Project the matrix Ṽ onto the tangent space T_X St(n, p) by the rule V = Ṽ − X Ṽ^T X.
The generation of a random path over the manifold St(n, p) requires choosing the transition operators A_i : (X, V) ∈ T St(n, p) → W ∈ T_X St(n, p) and B_i : (X, V) ∈ T St(n, p) → W ∈ T_X St(n, p). A possible choice is based on real-valued constants a_i and b_i chosen arbitrarily. In the present numerical example, the values n = 5, p = 2, r = 2 and q = 4 were chosen. The initial seed was chosen randomly in St(5, 2) and no constant drift term was included in this simulation. The random-path generation rule (7) particularizes accordingly for k = 0, 1, 2, . . .. The exponential is implemented as exp_X(V) = [X Q] Exp(M) [I_2; 0], where M denotes the block matrix [[X^T V, −R^T], [R, 0]] and Q and R denote the factors of the compact QR decomposition of the matrix (I_5 − X X^T)V. An example of random path generated over the manifold St(5, 2) by an ARMA(2, 4) model is illustrated in Figure 6. Again, a total of K = 5,000 temporal samples were generated. The sample autocorrelation function associated with the sequence (X_k, V_k) ∈ T St(5, 2), k = 0, . . . , K, may be expressed analogously to the previous cases. Figure 7 illustrates the autocorrelation function R_V(m) associated with the random path (X_k, V_k) ∈ T St(5, 2) and the autocorrelation function R_W(m) associated with the random path (X_k, W_k) ∈ T St(5, 2). The maximum lag was chosen as L = 20.
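The QR-based exponential described above, which relies on the compact QR factorization of (I − X X^T)V, can be sketched as follows (helper names are illustrative, not from the paper); the final check confirms that the image stays on the Stiefel manifold, since the block matrix is skew-symmetric and its matrix exponential is orthogonal.

```python
import numpy as np
from scipy.linalg import expm

def stiefel_exp(X, V, t=1.0):
    """Canonical-metric geodesic on St(n, p) via the compact QR of (I - X X^T) V."""
    n, p = X.shape
    A = X.T @ V                                         # skew-symmetric for tangent V
    Q, R = np.linalg.qr((np.eye(n) - X @ X.T) @ V)      # compact QR, Q is n x p
    M = np.block([[A, -R.T], [R, np.zeros((p, p))]])    # 2p x 2p skew-symmetric block
    E = expm(t * M)
    return np.hstack([X, Q]) @ E[:, :p]                 # [X Q] Exp(tM) [I_p; 0]

rng = np.random.default_rng(5)
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))        # random point on St(5, 2)
U = rng.standard_normal((5, 2))
V = U - X @ U.T @ X                                     # project onto T_X St(5, 2)
Y = stiefel_exp(X, V, 0.3)
assert np.allclose(Y.T @ Y, np.eye(2), atol=1e-8)       # Y stays on St(5, 2)
```

The columns of Q are orthogonal to those of X by construction, so [X Q] has orthonormal columns and the product inherits the orthonormality of the first p columns of Exp(tM).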

Conclusion.
Although ad hoc methods may be envisaged to generate random matrices of specific forms, a sufficiently general method to simulate a random path on a matrix manifold appears to have been lacking in the scientific literature. The present paper suggests a unifying framework that shifts the problem of generating a random path on a matrix manifold to the problem of generating a random path on the tangent bundle associated with the manifold, by extending the classical notion of (scalar, linear) ARMA model to real-valued Riemannian matrix manifolds. In addition, the present contribution suggests a way to measure the temporal dependency between the samples generated by an ARMA model by extending the notion of sample autocorrelation function to real-valued Riemannian matrix manifolds. The outcome of such a research endeavor is illustrated through several numerical examples on matrix manifolds of interest in the scientific literature. The present manuscript covers the current state of advancement of this research topic. Crucial studies to be pursued in the future are the BIBO (bounded-input, bounded-output) stability analysis as well as the asymptotic stability analysis of the devised ARMA models on matrix manifolds. Although ad hoc methods to study the stability of specific non-linear ARMA models on the space R^p are available in the scientific literature (see, e.g., the method to analyze switched ARMA systems presented in [20]), such methods do not appear immediately extendable to the analysis of ARMA models on manifolds. Future research efforts will be directed along the lines of extending the ARMA model on manifolds to different models currently used, e.g., in econometrics, such as the ARCH model family [5, 32]. In addition, the current ARMA models on manifolds could be extended to include further non-linear dependencies on the basis of previous research studies on non-linear ARMA models on flat spaces [44].
• Metric compatibility condition, namely: ⟨V, P_Y(U)⟩_Y = tr(V^T U) for all V ∈ T_Y St(n, p).
Such a pair of conditions determines univocally the value of the projection P_Y(U). The orthogonal projection (26), in particular, refers to the canonical metric ⟨V, W⟩_X := tr(V^T (I_n − (1/2) X X^T) W), with V, W ∈ T_X St(n, p). The compatibility condition with the canonical metric reads tr(V^T (I_n − (1/2) Y Y^T) P_Y(U)) = tr(V^T U) for all V ∈ T_Y St(n, p). Such an equation implies that the matrix P_Y(U) − (1/2) Y Y^T P_Y(U) − U must be orthogonal to all tangent vectors in T_Y St(n, p), namely, it must be a normal vector. It is known (see, e.g., [13]) that the normal space at a point Y of a Stiefel manifold is formed by matrices of the type Y S, with S being p × p symmetric. Therefore, there must exist some symmetric matrix S such that P_Y(U) − (1/2) Y Y^T P_Y(U) − U = Y S. Pre-multiplying both sides of the latter equation by Y^T yields (1/2) Y^T P_Y(U) − Y^T U = S (58). Transposing both sides of the above equation yields (1/2) P_Y(U)^T Y − U^T Y = S (59). The tangency condition implies that P_Y(U)^T Y = −Y^T P_Y(U). Summing up, term by term, the equations (58) and (59) and plugging in the tangency condition yields, for the matrix S, the value S = −(1/2)(Y^T U + U^T Y). Replacing such a value of the matrix S within the relationship (58) gives Y^T P_Y(U) = Y^T U − U^T Y = Y^T (U − Y U^T Y). Since the matrix U − Y U^T Y lies in the tangent space T_Y St(n, p) (as may be verified directly), the above equation, combined with the normal-space relation, implies that P_Y(U) = U − Y U^T Y. Now, take two points X, Y ∈ St(n, p) and a tangent vector U ∈ T_X St(n, p) ⊂ R^{n×p}. The vector transport of U to the tangent space T_Y St(n, p) is defined as P_Y(U). Such a vector transport differs from the actual parallel translation.
As a first observation, it is immediate to verify that such a vector transport is not an isometry, namely, it does not preserve the length of the transported vector: in general, ||P_Y(U)||_Y ≠ ||U||_X. Moreover, such a vector transport does not preserve angles between transported vectors, namely, given another tangent vector V ∈ T_X St(n, p), the cosine of the angle between the tangent vectors U, V differs, in general, from the cosine of the angle between the transported vectors P_Y(U), P_Y(V).
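A short numerical sketch of the projection-based vector transport P_Y(U) = U − Y U^T Y follows (helper names are illustrative, not from the paper). The transported vector is verified to satisfy the tangency condition at Y, and the printed Frobenius norms generally differ, in line with the non-isometry observation above.

```python
import numpy as np

def project_tangent(Y, U):
    """Projection onto T_Y St(n, p): P_Y(U) = U - Y U^T Y."""
    return U - Y @ U.T @ Y

rng = np.random.default_rng(6)
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))     # a point on St(5, 2)
Y, _ = np.linalg.qr(rng.standard_normal((5, 2)))     # another point on St(5, 2)
U = project_tangent(X, rng.standard_normal((5, 2)))  # tangent vector at X
PU = project_tangent(Y, U)                           # transported to T_Y St(5, 2)

# Tangency at Y: Y^T PU + PU^T Y = 0 (holds identically for this projection).
assert np.allclose(Y.T @ PU + PU.T @ Y, 0)
# The lengths before and after transport generally disagree (no isometry):
print(np.linalg.norm(U), np.linalg.norm(PU))
```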