Multivariate and 2D Extensions of Singular Spectrum Analysis with the Rssa Package

Implementation of multivariate and 2D extensions of Singular Spectrum Analysis (SSA) by means of the R package Rssa is considered. The extensions include MSSA for simultaneous analysis and forecasting of several time series and 2D-SSA for analysis of digital images. A new extension of 2D-SSA called Shaped 2D-SSA is introduced for the analysis of images of arbitrary shape, not necessarily rectangular. It is shown that the implementation of Shaped 2D-SSA can serve as a basis for the implementation of MSSA and other generalizations. An efficient implementation of operations with Hankel and Hankel-block-Hankel matrices through the FFT is suggested. Examples with code fragments in R, which explain the methodology and demonstrate the proper use of Rssa, are presented.


Introduction
Singular Spectrum Analysis (SSA) as a method of time series analysis has a well-elaborated theory and solves various problems: time series decomposition, trend extraction, periodicity detection and extraction, signal extraction, denoising, filtering, forecasting, missing data imputation, change point detection and spectral analysis, among others (see examples and references in Vautard and Ghil (1989); Golyandina et al. (2001); Ghil et al. (2002); Golyandina and Zhigljavsky (2013)). Since the method does not require a model given a priori, it is called nonparametric and is well suited for exploratory analysis of time series.
Additionally, SSA allows the construction of a model during or after exploratory analysis. The underlying parametric model of the signal is a sum of products of polynomial, exponential and sine-wave functions. This is a linear model in the following sense: such series constitute the class of solutions of linear differential equations. In the case of discrete time, such time series satisfy linear recurrent relations (LRRs). There is a class of so-called subspace-based methods (Van Der Veen et al. 1993), which are related to the estimation of parameters in the mentioned parametric model, in particular, to the estimation of frequencies.
Although some problems like frequency estimation do need the model, some problems like smoothing do not need a model at all. For forecasting in SSA, the time series may satisfy the model only approximately and locally, since the forecasting is based on estimation of the signal subspace and not on estimation of the parameters of the model. Depending on the quality of approximation of the series by the model, long- or short-horizon forecasts can be constructed.
The aforementioned possibilities are implemented in different software based on different methodologies of SSA, see Golyandina and Korobeynikov (2013) for references. The sources of the methodology used in this paper are the books Golyandina et al. (2001); Golyandina and Zhigljavsky (2013) and the software from Gistat Group (2013), where SSA analysis and forecasting of one-dimensional and multivariate time series are implemented in an interactive manner. The same methodology is used and developed in the Rssa package; see Golyandina and Korobeynikov (2013), where analysis, forecasting and parameter estimation by means of Rssa for one-dimensional series are described.
At the present time, Rssa is being extensively developed and includes SSA-processing of systems of series (Multivariate or Multichannel SSA, shortly MSSA) and of digital images (2D-SSA). The aim of this paper is to describe this multidimensional part of the Rssa package as of version 0.10. Note that the efficient implementation of the algorithms of the multidimensional SSA extensions is very important, since the algorithms are much more time-consuming than in the one-dimensional case. Therefore, we pay attention both to the guidelines for the proper use of the package and to the efficient implementation techniques, which are based on the methods described in Korobeynikov (2010). In addition, a new method called Shaped 2D-SSA (ShSSA for short) is introduced. This method can be applied to images of non-rectangular form, e.g., circular images, images with gaps, and so on. Also, it is shown that the implementation of Shaped 2D-SSA can serve as a common basis for the implementation of many SSA extensions.
Let us introduce the general scheme of an SSA-like algorithm including MSSA and 2D-SSA. The SSA-like algorithms decompose the given data $\mathsf{X}$ into a sum of different components:

$$\mathsf{X} = \widetilde{\mathsf{X}}_1 + \ldots + \widetilde{\mathsf{X}}_m. \qquad (1)$$

A typical SSA decomposition of a time series is the decomposition into a slowly-varying trend, seasonal components and a residual, or the decomposition into some pattern, regular oscillations and noise. The input data can be a time series, a multivariate time series or a digital image.
The algorithm is divided into four steps. The first step is the generation of some multivariate object on the basis of the initial object (time series or image) by moving a window of some form and taking the elements from the window. For one-dimensional time series, this window is an interval that produces subseries of the time series of length L, where L is called the window length (in SSA). For multivariate time series (a system of s one-dimensional series), the window also produces subseries of length L, but we apply this window to all time series of the system (in MSSA). For images, the window can be a 2D rectangle (in 2D-SSA) or some other 2D shape (in ShSSA). Then all the subobjects (subseries or vectorized 2D shapes) obtained by applying the window are stacked into a matrix as columns. Let us call this matrix the trajectory matrix. The trajectory matrix has a specific structure: it is a Hankel, stacked Hankel, Hankel-block-Hankel or quasi-Hankel matrix.
The second step consists in decomposition of the trajectory matrix into a sum of elementary matrices of rank 1. The most frequently used decomposition, which has a lot of optimal approximation properties, is the Singular Value Decomposition (SVD).
The third step is the grouping of the decomposition components. At the grouping step, the elementary rank-one matrices are grouped and summed within groups.
The last, fourth step converts the grouped matrix decomposition back into a decomposition of the initial time series or image. The elements of each component of the grouped decomposition are obtained by averaging the entries of the grouped matrices that correspond to the same element of the initial object.
Thus, the result of the algorithm is the decomposition (1) of the initial object into a sum of objects. We assume that the initial object is a sum of some identifiable components, for example, trend and seasonality, or signal and noise, and that we observe only the sum. Then the aim of the SSA-like methods is to reconstruct these components. The possibility to reconstruct the object components is called separability of these components.
Complex SSA is the same as Basic SSA, but it is applied to complex-valued one-dimensional time series with a complex-valued SVD and therefore fits well into the described scheme. Certainly, Multivariate Complex SSA and 2D Complex SSA can be considered in a similar manner, but this is out of the scope of this paper. Note that Complex SSA can be applied to a system of two real-valued time series considered as the real and imaginary parts of a complex series.
We mentioned that the model of the series can be constructed during the SSA processing. The simplified model of the signal (possible polynomial modulation is omitted) has the form $s_n = \sum_{k=1}^{r} A_k \mu_k^n$, where $\mu_k = \rho_k e^{i 2\pi \omega_k}$. The general approach to the estimation of the parameters $\rho_k$ and the frequencies $\omega_k$ in this model is to use the so-called signal subspace, which can be estimated at the third step of the SSA algorithm. One of the subspace-based methods is ESPRIT (Roy and Kailath 1989). This approach has a 2D extension named 2D-ESPRIT for the estimation of parameters in the model $s_{ln} = \sum_{k=1}^{r} A_k \mu_k^l \nu_k^n$.
Finally, we dwell on the issues related to efficient implementation. The two main points that underlie the implementation are (1) the use of fast methods for the Singular Value Decomposition and (2) the use of fast multiplication of Hankel-related matrices and fast matrix hankelization by means of the Fast Fourier Transform (FFT), see Korobeynikov (2010). However, the usage of algorithms with good theoretical complexity does not automatically yield an efficient implementation per se. The package also implements many practical considerations which make the core SSA steps really fast. Besides this, many of the functions in the package employ on-demand calculation of the necessary objects and memoization. These points are very important for large window sizes and long time series, and are crucial for 2D-SSA.
What is more, the package supports several input data types, tries not to constrain the user, carefully preserves the attributes of the input data and provides various useful plotting capabilities.
On the theoretical/algorithmic side, this paper makes several new contributions. First, we describe a fast algorithm for vector SSA forecasting and provide its implementation. Second, we introduce a new extension of 2D-SSA, Shaped 2D-SSA, which should extend the application area of the method in digital image processing. Then, fast FFT-based algorithms are described and implemented for each considered SSA-like method. Finally, we show that all the considered extensions of SSA (and several related extensions) are special cases of Shaped 2D-SSA.
The structure of the paper is as follows. Section 2 contains the general description of the SSA approach to show the common character of the considered extensions and to introduce the necessary notions. The multivariate version of SSA for analysis and forecasting of systems of series is described in Section 3. 2D-SSA for image processing, together with 2D-ESPRIT for parameter estimation, is considered in Section 4. Section 5 is devoted to a new shaped 2D version of SSA for the analysis of images of non-rectangular form.
Each of Sections 3-5 contains the implemented algorithms, a description of the Rssa functionality with typical R code, and simulated and real-life examples with the corresponding fragments of R code accompanied by guidelines for proper use. Since the implementation of the multivariate extensions is based on that of Shaped 2D-SSA, the description of the implementations in Section 6 is provided in the inverse order; for example, MSSA is considered as a particular case of Shaped 2D-SSA.
Section 6 contains details of the fast methods used for matrix multiplication and hankelization, as well as the fast algorithm of vector forecasting. Theoretical details of the MSSA and 2D-SSA algorithms are put into the Appendix (Section 7).
This paper contains a comprehensive description of multivariate extensions of SSA. Although it is impossible to consider all applications of multivariate extensions, we provide examples for the most common ones, with code fragments and references to the literature. We also demonstrate plotting capabilities of the package which are necessary for the proper use of the SSA methodology and representation of the results.

Common structure of algorithms and general notions
Let us describe the common features of the algorithms of the SSA extensions and the corresponding notions. Before introducing the details of the algorithms, we present a general scheme of the SSA-like algorithms in Fig. 1.
Table 2 contains the list of the extensions considered in the paper together with some details.
Further we discuss the steps of the algorithms of these extensions in detail.

Details of the algorithms
Step 1. Embedding. The first step consists in the construction of a so-called trajectory matrix $\mathbf{X} = \mathcal{T}(\mathsf{X})$ by means of a map $\mathcal{T}$.
For example, in the Basic SSA algorithm for the analysis of a one-dimensional time series $\mathsf{X} = (x_1, \ldots, x_N)$, $\mathcal{T}$ maps $\mathbb{R}^N$ to the space of $L \times K$ Hankel matrices with equal values on the anti-diagonals, where $N$ is the series length, $L$ is the window length and $K = N - L + 1$:

$$\mathbf{X} = \mathcal{T}(\mathsf{X}) = \begin{pmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_N \end{pmatrix}. \qquad (2)$$

Let us describe this step in a more formal manner. Denote by $\mathcal{M}_{p,q}$ the set of $p \times q$ matrices. Let an ordered set of real values $\mathsf{X}$ (a series or an image) be the initial data and $\mathcal{M}$ be the linear space of possible $\mathsf{X}$. Let $\mathcal{T}$ be a one-to-one mapping from $\mathcal{M}$ to $\mathcal{M}^{(\mathrm{H})}_{L,K} \subset \mathcal{M}_{L,K}$, where $\mathcal{M}^{(\mathrm{H})}_{L,K}$ consists of matrices of some Hankel-like structure, and $L$ and $K$ are determined by the method parameters. The mapping $\mathcal{T}$ is called embedding; it puts the elements of $\mathsf{X}$ into certain positions of $\mathbf{X}$. Therefore, for each element $x$ of $\mathsf{X}$ the mapping $\mathcal{T}(\mathsf{X}) = \mathbf{X} = \{(\mathbf{X})_{ij}\}$ determines the set of indices $A$ such that $(\mathbf{X})_{ij} = x$ for any $(i,j) \in A$. Formally, let $E_i \in \mathcal{M}$ be the object with 1 at the $i$-th entry and zeros elsewhere; then $A_i = \{(k,l) : (\mathcal{T}(E_i))_{kl} = 1\}$. For example, in the Basic SSA algorithm applied to a one-dimensional time series $\mathsf{X}$, $\mathcal{M} = \mathbb{R}^N$, $\mathcal{M}^{(\mathrm{H})}_{L,K}$ is the set of Hankel matrices with equal values on the anti-diagonals, and $\mathcal{T}$ is given in (2).

Step 2. Decomposition. The trajectory matrix is decomposed into a sum of elementary matrices of rank 1:

$$\mathbf{X} = \mathbf{X}_1 + \ldots + \mathbf{X}_d. \qquad (3)$$

In the case of the SVD, $\mathbf{X}_j = \sqrt{\lambda_j}\, U_j V_j^{\mathrm{T}}$. The triple $(\sqrt{\lambda_j}, U_j, V_j)$ consisting of the singular value $\sqrt{\lambda_j}$, the left singular vector $U_j$ and the right singular vector $V_j$ of $\mathbf{X}$ is called the $j$-th eigentriple.
Step 3. Grouping. The index set $\{1, \ldots, d\}$ is partitioned into $m$ disjoint groups $I_1, \ldots, I_m$. The resultant matrix $\mathbf{X}_I$ corresponding to a group $I$ is defined as $\mathbf{X}_I = \sum_{j \in I} \mathbf{X}_j$. Thus, we obtain the grouped matrix decomposition

$$\mathbf{X} = \mathbf{X}_{I_1} + \ldots + \mathbf{X}_{I_m}. \qquad (4)$$

For example, grouping can be performed on the basis of the form of the eigenvectors, which reflects the properties of the initial data components.
The grouping with I j = {j} is called elementary.
Step 4. Reconstruction. At this final step each matrix of the decomposition (4) is transferred back to the form of the input object $\mathsf{X}$. This is performed optimally in the following sense: for a matrix $\mathbf{Y}$ we seek the object $\mathsf{Y} \in \mathcal{M}$ that minimizes $\|\mathcal{T}(\mathsf{Y}) - \mathbf{Y}\|_{\mathrm{F}}$, that is, $\mathsf{Y} = \mathcal{T}^{-1}(\Pi^{(\mathrm{H})}\mathbf{Y})$, where $\Pi^{(\mathrm{H})}$ denotes the projection (in the Frobenius norm) onto $\mathcal{M}^{(\mathrm{H})}_{L,K}$. Note that by the embedding nature of $\mathcal{T}$ the projection $\Pi^{(\mathrm{H})}$ is the averaging of the entries along the sets $A_i$. For example, in SSA the composite mapping $\mathcal{T}^{-1} \circ \Pi^{(\mathrm{H})}$ is the averaging along the anti-diagonals, that is, $y_i = \frac{1}{|A_i|} \sum_{(k,l) \in A_i} (\mathbf{Y})_{kl}$; the same averaging can be written down in this general form for all the considered extensions. Thus, denote by $\mathbf{X}_{I_k}$ the grouped matrices, by $\widetilde{\mathbf{X}}_k = \Pi^{(\mathrm{H})} \mathbf{X}_{I_k}$ the trajectory matrices of the reconstructed data and by $\widetilde{\mathsf{X}}_k = \mathcal{T}^{-1}(\widetilde{\mathbf{X}}_k)$ the reconstructed data themselves. Then the resultant decomposition of the initial data has the form

$$\mathsf{X} = \widetilde{\mathsf{X}}_1 + \ldots + \widetilde{\mathsf{X}}_m. \qquad (5)$$

If the grouping is elementary, then $m = d$ and the reconstructed objects in (5) are called elementary components.

General notions
Separability A very important notion in SSA is separability. Let $\mathsf{X} = \mathsf{X}_1 + \mathsf{X}_2$. (Approximate) separability means that there exists a grouping such that the reconstructed object $\widetilde{\mathsf{X}}_1$ is (approximately) equal to $\mathsf{X}_1$. Properties of the SVD yield the (approximate) orthogonality of the columns and of the rows of the trajectory matrices $\mathbf{X}_1$ and $\mathbf{X}_2$ of $\mathsf{X}_1$ and $\mathsf{X}_2$ as the separability condition. There is a well-elaborated theory of separability for one-dimensional time series (Golyandina et al. 2001, Sections 1.5 and 6.1). It appears that many important decomposition problems, from noise reduction and smoothing to trend, periodicity or signal extraction, can be solved by SSA. Certainly, the embedding operator $\mathcal{T}$ determines the separability conditions. It can be said that the success of SSA in separation is determined by the Hankel structure of the trajectory matrix and the optimality features of the SVD.
Information for grouping The theory of SSA provides ways to detect the SVD components related to a given series component in order to perform proper grouping under conditions of separability. One of the rules is that the eigenvector produced by a data component inherits the properties of this component. For example, in SSA the eigenvectors produced by slowly-varying series components are slowly-varying, the eigenvectors produced by a sine wave are sine waves with the same frequency, and so on. These properties help to perform the grouping by visual inspection of the eigenvectors as well as by some automatic procedures (see Alexandrov (2009) and (Golyandina and Zhigljavsky 2013, Section 2.4.5)).
To check the separability of the reconstructed components $\widetilde{\mathsf{X}}_1$ and $\widetilde{\mathsf{X}}_2$, we should check the orthogonality of their reconstructed trajectory matrices $\widetilde{\mathbf{X}}_1$ and $\widetilde{\mathbf{X}}_2$. A convenient measure of their orthogonality is the Frobenius inner product $\langle \widetilde{\mathbf{X}}_1, \widetilde{\mathbf{X}}_2 \rangle_{\mathrm{F}}$. Since the trajectory matrix contains $w_i = |A_i| = \|\mathcal{T}(E_i)\|_{\mathrm{F}}^2$ entries equal to the $i$-th element $x_i$ of the initial ordered object, we can introduce the weighted inner product in the space $\mathcal{M}$, $\langle \mathsf{Y}, \mathsf{Z} \rangle_w = \sum_i w_i y_i z_i$, and the normalized measure $\rho_w(\mathsf{Y}, \mathsf{Z}) = \langle \mathsf{Y}, \mathsf{Z} \rangle_w / (\|\mathsf{Y}\|_w \|\mathsf{Z}\|_w)$, called the w-correlation by statistical analogy. Note that in this definition the means are not subtracted.
Let $\widetilde{\mathsf{X}}_i$ be the elementary reconstructed components produced by the elementary grouping $I_j = \{j\}$. Then the matrix of the w-correlations $\rho_w(\widetilde{\mathsf{X}}_i, \widetilde{\mathsf{X}}_j)$ between the elementary components serves as a convenient tool for checking separability. The norm $\|\cdot\|_w$ is called the weighted norm and serves as a measure of the contribution of the components to the decomposition (5): the share of $\widetilde{\mathsf{X}}_k$ is defined as $\|\widetilde{\mathsf{X}}_k\|_w^2 / \|\mathsf{X}\|_w^2$.
Trajectory spaces and signal subspace Let us introduce several notions related to the subspaces generated by the data. For the data $\mathsf{X}$, the column (row) space of its trajectory matrix $\mathbf{X}$ is called the column (row) trajectory space. The term 'trajectory space' usually means 'column trajectory space'. The column trajectory space is a subspace of $\mathbb{R}^L$, while the row trajectory space is a subspace of $\mathbb{R}^K$. In general, for real-world data the trajectory spaces coincide with the corresponding Euclidean spaces, since the data are produced by a signal corrupted by noise. However, if the signal has a rank-deficient trajectory matrix, then the signal trajectory space can be called the 'signal subspace'. Both column and row signal subspaces can be considered. Note that the dimensions of the row and column subspaces coincide.
Objects of finite rank The class of objects that suit the SSA method consists of the so-called objects of finite rank. We say that an object (time series or image) has rank $r$ if the rank of its trajectory matrix is equal to $r < \min(L, K)$, that is, the trajectory matrix is rank-deficient. If the rank $r$ does not depend on the choice of $L$ for sufficiently large object and trajectory matrix sizes, then we say that the object has finite rank (the rank does not tend to infinity as the size of the object tends to infinity); see Golyandina et al. (2001, Chapter 5) and Golyandina and Usevich (2010) for rigorous definitions.
Since the trajectory matrices considered in SSA methods are Hankel or consist of Hankel blocks, then, by well-known properties, rank-deficient Hankel matrices are closely related to objects satisfying certain linear relations. These linear relations can be taken as a basis for forecasting algorithms. In the one-dimensional case, rank-deficient Hankel matrices are closely related to linear recurrent relations (LRRs) $x_n = \sum_{i=1}^{r} a_i x_{n-i}$ and therefore to time series which can be expressed as sums of products of exponentials, polynomials and sinusoids.
Each specific SSA extension produces a class of specific objects of finite rank. The knowledge of ranks of objects of finite rank can help to indicate the SVD components (whose number is equal to the rank value) for the component reconstruction. For example, to reconstruct the exponential trend in the one-dimensional case, we need to group only one SVD component (the exponential has rank 1), while to reconstruct a sine wave we generally need to group two SVD components (the rank equals 2).
Certainly, most real-life time series or images are not of finite rank. For example, a time series can be a sum of a signal of rank r and noise. Then, due to approximate separability, SSA can extract the signal, and the methods designed for series of finite rank can then be applied to it.

Typical code
The general structure of the Rssa package is described in Golyandina and Korobeynikov (2013) and holds for the multivariate extensions. Therefore, let us briefly discuss the base Rssa functions for the analysis and forecasting of one-dimensional time series. The implementation of SSA analysis and forecasting is mostly contained in the functions ssa (decomposition), reconstruct (reconstruction), rforecast and vforecast (forecasting), plot (plotting) and wcor (weighted correlations for grouping).
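For orientation, a minimal sketch of this workflow is given below. The window length, the grouping and the object names other than s.fort are illustrative assumptions rather than the exact content of the paper's fragments; fort is assumed to be the FORT series loaded as a monthly ts object by Fragment 1.

library(Rssa)
# steps 1-2: embedding and decomposition
s.fort <- ssa(fort, L = 84, kind = "1d-ssa", svd.method = "auto")
# steps 3-4: grouping and reconstruction (illustrative grouping)
r.fort <- reconstruct(s.fort,
                      groups = list(Trend = 1, Seasonality = 2:11))
plot(r.fort)                       # reconstructed components and residuals
plot(wcor(s.fort, groups = 1:30))  # w-correlation matrix used for grouping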

Roughly speaking (see details in Golyandina and Korobeynikov (2013)), ssa performs steps 1 and 2, while reconstruct performs steps 3 and 4 of the algorithm described in Section 2.1. The values kind = "1d-ssa" and svd.method = "auto" are the defaults and can be omitted. Note that the function plot for the reconstruction object implements several special kinds of plotting. In Fragment 2, the last two parameters of plot are parameters of the function xyplot from the standard package lattice. The grouping for the reconstruction was made on the basis of the following information obtained from the ssa object: 1. 1D plots of the eigenvectors U_i (Fig. 3); 2. 2D plots of the eigenvector pairs (U_i, U_{i+1}) (Fig. 4); 3. the matrix of w-correlations ρ_w between the elementary reconstructed series (functions wcor and plot, Fig. 5).
The following fragment shows a part of the corresponding code (Fragment 3).

> plot(wcor(s.fort), scales = list(at = c(10, 20, 30)))
> plot(reconstruct(s.fort, add.residuals = FALSE, add.original = FALSE,
+                  groups = list(G12 = 2:3, G4 = 4:5, G6 = 6:7, G2.4 = 8:9)))

Let us explain how the figures obtained by means of Fragment 3 can help with the grouping. Fig. 3 shows that the first eigenvector is slowly-varying and therefore the eigentriple (abbreviated as ET) ET1 should be included in the trend group. Fig. 4 shows that the pairs 2-3, 4-5, 6-7, 8-9 and 10-11 are produced by sine waves, since the corresponding 2D scatterplots of eigenvectors are similar to regular polygons. This way of identification is based on the following properties: a sine wave has rank 2 and produces two eigentriples, which are sine waves with the same frequency and a phase shift exactly or approximately equal to π/2, due to the orthogonality of eigenvectors.
Their periods are 12, 4, 6, 2.4 and 3, which can be determined by counting the number of vertices of the polygons. Automatic methods of frequency estimation, including LS-ESPRIT and TLS-ESPRIT (Roy and Kailath 1989), are implemented in Rssa in the function parestimate and are described in (Golyandina et al. 2001, Sections 2.4.2.4 and 3.8.2) and Golyandina and Korobeynikov (2013) for one-dimensional time series.
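A hedged sketch of such an estimation for the identified pair ET2-3 is given below; the available method names of parestimate may differ between Rssa versions, so the call should be checked against the installed version.

# period/frequency estimation from an eigenvector pair (illustrative)
print(parestimate(s.fort, groups = list(2:3), method = "pairs"))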
The matrix of absolute values of the w-correlations in Fig. 5 is depicted in a white-to-black scale (white corresponds to zero values, while black corresponds to values equal to 1). Fig. 5 confirms that the indicated pairs are separated from each other and also from the trend component, since the correlations between the pairs are small, while the correlations between the components within one pair are very large. The block of components 12-84 is "gray"; therefore we can expect that these components are mixed and are produced by noise. Fig. 6 contains four reconstructed modulated sine waves and shows that several sine waves have increasing amplitudes, while others have decreasing ones; the same is seen in Fig. 3. In Fig. 2, we grouped the modulated sine waves and obtained the seasonal component with a varying form.
Finally, two methods of forecasting are implemented for one-dimensional time series: recurrent (function rforecast) and vector forecasting (function vforecast).
Fragment 4 shows how to forecast the series components (trend and signal) and depicts the result (Fig. 7). A proper decomposition is a necessary condition for forecast accuracy.
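A hedged sketch of such forecasting calls follows; the grouping and the forecast horizon are illustrative assumptions, and the objects s.fort and fort are those assumed in the earlier sketch.

# recurrent and vector forecasts of the identified components (illustrative)
f.rec <- rforecast(s.fort, groups = list(Trend = 1, Signal = 1:11), len = 24)
f.vec <- vforecast(s.fort, groups = list(Signal = 1:11), len = 24)
# draw the original series and the forecast of the signal on one plot
plot(cbind(fort, f.rec$Signal), plot.type = "single",
     col = c("black", "red"), ylab = "FORT")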

Comments on Rssa
The detailed description of the Rssa package structure and the principles of implementation is contained in Golyandina and Korobeynikov (2013). It is related to the use of Rssa for the SSA processing of one-dimensional time series, but most of the given information is also valid for other considered types of data. The typical code demonstrates the main logic of the SSA processing by Rssa and thereby the structure of functions from the package. Below we briefly comment on the most important issues and main differences with the one-dimensional case. Note that the details are contained in the corresponding sections as comments to the typical code or the package.

Formats of input and output data
The inputs of SSA can be quite different depending on the kind of SSA used. For example, the inputs can be vectors, ts objects, matrices, data frames or lists of vector-like objects. The reconstruction routines return the result exactly of the same type as the input object was. This can be seen in Fig. 6, where plot.ts (corresponding to the default plot.method = 'native') was used to draw the result of the Reconstruction step.
All the forecasting routines try to use the attributes of the input object for the resulting object (in particular, they try to add the time scale to the result). Unfortunately, this cannot be done in a class-neutral way, as is done in the reconstruction case, and needs to be handled separately for each possible type of time series class. The forecasting routines know how to impute the time indices for some standard time series classes like ts and mts.

Plotting specifics
The package implements most of its own plotting capabilities with the help of the lattice package (Sarkar 2008); thus, the majority of the plotting specifics comes from lattice. In particular, the results of plotting functions of Rssa are proper trellis objects and can be modified, combined and otherwise altered using the standard lattice functionality.
A very convenient way of plotting the reconstructed series is via the 'xyplot' method of plot for SSA reconstruction objects. There are powerful specializations of xyplot for ts and zoo objects, and Rssa provides a stub implementation of xyplot for bare matrices.
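A minimal sketch, assuming the illustrative reconstruction object r.fort from the earlier example:

# the result is a proper lattice (trellis) object
p <- plot(r.fort, plot.method = "xyplot", layout = c(1, 3))
print(p)   # it can be updated, combined or otherwise altered with lattice tools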

SVD method
Rssa allows one to use several SVD implementations: either full decompositions implemented via R functions eigen and svd, or truncated Lanczos SVDs from the package svd. These implementations differ in terms of speed, memory consumption and numerical stability, see (Golyandina and Korobeynikov 2013) for discussion concerning these properties. Here we note that the use of fast SVD methods is the key point in the SSA processing of images.
By default, the ssa routine tries to select the best SVD implementation given the series length, the window length and the desired number of eigentriples. This corresponds to the value 'auto' of the svd.method argument. However, one can override this default and select the desired SVD implementation if necessary. This differs from the behavior of previous versions of Rssa, where the package tried to 'fix' the SVD method settings when it decided that it would be a bad idea to proceed. The present and future package versions (as of Rssa 0.10 and later) will always respect an explicit user choice.
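For example (the window length and the number of eigentriples below are illustrative assumptions):

# explicitly choosing truncated SVD implementations
s.prop <- ssa(fort, L = 84, neig = 20, svd.method = "propack")
s.lanc <- ssa(fort, L = 84, neig = 20, svd.method = "nutrlan")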

Efficient implementation
Complementary to the use of efficient SVD methods, a considerable speed-up is achieved by means of special methods for operations with Hankel-related matrices; see Section 6 for the description of the algorithms. This is important, since multiplication by a Hankel matrix and hankelization of matrices, which can be very large, take a substantial part of the SSA algorithms. The efficient implementation of these routines relies on the possibility to compute the Fast Fourier Transform of a vector of arbitrary length with the optimal O(N log N) complexity. In order to achieve this, Rssa uses the FFTW library (Frigo and Johnson 2005). While we strongly encourage always complementing the Rssa installation with FFTW, the package will fall back to R's FFT implementation if it was compiled without FFTW. Pre-built Windows and Mac packages on CRAN are statically linked with FFTW; for other platforms it is usually possible to install the library using the standard package manager. (Note that a -dev version of the FFTW package is usually required.) If FFTW is not installed, only the inherently one-dimensional analyses (1D-SSA, Toeplitz SSA and Complex SSA) will be available, and the computational speed will be lower in this case as well.
Following Korobeynikov (2010), let us briefly describe the algorithm complexity for the one-dimensional case to show the order of the speed-up. The direct implementation of the SSA algorithm has $O(N^3)$ computational and space complexity in the worst case $L \sim K \sim N/2$ (this case is standard for SSA), where N is the series length. Therefore, it is important to provide efficient implementations which make non-trivial cases feasible.
• The methods of efficient Hankel matrix-vector multiplication by means of the Fast Fourier Transform (see Section 6.1) and the usage of fast Lanczos-based truncated SVD implementations drop the complexity from $O(N^3)$ down to $O(kN \log N + k^2 N)$, where k is the number of computed eigentriples.
• Second, one can represent the computation of an elementary series, which is the hankelization (diagonal averaging) of a rank-one matrix, as a special form of convolution. This way, an efficient implementation of the hankelization procedure is again possible via the Fast Fourier Transform and yields a similar improvement of complexity (a minimal base-R sketch of this idea is given after this list).
• Third, it is possible to implement the vector forecast much more efficiently than by the direct implementation. The details can be found in Section 6.3.
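As mentioned in the second item above, here is a minimal base-R sketch of rank-one hankelization as a convolution; the helper name and inputs (vectors u and v of lengths L and K) are illustrative and this is not Rssa's actual internal code.

hankelize.rank1 <- function(u, v) {
  L <- length(u); K <- length(v); N <- L + K - 1
  # convolve(u, rev(v), type = "open") computes sum_{i+j = n+1} u_i v_j via FFT,
  # i.e., the anti-diagonal sums of the rank-one matrix u v^T
  num <- convolve(u, rev(v), type = "open")
  w   <- pmin(1:N, L, K, N:1)   # number of matrix entries on each anti-diagonal
  num / w                       # diagonal averaging (the elementary series)
}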
Note that the principle of on-demand calculation of the necessary objects is used in the implementation of the package. For example, if 10 eigentriples were calculated during decomposition, the user can still perform a reconstruction with the first 15 components, since the decomposition will be automatically continued to calculate eigentriples 11-15. Also, the routines reuse the results of previous calculations as much as possible in order to save time (hence the argument cache of many routines). For example, elementary series, once calculated, are stored inside the SSA object, so the next time the function reconstruct may not need to calculate the resulting series from scratch.
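A short sketch of this behavior (the series and window length are the illustrative ones used above):

s10 <- ssa(fort, L = 84, neig = 10)           # only 10 eigentriples are computed here
r15 <- reconstruct(s10, groups = list(1:15))  # eigentriples 11-15 are computed on demand
r15 <- reconstruct(s10, groups = list(1:15))  # a repeated call reuses the cached results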
All these speed-up improvements make it possible to perform in reasonable time the tasks of analysis of long series, image processing, tracking procedures and batch processing.

Multivariate Singular Spectrum Analysis
Let us consider the problem of simultaneous decomposition, reconstruction and forecasting for a collection of time series from the viewpoint of SSA. The method is called Multichannel SSA or Multivariate SSA, shortly MSSA. The main idea of the algorithm is the same as in the Basic SSA, the difference consists in the way of construction of the trajectory matrix.
In a sense, MSSA is a straightforward extension of SSA. However, the algorithm of MSSA was published even earlier than the algorithm of SSA; see Weare and Nasstrom (1982), where the MSSA algorithm was named Extended Empirical Orthogonal Function (EEOF) analysis. Formally, the algorithm of MSSA in the framework of SSA was formulated in Broomhead and King (1986b).
Here we consider the algorithm of MSSA for the analysis and forecasting of multivariate time series following the approach described in Golyandina et al. (2001, Chapter 2) for one-dimensional series and in Golyandina and Stepanov (2005) for multidimensional ones. In this section we also describe the complex-valued version of SSA (called CSSA), since it can be considered as a multidimensional version of SSA for the analysis and forecasting of a system of two time series.
The theory studying the ranks of the multivariate time series and the separability of their components for MSSA and CSSA is very similar to that of SSA and is briefly described in Section 7.1. Let us start with the algorithm description.

MSSA analysis
Consider a multivariate time series, that is, a collection $\mathsf{X} = \{\mathsf{X}^{(1)}, \ldots, \mathsf{X}^{(s)}\}$ of $s$ time series of lengths $N_1, \ldots, N_s$; this collection constitutes the initial data for the MSSA algorithm. Since the general scheme of the algorithm described in Section 2 holds for MSSA, we need to define only the embedding operator $\mathcal{T}_{\mathrm{MSSA}}(\mathsf{X}) = \mathbf{X}$.
Let L be an integer called the window length, $1 < L < \min(N_p,\ p = 1, \ldots, s)$. For each time series $\mathsf{X}^{(p)}$, the embedding procedure forms its trajectory matrix $\mathbf{X}^{(p)} = \mathcal{T}_{\mathrm{SSA}}(\mathsf{X}^{(p)})$ of size $L \times K_p$, $K_p = N_p - L + 1$, defined in (2). The trajectory matrix of the multivariate series $\mathsf{X}$ is the matrix of size $L \times K$, $K = \sum_{p=1}^{s} K_p$, of the form

$$\mathbf{X} = [\mathbf{X}^{(1)} : \ldots : \mathbf{X}^{(s)}].$$

Thus, the trajectory matrix of a system of time series has a stacked Hankel structure.
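To make the stacked structure concrete, a base-R sketch is given below; the helper names traj1d and traj.mssa are illustrative and are not part of the Rssa internals.

traj1d <- function(x, L) {
  K <- length(x) - L + 1
  outer(1:L, 1:K, function(i, j) x[i + j - 1])   # L x K Hankel matrix, as in (2)
}
traj.mssa <- function(xs, L)
  do.call(cbind, lapply(xs, traj1d, L = L))      # [X^(1) : ... : X^(s)]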
The eigenvectors $\{U_i\}$ in the SVD (3) of $\mathbf{X}$ form a common basis of the column trajectory spaces of all the time series from the system. The factor vectors $V_i$ (called EEOFs in climatological applications) consist of parts related to each time series separately, that is, $V_i = \bigl((V_i^{(1)})^{\mathrm{T}}, \ldots, (V_i^{(s)})^{\mathrm{T}}\bigr)^{\mathrm{T}}$, where $V_i^{(p)} \in \mathbb{R}^{K_p}$ belongs to the row trajectory space of the $p$-th series.
The eigenvectors $U_i$ reflect the common features of the time series, while the factor subvectors $V_i^{(p)}$ show how these common features appear in each series. It is natural to transform a factor vector into a system of factor subvectors $V_i^{(p)}$; then the form of the transformed factor vectors is similar to that of the initial system of series.
Analogously to the one-dimensional case, the main result of the application of MSSA is a decomposition of the multivariate time series into a sum of m multivariate series; the parameters are the window length L and the way of grouping. For the frequently used case m = 2, we denote by $\widetilde{\mathsf{X}}^{(p)}$, $p = 1, \ldots, s$, the reconstructed series (usually, the signal) corresponding to the first group of eigentriples $I_1$.

CSSA analysis
Let the system of series consist of two series, that is, s = 2. Then we can consider the one-dimensional complex-valued series $\mathsf{X} = \mathsf{X}^{(1)} + i\,\mathsf{X}^{(2)}$ and apply the complex version of SSA to this one-dimensional series. Since the general algorithm in Section 2 is written down in real-valued form, the difference is in the form of the SVD, which is performed in the complex-valued space, where the transposition should be Hermitian.
Also, there is a difference in the uniqueness of the SVD expansion. If the singular values are distinct, then the SVD is unique up to multiplication of the left and right singular vectors by a constant c with |c| = 1. In the real-valued case c = ±1, while in the complex-valued case there are infinitely many suitable constants c.

Remarks
1. Note that indexing the time points as $1, \ldots, N_p$ ($p = 1, \ldots, s$), starting from 1, does not mean that the series start at the same time; they can also finish at different times if the lengths of the time series are different. The resultant decomposition obtained by the MSSA algorithm does not depend on the shifts between the series, and therefore this numeration is just formal. Even more, the decompositions of two series measured over the same time interval and over disjoint time intervals do not differ.
2. The original time ranges of series X (p) can be useful for depicting and interpreting them. Certainly, the reconstructed series have the same time ranges as the original ones. Factor subvectors from the factor system can also be synchronized for plotting based on the ranges of the initial series; although factor vectors are shorter than the initial series, their time shifts are the same.
3. For the SSA analysis of one time series, it makes sense to consider window lengths 2 ≤ L ≤ [(N + 1)/2], since the SVD expansions for window lengths L and N − L + 1 coincide. For the MSSA-analysis of more than one time series the expansions for all possible window lengths 2 ≤ L ≤ min i N i − 1 are generally different.
4. For the simultaneous analysis of several time series, it is recommended to transform them to the same scale. Otherwise, the structure of one time series will dominate the result. To balance the time series, they can be either standardized (centered and normalized; in additive models) or only normalized (in multiplicative models). On the other hand, the scales of the series can be used instead of weights in the common decomposition if, e.g., one of the series is more important or has a smaller noise level.
5. In a sense, the most detailed decomposition can be obtained if the trajectory matrix X has maximal rank. In the general case of arbitrary time series, this corresponds to the case of a square matrix. Thus, for a system of s time series of length N, the window length providing a square trajectory matrix X is approximately sN/(s + 1) for MSSA. For the case of two time series this corresponds to L ≈ 2N/3 for MSSA, while CSSA gives L = N/2.
6. The MSSA algorithm might be modified in the same ways as the SSA one; for example, Toeplitz MSSA and MSSA with centering can be considered. However, these options are not implemented in the described version of Rssa.

Covariance structure
Consider in more detail the case of two time series $\mathsf{X} = (\mathsf{F}, \mathsf{G})$ and let $\mathbf{F}$ and $\mathbf{G}$ be the trajectory matrices of $\mathsf{F}$ and $\mathsf{G}$ correspondingly. Then MSSA considers the eigendecomposition of $\mathbf{S} = \mathbf{X}\mathbf{X}^{\mathrm{T}} = \mathbf{F}\mathbf{F}^{\mathrm{T}} + \mathbf{G}\mathbf{G}^{\mathrm{T}}$; that is, MSSA analyzes the averaged structure of the two time series.
Since the SVD of a matrix coincides with the SVD of its transpose, we can consider the decomposition of the time series trajectory matrices on the basis of the right singular vectors or, equivalently, transpose $\mathbf{X}$ and consider the eigenvectors of

$$\mathbf{X}^{\mathrm{T}}\mathbf{X} = \begin{pmatrix} \mathbf{F}^{\mathrm{T}}\mathbf{F} & \mathbf{F}^{\mathrm{T}}\mathbf{G} \\ \mathbf{G}^{\mathrm{T}}\mathbf{F} & \mathbf{G}^{\mathrm{T}}\mathbf{G} \end{pmatrix},$$

called EEOFs. The last formula demonstrates more clearly that MSSA takes into consideration the cross-covariances of the time series (more exactly, cross-covariances if centering is used).
Since the eigendecomposition of a complex-valued matrix $\mathbf{A} + i\mathbf{B}$ can be reduced to the eigendecomposition of the real-valued matrix

$$\begin{pmatrix} \mathbf{A} & -\mathbf{B} \\ \mathbf{B} & \mathbf{A} \end{pmatrix},$$

in the case of CSSA we in fact analyze the eigenvectors of the matrix $(\mathbf{F} + i\mathbf{G})(\mathbf{F} + i\mathbf{G})^{\mathrm{H}} = \mathbf{F}\mathbf{F}^{\mathrm{T}} + \mathbf{G}\mathbf{G}^{\mathrm{T}} + i(\mathbf{G}\mathbf{F}^{\mathrm{T}} - \mathbf{F}\mathbf{G}^{\mathrm{T}})$; that is, the structures of the time series are mixed to a greater degree than in MSSA.

Matching of series
Simultaneous analysis of several time series is usually performed to identify their relations and to extract their common structure. Recall that having structure in SSA means that the trajectory matrix is rank-deficient. Certainly, for real-world series the trajectory matrix is of full rank, since at least the noise has no structure. Therefore, in what follows we speak about the rank of the signal or of its components.
Consider a system of series with a rank-deficient trajectory matrix. The structure of a series is reflected by its trajectory space. Therefore, we can say that two time series have the same structure if their trajectory spaces coincide. For example, the trajectory spaces of two sine waves with equal periods coincide irrespective of their amplitudes and phases. This fact is evident, since the trajectory space is the span of the subseries of length L of the initial series. On the contrary, sine waves with different frequencies have totally different structures, and their combined trajectory space is the direct sum of the individual trajectory spaces.
If two time series are fully matched, then the trajectory space of one time series can be used for reconstruction or forecasting of the second series. If the series are totally different, then the first series is useless for the analysis of the second one.
For MSSA, the shift between time series, for example, the difference between the phases of two matched sine waves, is of no consequence. Therefore, we cannot say anything about the direction of causality (if any; see Hassani et al. (2013), where an attempt to detect causality is made). Moreover, asymmetry of the influence of one time series on another can be caused by different noise levels.

Relation between SSA, MSSA and CSSA ranks
For MSSA and CSSA, the notions of time series of finite rank and of time series satisfying linear recurrence relations are analogous to these notions in SSA. However, the rank of the same time series can differ depending on the applied method. Therefore, we can speak about SSA-, MSSA- and CSSA-ranks.
If the time series in the system have the same structure (and therefore the same SSA-ranks), then the MSSA-rank is equal to the SSA-rank of each time series. The CSSA-rank can be even smaller than the SSA-ranks in some special cases.

Consider a collection $\mathsf{H}^{(1)}, \ldots, \mathsf{H}^{(s)}$ of $s$ signals of length $N$. Let $r_k$ denote the SSA rank of $\mathsf{H}^{(k)}$ (i.e., the dimension of the trajectory space generated by one-dimensional SSA applied to this time series) and $r$ denote the MSSA rank of $(\mathsf{H}^{(1)}, \ldots, \mathsf{H}^{(s)})$. The relation between $r$ and $r_k$, $k = 1, \ldots, s$, is considered in Section 7.1. In particular, it is shown there that $r_{\min} \le r \le r_{\max}$, where $r_{\min} = \max\{r_k,\ k = 1, \ldots, s\}$ and $r_{\max} = \sum_{k=1}^{s} r_k$. The case $r = r_{\max}$ is the least favorable for MSSA and means that different time series do not have matched components. The case $r < r_{\max}$ indicates the presence of matched components and can lead to advantages of the simultaneous processing of the time series system.
In terms of matching, if all the series have the same characteristic roots (see Section 7.1 for the definition), then the time series X^(m), m = 1, ..., s, consist of additive components of the same SSA structure. Such time series are fully matched. For fully matched time series, the MSSA-rank is much smaller than the sum of the SSA-ranks of the separate time series of the system. On the contrary, if the sets of characteristic roots do not intersect, then the time series have no common structure; in this case, the MSSA-rank is exactly equal to the sum of the SSA-ranks of the separate series. Real-world time series are usually between these extreme cases.

Separability
The notion of separability for multidimensional time series is analogous to that for one-dimensional series, briefly commented on in Section 2.2 and thoroughly described in (Golyandina et al. 2001, Sections 1.5 and 6.1). Section 7.1 contains several results on the separability of multidimensional series. Generally, the conditions of separability for multidimensional time series are more restrictive than those for one-dimensional series. In particular, a sufficient condition for the separability of two systems of series is the separability of each series from the first collection from each series of the second one. However, for matched signals, their (weak) separability from the noise can be considerably improved by their simultaneous MSSA analysis.
Since weak separability is not sufficient for the extraction of time series components, we should pay attention to strong separability, which is related to the eigenvalues produced by the time series components. It appears (see Example 2 in Section 7.1) that a pair of time series (F^(1), F^(2)) can produce different eigenvalues in SSA, MSSA and CSSA. Therefore, the application of an appropriate multidimensional modification of SSA can improve strong separability. Again, matching of time series diminishes the number of eigenvalues related to the signal and thereby reduces the chance of mixing the signal with the residual.

Choice of window length
Recommendations on the choice of the window length for one-dimensional SSA analysis can be found, e.g., in (Golyandina et al. 2001, Section 1.6) and Golyandina (2010). However, the problem of the choice of the window length in MSSA is more complicated than that for SSA. As far as we know, there is no thorough investigation of the choice of the optimal window length for analysis and, to a greater extent, forecasting of multidimensional time series. Moreover, the best window length for MSSA forecasting differs for different types of forecasting methods. An attempt at a numerical investigation is made in Golyandina and Stepanov (2005) and extended in Section 3.5.
The analogy with the one-dimensional case can help to formulate some principles for the choice of L. The main principle is the same as for SSA: the choice of L should provide (approximate) separability of the series. Also, different special techniques can be transferred from SSA to MSSA. For example, Sequential MSSA can be applied by analogy with Sequential SSA: if the trends are complex or of different structure, a smaller window length can be applied to achieve similarity of the eigenvectors and to improve separability, and then the residual can be decomposed with a larger window length. While for SSA analysis it makes no sense to take L > (N + 1)/2, for MSSA analysis a large L with small K_i = N_i − L + 1 can be taken instead of a small L for trend extraction and smoothing.
Since in SSA L does not exceed half of the time series length and hence L = min(L, K), divisibility of L by the possible periods of oscillations is recommended in SSA. In MSSA, min(L, K_i) does not necessarily equal L, and therefore attention should also be paid to the values of K_i.
Numerical investigations show that the formula L = sN/(s + 1) is appropriate for the decomposition of several (a few) time series (see the simulation results in Section 3.5), but it evidently cannot be applied to a system of many short series (the K_i become too small for separability). Generally, the choice L = N/2 is still appropriate even for multivariate SSA.

Multivariate SSA forecasting
Forecasting in SSA is performed for a series component which can be separated by SSA and is described, maybe approximately, by an LRR. For brevity, we will speak about forecasting of the signal. If the forecasted series component (signal) is governed by an LRR, then it can be forecasted by this or an extended LRR. The forecasting method is called recurrent if it consists in the continuation of the estimated signal by the estimated LRR.
Since the series is transformed into a sequence of L-dimensional reconstructed vectors lying in the signal subspace, the forecast can also be constructed by continuation of these vectors in the given subspace. The SSA forecasting method is called vector forecasting if it is based on the continuation of the reconstructed vectors in the given subspace. The estimated signal subspace is also used in recurrent forecasting for the construction of the LRR.
The CSSA forecasting algorithm is a straightforward extension of SSA forecasting to the complex-valued case; therefore we will not describe it.
The methods of MSSA forecasting of systems of series are similar to those of SSA forecasting of one-dimensional series. Since the rows and columns of the trajectory matrix of a system of time series have different structures, in contrast with SSA, there are two kinds of MSSA forecasting, row and column forecasting, depending on which space, row or column, is used. Note that column forecasting is exactly the SSA forecasting of each series of the system separately, in a given common subspace. Methods of one-dimensional SSA forecasting in a given subspace are fully described in (Golyandina et al. 2001, Chapter 2).

Notation
Let us introduce necessary notation.
Denote by $\overline{A} \in \mathbb{R}^{Q-1}$ the vector of the last $Q-1$ coordinates of $A \in \mathbb{R}^{Q}$, that is, the vector $A$ with the first coordinate removed, which is indicated by the line on top of the vector notation. Also, denote by $\underline{A} \in \mathbb{R}^{Q-1}$ the vector of the first $Q-1$ coordinates of $A$, and by $\pi(A)$ the last coordinate of the vector. For a matrix $\mathbf{A} = [A_1 : \ldots : A_r]$ denote $\overline{\mathbf{A}} = [\overline{A}_1 : \ldots : \overline{A}_r]$ and $\underline{\mathbf{A}} = [\underline{A}_1 : \ldots : \underline{A}_r]$, and let $\pi(\mathbf{A}) = (\pi(A_1), \ldots, \pi(A_r))$ be the last row of the matrix $\mathbf{A}$.
Consider the following block form of a vector $B \in \mathbb{R}^{K}$, $K = \sum_{i=1}^{s} K_i$, induced by the structure of the row trajectory space:

$$B = \begin{pmatrix} B^{(1)} \\ \vdots \\ B^{(s)} \end{pmatrix},$$

where $B^{(j)} \in \mathbb{R}^{K_j}$, and let $\mu(B) = \bigl(\pi(B^{(1)}), \ldots, \pi(B^{(s)})\bigr)^{\mathrm{T}}$. Also, for a matrix $\mathbf{B} = [B_1 : \ldots : B_r]$ denote $\mu(\mathbf{B}) = [\mu(B_1) : \ldots : \mu(B_r)]$. For simplicity we assume that the set $I$ corresponding to the forecasted component consists of the leading components; this is done just for simplification of the formulas. Thus, let the $r$ leading eigentriples $(\sqrt{\lambda_j}, U_j, V_j)$ be identified and chosen as related to the signal of rank $r$. The reconstructed series $\widetilde{\mathsf{X}}$, its trajectory matrix $\widetilde{\mathbf{X}}$ and the reconstructed matrix $\widehat{\mathbf{X}}$ are defined in Section 2.1; the matrix $\widehat{\mathbf{X}} = \sum_{j=1}^{r} \sqrt{\lambda_j}\, U_j V_j^{\mathrm{T}}$ consists of column vectors that are the projections of the column vectors of the trajectory matrix on the chosen subspace $\mathcal{L}_{\mathrm{col}}$.
To avoid repeating the transposition sign, denote by $\widehat{x}_{N+1} = \bigl(\hat{x}^{(1)}_{N_1+1}, \ldots, \hat{x}^{(s)}_{N_s+1}\bigr)^{\mathrm{T}}$ the vector of forecasted signal values of the individual time series. Recurrent forecasting is closely related to the imputation of missing components of vectors from a given subspace and in fact uses formula (1) from Golyandina and Osipov (2007). Following Golyandina and Stepanov (2005), we can write out forecasting formulas for two versions of the recurrent MSSA forecast: row (generated by $\{V_j\}_{j=1}^{r}$) and column (generated by $\{U_j\}_{j=1}^{r}$). These one-step-ahead forecasting formulas can be applied recursively to obtain an M-step-ahead forecast.
Column recurrent forecasting performs the forecast by an LRR of order L − 1 applied to the last L − 1 points of each reconstructed signal; that is, there is one LRR and different initial data. Row recurrent forecasting constructs s different linear relations, each applied to the sets of the last K_i − 1 points of the series; that is, the LRRs are different, but the initial data for them are the same.
Column forecast Denote by $\mathbf{Z}$ the $(L-1) \times s$ matrix whose columns consist of the last $L-1$ values of the reconstructed signals $\widetilde{\mathsf{X}}^{(p)}$, $p = 1, \ldots, s$, and set $\nu^2 = \sum_{j=1}^{r} \pi^2(U_j)$. If $\nu^2 < 1$, then the column MSSA recurrent forecast is uniquely defined and can be calculated by the formula

$$\widehat{x}_{N+1} = \mathbf{Z}^{\mathrm{T}} \mathcal{R}_L, \qquad \mathcal{R}_L = \frac{1}{1-\nu^2} \sum_{j=1}^{r} \pi(U_j)\, \underline{U}_j. \qquad (10)$$

Note that formula (10) implies that the forecasting of all the individual signals is made using the same linear recurrent formula, which is generated by the whole system.
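A hedged base-R sketch of formula (10) follows; the function name and the inputs (the L x r matrix U of the chosen eigenvectors and the (L-1) x s matrix Z of the last reconstructed values) are assumptions made for illustration, and this is not the Rssa implementation.

mssa.col.rforecast1 <- function(U, Z) {
  L    <- nrow(U)
  pi.u <- U[L, ]                    # last coordinates pi(U_j)
  nu2  <- sum(pi.u^2)               # verticality coefficient nu^2
  stopifnot(nu2 < 1)
  R <- U[-L, , drop = FALSE] %*% pi.u / (1 - nu2)  # coefficients of the common LRR
  drop(crossprod(Z, R))             # one-step-ahead forecast for each series
}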
Row forecast Introduce the vectors $Z_m$ of the last $K_m - 1$ values of the reconstructed signals, $m = 1, \ldots, s$, and denote $\mathbf{S} = \mu(\mathbf{V})$ for $\mathbf{V} = [V_1 : \ldots : V_r]$. If the inverse matrix $(\mathbf{I}_s - \mathbf{S}\mathbf{S}^{\mathrm{T}})^{-1}$ exists and $r \le K - s$, then the row MSSA recurrent forecast exists and can be calculated by formula (11), which expresses the forecast vector through the stacked vector $(Z_1^{\mathrm{T}}, \ldots, Z_s^{\mathrm{T}})^{\mathrm{T}}$ by means of the matrix $\mathcal{R}_K = (\mathbf{I}_s - \mathbf{S}\mathbf{S}^{\mathrm{T}})^{-1} \mathbf{S} \mathbf{V}^{\mathrm{T}}$. Note that formula (11) implies that the forecasting of the individual signals is made using linear relations which are different for different series.
The forecasting value generally depends on the last values of all the time series of the series system.

Vector MSSA forecasting
Denote $\mathcal{L}_{\mathrm{col}} = \mathrm{span}(U_1, \ldots, U_r)$ and $\mathcal{L}_{\mathrm{row}} = \mathrm{span}(V_1, \ldots, V_r)$. Let $\Pi_{\mathrm{col}}$ be the orthogonal projector of $\mathbb{R}^{L-1}$ on $\mathcal{L}_{\mathrm{col}}$ and $\Pi_{\mathrm{row}}$ be the orthogonal projector of $\mathbb{R}^{K-s}$ on $\mathcal{L}_{\mathrm{row}}$.
Columns of $\mathbf{Y}$ are the projections of the row vectors of the trajectory matrix $\mathbf{X}$ on $\mathcal{L}_{\mathrm{row}}$, while columns of $\widehat{\mathbf{X}}$ are the projections of the column vectors of the trajectory matrix $\mathbf{X}$ on $\mathcal{L}_{\mathrm{col}}$.
The explicit form of the matrices of the column and row projectors can be found in (Golyandina and Osipov 2007, formula (4)). However, the calculation by this formula is time-consuming. A fast algorithm of calculation is presented in Section 6.3.
Column forecast We mentioned that for a given subspace ($\mathcal{L}_{\mathrm{col}}$ in our case) the column vector forecast is performed independently for each time series. Define the linear operator $P^{\mathrm{col}}_{\mathrm{Vec}} : \mathbb{R}^{L} \to \mathcal{L}_{\mathrm{col}}$ that maps a vector $Z$ to the vector whose first $L-1$ coordinates are $\Pi_{\mathrm{col}} \overline{Z}$ and whose last coordinate is $\mathcal{R}_L^{\mathrm{T}} \overline{Z}$. Let us formulate the vector forecasting algorithm for the $j$-th series.

1. In the notation above, define the vectors $Z_i$ as follows: $Z_i$ are the columns of the reconstructed matrix corresponding to the $j$-th series for $i = 1, \ldots, K_j$, and $Z_i = P^{\mathrm{col}}_{\mathrm{Vec}} Z_{i-1}$ for $i = K_j + 1, \ldots, K_j + M + L - 1$.

2. Constructing the matrix $\mathbf{Z} = [Z_1 : \ldots : Z_{K_j + M + L - 1}]$ and performing its diagonal averaging, we obtain the series $z_1, \ldots, z_{N_j + M + L - 1}$.

3. The numbers $z_{N_j+1}, \ldots, z_{N_j+M}$ form the $M$ terms of the vector forecast.
Row forecast Define the linear operator $P^{\mathrm{row}}_{\mathrm{Vec}} : \mathbb{R}^{K} \to \mathcal{L}_{\mathrm{row}}$ by the formula $P^{\mathrm{row}}_{\mathrm{Vec}} Z = A$, where $A$ is such that $\overline{A} = \Pi_{\mathrm{row}} \overline{Z}$ and $\mu(A) = \mathcal{R}_K \overline{Z}$.
Let us formulate the vector forecasting algorithm.
1. In the notation above, define the vectors $Z_i$ analogously to the column case, where $K^* = \max(K_i,\ i = 1, \ldots, s)$. The reason for continuing the vectors beyond the forecast horizon is to make the M-step forecast inherit the (M − 1)-step forecast with no redrawing. This specific feature of vector forecasting provides its stability and accuracy if an accurately extracted component of finite rank is forecasted, that is, if a long-term forecast is appropriate. Otherwise, long-term vector forecasting can be wrong, and even the result of short-term vector forecasting can be wrong for large $K^*$ or $L$, respectively.

Typical code
Here we demonstrate how the MSSA decomposition of a system of time series can be performed by means of the Rssa package. Since the analysis and forecasting of one-dimensional time series by Rssa are thoroughly described in Golyandina and Korobeynikov (2013), we pay more attention to the differences.
In Section 2.3 we decomposed the one-dimensional series FORT (sales of fortified wines). Here we add one more series, the sales of dry wines (DRY for short), for simultaneous analysis.
For loading the data we use the code from Fragment 1.
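A hedged sketch of the simultaneous decomposition of the two series is given below. The window length and the seasonality grouping are illustrative assumptions (only the trend eigentriples ET1 and ET6 follow the identification discussed below), and dry is assumed to be the DRY series loaded by Fragment 1.

wine2 <- list(Fort = fort, Dry = dry)
s.wineFortDry <- ssa(wine2, L = 84, kind = "mssa")          # stacked Hankel embedding
r.wine <- reconstruct(s.wineFortDry,
                      groups = list(Trend = c(1, 6),
                                    Seasonality = c(2:5, 7:12)))
plot(r.wine, add.residuals = FALSE)                          # both series at once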

The code for component identification in MSSA is very similar to that in SSA; compare Fragment 6 with Fragment 3. The difference consists in the structure of the factor vectors; however, they are not necessary for identification. Fig. 9 (cf. Fig. 3) shows that the trend is described by ET1 and ET6, which is slightly mixed with seasonality. Fig. 10 (cf. Fig. 4) demonstrates the pairs of ETs that are related to seasonality.
Since the implemented methods of parameter estimation are based on eigenvectors only, they can be applied in the multidimensional case in exactly the same way as in the one-dimensional case.

Comments
Formats of input and output data While the representation of a one-dimensional time series in R is pretty obvious, there are multiple possible ways of defining a multivariate time series. Let us outline some common choices.
• Matrix with separate series in the columns. Optionally, an additional time structure, as in mts objects, can be embedded.
• Matrix-like object (e.g., a data.frame) with series in the columns. In particular, a data.frame would be the result of reading the series from a file via the read.table function.
• List of separate time series objects (e.g. a list of ts or zoo objects).
Also, the time scales of the individual time series can be aligned via head or tail padding with NA (for example, as a result of a ts.union call), or specified via time series attributes. Or, everything can be mixed together.
The package is designed to allow any of the input cases outlined above and produces the reconstructed series in the same format. All the attributes, names of the series, NA padding, etc., are carefully preserved. For forecasted series, the time scale attributes for several known time series classes (e.g., ts) are inferred automatically where possible.
The examples in Fragments 5 and 14 provide an overview of the possible input series formats.
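For illustration, hedged sketches of equivalent ways to pass a pair of series to MSSA (the series names are the illustrative ones used above):

s1 <- ssa(cbind(Fort = fort, Dry = dry), kind = "mssa")        # matrix / mts
s2 <- ssa(data.frame(Fort = fort, Dry = dry), kind = "mssa")   # data.frame
s3 <- ssa(list(Fort = fort, Dry = dry), kind = "mssa")         # list of series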
Plotting specifics Keep in mind that the default ('native') plotting method for reconstruction objects may or may not be suitable for multivariate time series plotting. For example, it provides many useful plotting possibilities for ts and mts objects, but might be totally unusable in the case of data.frame objects, because it will just call the pairs function on the resulting data frame in the end.
Summary for ssa object The summary for an SSA object (see Fragment 8) in the MSSA case coincides with the one for 1D-SSA, with slight MSSA specifics. Here one can see the individual series lengths (excluding the NA padding), the window length, the selected default SVD decomposition method and the number of computed eigentriples. No factor vectors are computed; they will be re-computed on the fly when necessary.
Efficient implementation All ideas from the one-dimensional case can be extended to the multivariate case. In the one-dimensional case, the complexity is determined by the series length N and the window length L, and the worst case corresponds to $L \sim K \sim N/2$ with an overall complexity of $O(L^3 + L^2 K) = O(N^3)$.
In the multidimensional case (for the sake of simplicity assume that all the series have equal lengths N), the worst case corresponds to $L \sim K \sim sN/(s+1)$; that is, the order of complexity is the same $O(N^3)$, but the constant can be considerably larger. Therefore, the achieved speed-up can be even more substantial than in the one-dimensional case.
Note that Multichannel SSA can be viewed as a special case of Shaped 2D-SSA (see Section 5.3), and the current implementation in the package uses this under the hood.

Factor vectors
Factor vectors in MSSA have length K and consist of stacked subvectors of length K_i, i = 1, ..., s, related to each series. Therefore, it is natural to depict them as a system of s vectors. Factor vectors are not necessarily contained in the ssa object. In particular, the s.wineFortDry object created in Fragment 5 has 0 factor vectors, which can be checked by calling summary(s.wineFortDry). Note that if svd.method = "svd" (or "propack") is used, then the factor vectors are calculated automatically. To get the factor vectors corresponding to the eigenvectors contained in the ssa object, the user can call the calc.v function.
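An illustrative call (using the s.wineFortDry object assumed above):

V1 <- calc.v(s.wineFortDry, idx = 1)   # the first factor vector, of length K = K_1 + K_2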
Fragment 9 shows the difference between eigenvectors and factor vectors for a small window length. The result for L = 24 is depicted in Fig. 13. The eigenvector in Fig. 13 captures the common behavior (which is almost constant) on the timescale of two years, while the factor vector is divided into parts reflecting the individual features of the series; compare with the trends in Fig. 8, which are repeated in Fig. 13. Note that the signs of the calculated eigenvectors are arbitrary; for example, the first eigenvector is negative here. Thereby, the factor vectors are similar to the reconstructed series taken with the opposite sign.

Preprocessing (normalization)
Let the time series in the system be measured in different scales. In statistics, this problem is typically solved by standardization of the data. In Singular Spectrum Analysis, centering may not be an appropriate preprocessing step. Therefore, two types of preprocessing can be applied: conventional standardization and normalization, that is, division by the square root of the mean sum of squares. Normalization can be even more appropriate for positive series, since it changes only the scale of the data.
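A short sketch of this normalization, applied to the illustrative list wine2 introduced earlier:

normalize <- function(x) x / sqrt(mean(x^2))   # division by the root mean square
wine2.norm <- lapply(wine2, normalize)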

Forecasting of series with different lengths as filling in
The typical code in Section 3.4 was demonstrated for series of equal lengths. However, the same code can be applied to series of different lengths. The full AustralianWine data are incomplete: there are no data for two months (points 175 and 176) of the sales of Róse wines, and there are no data for the last 11 months of the Total sales.
Let us perform the following actions: (A) fill in the missing data in ROSE, (B) calculate the sum of sales of the available wines, and (C) process this sum together with the Total series and fill in the missing data in the Total series by simultaneous forecasting.
To make use of the analysis performed before, let us forecast the ROSE series together with the FORT series for (A). Fragment 11 implements (A). Certainly, this could also be done by a separate analysis of the ROSE series and the SSA methods of filling in missing data, but Fig. 15 shows that the result of the method used here is quite accurate.

Fragment 12 implements (B) and (C). In addition, we compare the forecast of total sales on its own (TOTAL) and together with the available sum of sales of the main wines (MAINSALES). We consider two simultaneous forecasts, changing the weight of the series MAINSALES from 1 to 100. Fig. 16 clearly demonstrates that for a small contribution of MAINSALES the simultaneous forecast is close to the separate forecast of Total, while for a large weight of MAINSALES the simultaneous forecast of TOTAL tends to take a form similar to that of MAINSALES. An additional investigation is needed to choose the better variant.

Simultaneous decomposition of many series
In this example, we consider a system of many time series and show that the MSSA decomposition helps to reveal similar patterns across the series. We use the methodology described in the previous sections.
Let us consider the collection of s = 6 series from the 'AustralianWine' dataset, which includes the series of wine sales considered in the typical code from Section 3.4. A considerable part of this multivariate series can be described as seasonality. Therefore, MSSA can have an advantage over conventional SSA.
Since the time series have different scales (the 'Róse wines' sales are on the order of 100 thousand litres per month, while the sales of 'Fortified wines' are about 30 times larger), the time series should be transformed to the same scale. We choose the window length L = 163; then K_i = 12 and K = 84. This choice does not correspond to the maximal rank of the trajectory matrix; however, it helps to avoid a mixture of components.
The identification of the trend (ET1,2,5) and of the seasonality (ET3,4,6–12) is performed on the basis of the eigenvectors and uses the principles described in the typical code of Section 3.4. Fragment 13 contains the code that produces the reconstruction shown in Figs. 17 and 18.
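A condensed sketch of how such a decomposition could be obtained (object names are ours and the normalized series are assumed to be collected in the list wine.norm; the actual code is given in Fragment 13):
> s.wine6 <- ssa(wine.norm, L = 163, kind = "mssa")
> r.wine6 <- reconstruct(s.wine6,
+                        groups = list(Trend = c(1, 2, 5),
+                                      Seasonality = c(3, 4, 6:12)))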

Numerical comparison
In this section, we demonstrate how the accuracy of MSSA is related to the structure of the multivariate time series. The aim is to compare accuracy for separate analysis and forecasting of time series with simultaneous processing of the series system. We repeat the results from Golyandina and Stepanov (2005) and supplement them with new comparisons. In particular, the comparison results explain the choice of default forecasting method.
In the study below, we consider the case s = 2 and hence include CSSA into the range of SSA methods we compare.
The investigated model examples include the least favorable and the most favorable cases for MSSA as well as some cases well suited for the application of CSSA.
We take the following parameters for the simulation of the time series: N = 71, the variance of all noise components is σ^2 = 25, and the number of replications is 10000. We consider the following three versions of the signal (H^(1), H^(2)).
The choice of these examples is determined by the fact that the dimensions of the signal trajectory spaces (i.e., ranks) are different for different extensions of SSA methods, see Table 2.
For each example, the rank shown in bold font in Table 2 corresponds to the method with the best accuracy for that example. The results of the investigation for different window lengths L are summarized in Tables 3 and 4; the smallest value in each row is marked in bold. A 24-term-ahead forecast was performed. Comparison of Tables 3 and 4 with Table 2 clearly demonstrates the relation between the accuracy of the signal reconstruction/forecast and the dimension of the signal trajectory space. For each example, the cells corresponding to the method with the best reconstruction accuracy are shown in bold and the overall minimum is marked in blue.
Note that the reconstructions by SSA and CSSA are the same for window lengths L and N − L + 1 (12 and 60, 24 and 48 for the considered examples), while the MSSA reconstructions differ for different L. Also note that the SSA trajectory matrix has rank equal to min(L, N − L + 1), which is maximal for L ≈ N/2. The MSSA trajectory matrix has rank equal to min(L, (N − L + 1)s), where s is the number of time series in the system; this rank is maximal for L ≈ sN/(s + 1). Although maximality of the rank does not guarantee minimality of the errors, this consideration suggests that the window length L providing better separability may be larger than N/2. The simulations confirm this: the minimum of the reconstruction error for MSSA is achieved at L = 48 = 72 × 2/3.
The forecasting errors have a much more complicated structure (see Golyandina (2010)). In particular, the forecasting errors depend on the reconstruction errors at the last points of the time series; therefore, their dependence on L may differ from that of the average reconstruction errors. The considered examples show that the vector forecast is more accurate than the recurrent one and that the row MSSA forecast is slightly more accurate than the column MSSA forecast.
The considered examples confirm the following assertions:
• The accuracy of the SSA-based methods is closely related to the structure of the signal trajectory spaces generated by these methods. MSSA has an advantage if the time series of the system contain matched components.
• Optimal window lengths for analysis and forecasting can differ. The accuracy of the forecast is related to the accuracy of the reconstruction; however, this relation is not straightforward.
• The vector forecast with the best window length is more accurate than the recurrent forecast. However, this does not always hold when the accuracies are compared for the same window length. Note that this is probably valid only for forecasting of a well-separated signal of finite rank; see Remark 1 for an explanation.
• The recommendations on the choice of the window length (larger or smaller than half of the time series length) for recurrent forecasting are, in a sense, opposite to those for vector forecasting.
• For row and column forecasting (the SSA and CSSA forecasting methods are particular cases of column forecasting) the recommendations are also opposite. This is not surprising, since L and K are swapped.
The next fragment demonstrates how the numbers in the tables can be obtained by means of Rssa for MSSA and CSSA analysis and vector forecasting applied to Example A.
Fragment 14 (Simulation for reconstruction and forecasting accuracy estimation).
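A condensed sketch of such a simulation (the signals below are illustrative placeholders, not the actual Examples A–C, and only 100 replications are used here; the full code is given in Fragment 14):
> N <- 71; sigma <- 5; L <- 36; R <- 100   # the paper uses 10000 replications
> n <- 0:(N - 1)
> signal1 <- 30 * cos(2 * pi * n / 12)     # hypothetical signals
> signal2 <- 20 * cos(2 * pi * n / 12)
> err <- replicate(R, {
+   s <- ssa(list(signal1 + rnorm(N, sd = sigma),
+                 signal2 + rnorm(N, sd = sigma)),
+            L = L, kind = "mssa")
+   rec <- reconstruct(s, groups = list(1:2))[[1]]
+   mean((rec[[1]] - signal1)^2)            # MSE of the first reconstructed series
+ })
> mean(err)                                 # average reconstruction MSE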

2D Singular Spectrum Analysis
In this section, we consider the extension of the SSA algorithm to the decomposition of two-dimensional data. This extension is called 2D Singular Spectrum Analysis (2D-SSA for short). For 2D-SSA, the data object X is a two-dimensional data array of size N_x × N_y (or simply an N_x × N_y real-valued matrix), represented as X = X_{N_x,N_y} = (x_{ij})_{i,j=1}^{N_x,N_y}. A typical example of a 2D array is a digital monochrome image.
2D-SSA was proposed as an extension of SSA in Danilov and Zhigljavsky (1997) and was further developed in Golyandina and Usevich (2010); Rodríguez-Aragón and Zhigljavsky (2010). However, its first usage can be traced back to the work of Ade (1983) on texture analysis (this line of work was also continued recently, see Monadjemi (2004)). Related decompositions can be found in methods for processing seismological data (Trickett 2008). Finally, as with SSA-like methods for time series, the 2D-SSA decomposition is the basis of subspace-based parameter estimation methods for sums of two-dimensional complex exponentials (see, e.g., Rouquette and Najim (2001)).
A major drawback of the methods based on the 2D-SSA decomposition has been their computational complexity. The Rssa package contains an efficient implementation of the 2D-SSA decomposition and reconstruction.

2D-SSA algorithm
For a matrix A ∈ R^{M×N} (or C^{M×N}), we denote by vec(A) ∈ R^{MN} (or C^{MN}) its column-major vectorization. For a vector A ∈ R^{MN} (or C^{MN}), we define its M-devectorization vec^{-1}_M(A) as the M × N matrix B that satisfies vec(B) = A. We mostly use the notation of the paper Golyandina and Usevich (2010).

The embedding operator
Since the general scheme of the 2D-SSA algorithm was described in Section 2, we only need to describe the embedding operator T_{2D-SSA}(X) = X.
The parameters of the method are the two-dimensional window sizes (L_x, L_y), which are bounded as 1 ≤ L_x ≤ N_x, 1 ≤ L_y ≤ N_y and 1 < L_x L_y < N_x N_y. For convenience, we also denote K_x = N_x − L_x + 1, K_y = N_y − L_y + 1. As in the general scheme of the algorithms, we define L = L_x L_y (the number of rows of the trajectory matrix) and K = K_x K_y (the number of columns of the trajectory matrix).
Consider all possible L_x × L_y submatrices of X (2D sliding windows). For k = 1, …, K_x and l = 1, …, K_y, we denote by X_{k,l} the L_x × L_y submatrix of X whose upper-left corner is at position (k, l); see Fig. 19. Note that the x axis is oriented downwards and the y axis is oriented to the right; the origin is the upper left corner. We use this orientation because it is consistent with the standard mathematical indexing of matrices (Golyandina and Usevich 2010).
Then the trajectory matrix is defined as
X = [X_1 : … : X_K],    (16)
where the columns are X_{k+(l−1)K_x} = vec(X_{k,l}).

Hankel-block-Hankel structure
It can be shown (see Golyandina and Usevich (2010)) that the trajectory matrix (16) is a block-Hankel matrix with Hankel blocks: its (p, q)-th block (p = 1, …, L_y, q = 1, …, K_y) equals H_{p+q−1}, where each block H_j is the L_x × K_x Hankel matrix constructed from X_{:,j} (the j-th column of the 2D array X). More precisely, H_j = T_SSA(X_{:,j}), where T_SSA is defined in (2). Therefore, in the literature the matrix (16) is called Hankel-block-Hankel (HbH for short), because it is block-Hankel with Hankel blocks.

Trajectory space
From (16) it follows that the trajectory space is the linear space spanned by the L_x × L_y submatrices of X. Therefore, the eigenvectors U_i can also be viewed as vectorized L_x × L_y arrays; their devectorizations are denoted by Ψ_i := vec^{-1}_{L_x}(U_i) and are called eigenarrays. Similarly, the rows of X are vectorizations of K_x × K_y submatrices of X. Therefore, the factor vectors V_i can also be viewed as vectorized K_x × K_y arrays, Φ_i := vec^{-1}_{K_x}(V_i), called factor arrays.

Comments
1. The algorithm of 2D-SSA coincides with the algorithm of MSSA if the window sizes are chosen as (L_x, 1) or (1, L_y). This idea is developed further in Section 5.
2. The arrays of finite rank in 2D-SSA (i.e., the arrays for which T_{2D-SSA} has finite rank) are sums of products of polynomials, exponentials and cosines, similarly to the one-dimensional case. More details can be found in Section 7.2.

Typical code
Here we demonstrate a typical decomposition of a 2D image with the package Rssa. We follow the example of Section 2.3, but stress the differences that appear in the 2D case.
As an example, we use the image of Mars by Pierre Thierry, taken from a tutorial of the free IRIS software (see the data description in the package). The image is of size 258 × 275, 8-bit grayscale, with values from 0 to 255. The input code for this image can be found in Fragment 15 (the image is included in the package Rssa).

> library("Rssa")
> data(Mars)
We would like to decompose this image with a 25 × 25 window (the example from the paper Golyandina and Usevich (2010)). Easy calculations show that even these small window sizes give rise to a 625 × 58734 trajectory matrix. In Fragment 16 the call to the ssa function with svd.method = "svd" is intentionally commented out: we do not recommend using this method unless the trajectory matrix is very small.
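With the default (truncated) SVD method, a minimal sketch of the decomposition and of plotting the eigenarrays for identification looks as follows (the object name and the number of plotted eigenarrays are illustrative):
> s.Mars <- ssa(Mars, kind = "2d-ssa", L = c(25, 25))
> plot(s.Mars, type = "vectors", idx = 1:20, cuts = 255, plot.contrib = FALSE)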

Comments
Formats of input and output data The input for 2D SSA is assumed to be a matrix (or an object which can be coerced to a matrix).
Plotting specifics All the plotting routines by default use the raster representation (via the useRaster = TRUE argument passed to the lattice plotting functions). In most cases it does not make sense to turn the raster mode off, since the input is a raster image anyway. However, not all graphical devices support this mode.
Efficient implementation Most of the ideas from the one-dimensional case can either be transferred directly or be generalized to the 2D case. The overall computational complexity of the direct implementation of 2D-SSA is O(L^3 + KL^2), and thus 2D-SSA can be prohibitively time-consuming even for moderate image and window sizes. (Recall that L = L_x L_y and K = K_x K_y.) The ideas presented in Section 6.2, coupled with Lanczos-based truncated SVD implementations (Larsen 1998; Yamazaki et al. 2008; Korobeynikov 2010), allow one to dramatically reduce the computational complexity down to O(kN log N + k^2 N), where N = N_x N_y and k denotes the number of desired eigentriples. Therefore, the achieved speed-up can be much higher than in the one-dimensional (SSA) and multivariate (MSSA) cases. Also, for svd.method = "eigen", the matrix XX^T is computed using the fast matrix-vector multiplication from Section 6.2; in this case, the complexity is reduced from O(KL^2) to O(LN log N).

Examples
Adaptive smoothing 2D-SSA can also be used for adaptive smoothing of two-dimensional data. We take cumulative sums of the reconstructed components, Y_1 = X_1, Y_2 = X_1 + X_2, and Y_3 = X_1 + X_2 + X_3 (this is a convenient way to compute the reconstructed components for the sets J_1 = I_1, J_2 = I_1 ∪ I_2, and J_3 = I_1 ∪ I_2 ∪ I_3). We plot the reconstructed components in Fig. 25. The cumulative components Y_k are shown in Fig. 26, obtained with the cumsum parameter of the plotting function of Rssa.
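Equivalently, the cumulative reconstructions can be requested directly from reconstruct (a sketch with illustrative component indices, assuming an existing 2d-ssa object s):
> r.cum <- reconstruct(s, groups = list(Y1 = 1, Y2 = 1:2, Y3 = 1:3))
> plot(r.cum, cuts = 255)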
Some methods of the 2D-ESPRIT type are implemented in Rssa. These methods are based on computing a pair of shift matrices for the x and y directions and their joint diagonalization (Rouquette and Najim 2001). Currently, two methods of joint diagonalization are implemented in Rssa: 2D-ESPRIT (from Rouquette and Najim (2001)) and 2D-MEMP with an improved pairing step (see Rouquette and Najim (2001); Wang et al. (2005)).
We continue the example of Mars from Fragment 21. This example demonstrates an advantage of Rssa, namely the possibility to choose large window sizes, which has always been considered a drawback of ESPRIT-type methods. Fragment 26 shows the corresponding code, which outputs the estimated exponentials. In Fig. 28, the estimated pairs of complex exponentials (λ_k, µ_k) are shown on separate complex-plane plots; for the same k, the points for λ_k and µ_k have the same color, and the correspondence between them is additionally indicated by the marker type. In Fragment 26 and Fig. 28 it can be seen that each pair (λ_k, µ_k) has its conjugate counterpart (conj(λ_k), conj(µ_k)), where conj(·) denotes complex conjugation; indeed, this is the case for the pairs k = 1, 2 and k = 3, 4. Therefore, the periodic noise is a sum of two planar sines, as explained in Section 7.2. This is also confirmed by the plots of the eigenarrays in Fig. 29.

Shaped 2D Singular Spectrum Analysis
Shaped 2D-SSA (ShSSA) extends 2D-SSA to an arbitrary shape of the input array and window. In other words, not all values of the rectangular image need to be specified, and the sliding window need not be rectangular (see Fig. 30). Formally speaking, a shape B is a bounded subset of N^2 (a set of two-dimensional natural indices). A B-shaped array is a partially indexed array X = X_B = (x_{(i,j)})_{(i,j)∈B}.
For two-dimensional indices ℓ = (ℓ_x, ℓ_y) and κ = (κ_x, κ_y) we define the shifted sum ℓ ∔ κ := (ℓ_x + κ_x − 1, ℓ_y + κ_y − 1). We also define the shifted Minkowski sum of two shapes A and B as A ∔ B := {α ∔ β : α ∈ A, β ∈ B}.

Construction of the trajectory matrix
The input data for the algorithm is an N-shaped array X = X_N = (x_α)_{α∈N}; the parameter of the method is a window shape L ⊂ N^2.

The embedding operator
For κ ∈ N^2 we define a shifted L-shaped subarray as X_{L ∔ {κ}} = (x_α)_{α∈L ∔ {κ}} (see Fig. 30). The index κ is the position of the origin of the window. Consider the set of all possible origin positions of L-shaped windows inside N:
K := {κ ∈ N^2 : L ∔ {κ} ⊆ N}.
We assume that K = {κ_1, …, κ_K} ⊂ N^2, where the κ_j are ordered lexicographically. Then the trajectory matrix is constructed as
T_ShSSA(X) := X = [X_1 : … : X_K],    (21)
where the columns X_j are the vectorizations (in lexicographical order) of the shaped subarrays X_{L ∔ {κ_j}}.

Quasi-Hankel matrix
The trajectory matrix is exactly the quasi-Hankel matrix of Mourrain and Pan (2000) constructed from the sets L, K ⊂ N^2. Note that Mourrain and Pan (2000) use the usual sum of indices instead of the shifted sum ∔, because their definition of the natural numbers N includes 0. The set K defined above is the maximal set such that L ∔ K ⊆ N (see Fig. 31).

Comments
We describe an algorithm for computation of the set K in Section 6.2.4.
It may happen that some elements of the original shaped array X_N do not enter the trajectory matrix (for example, the element (5, 8), shown in gray in Fig. 31). In this case, the ShSSA analysis effectively applies only to the Ñ-shaped array X_Ñ, where Ñ is the set of all indices that enter the trajectory matrix. (In fact, Ñ = K ∔ L.)

Column and row spaces
By the construction of the trajectory matrix, the trajectory space is the linear space spanned by all L-shaped subarrays, and the eigenvectors U_i can also be viewed as L-shaped arrays. These arrays, as in 2D-SSA, are called eigenarrays.
Analogously, each row of the trajectory matrix is a vectorization (in lexicographical order) of a K-shaped subarray of X, namely the subarray X_{{λ_i} ∔ K} starting at the element λ_i ∈ L. Therefore, the factor vectors V_i can also be viewed as K-shaped arrays. These arrays are called factor arrays.

Typical code
In this subsection we repeat the experiment from Section 4.2 (periodic noise removal from the image of Mars), but using the Shaped SSA. Therefore, the code for loading the image is the same as in Fragment 15.
The array shape can be specified in two different ways: (A) by NA values in the input array (these elements are excluded from the shape); (B) by the parameter mask, a logical N_x × N_y array (the indicator of N).
If both shape specifications are present, their intersection is used. The shape of the window is passed as an L_x × L_y logical array (wmask). The shapes can also be specified by a command such as circle, as shown in Fragment 27.
> r.Mars.shaped <- reconstruct(s.Mars.shaped,
+                              groups = list(Noise = c(7, 8, 9, 10)))
> plot(r.Mars.shaped, cuts = 255, layout = c(3, 1), fill.color = "green")
The reconstruction results are shown in Fig. 33. One can see that the elements are reconstructed only inside the resulting mask, while the original array is drawn for all available elements (except the NA values). The grouping for this decomposition was made on the basis of the following information: (A) the eigenarrays (see Fig. 34), (B) the matrix of w-correlations (see Fig. 35). Fragment 29 shows the corresponding code.

Efficient implementation In Rssa, Shaped 2D-SSA is implemented as an additional option of kind = "2d-ssa". However, the use of input arrays with non-trivial shapes incurs an additional projection operation during the computations (see Section 6.2). Ordinary 2D-SSA can be seen as a special case of Shaped 2D-SSA with a rectangular window and a mask that covers the whole image. The package optimizes for this common case: no projections are performed when it is known that they are trivial, and one thus obtains ordinary 2D-SSA. Consequently, the computational complexity of the method is the same regardless of how the full mask was specified: by passing NULL to the wmask and mask arguments or by making the masks trivial. This is a convenient behavior which simplifies, e.g., batch processing of images with shapes automatically deduced from the images.

Special Window Shapes
The package provides a convenient interface for setting several special forms of the window. This is implemented via special expressions that can be passed to the wmask argument:
• wmask = circle(R) specifies a circular mask of radius R.
• wmask = triangle(side) specifies a mask in the form of an isosceles right triangle with legs of length side; the right angle lies in the top-left corner of the containing square matrix.
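For example, a sketch on the Mars image with an illustrative radius:
> s.Mars.circle <- ssa(Mars, kind = "2d-ssa", wmask = circle(15))
> plot(s.Mars.circle, type = "vectors", idx = 1:8, cuts = 255, plot.contrib = FALSE)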

Special cases of Shaped 2D-SSA
In this section we show that all the considered variants of SSA are in fact special cases of ShSSA, for a carefully chosen array and mask.
First of all, SSA can easily be embedded in ShSSA: consider an N × 1 array and an L × 1 window (or, alternatively, a 1 × N array and a 1 × L window). Also, it was mentioned in Golyandina and Usevich (2010) that MSSA with equal time series lengths is a special case of 2D-SSA. In the construction below we extend this to time series of different lengths: the series are stacked column-wise into a shaped array, and the window is taken to be L = {1, …, L} × {1}. The construction of this array is shown in Fig. 36, together with the window L and the shape K. It is easy to verify that T_ShSSA(X) = T_MSSA(X).

MSSA: 2D packing
Consider also the rows of X = T_MSSA(X). The rows of X are vectorizations of the K-shaped subarrays (see Fig. 36). An alternative, one-dimensional packing arranges the series into a single row or column: the array consists of the time series plus separators between them that are not included in the array shape, and the window is taken to be L × 1 (or 1 × L), depending on the arrangement chosen. Fig. 37 shows the horizontal variant of this packing.

Mosaic Hankel matrices
A mosaic Hankel matrix (Heinig 1995) is a block matrix with Hankel blocks. It can be considered as the most general one-dimensional (i.e., with one-dimensional displacement) generalization of Hankel matrices.
Let L_1, …, L_s and K_1, …, K_t be positive integers and X^{(i,j)} ∈ R^{L_i + K_j − 1} be time series. Then the mosaic Hankel matrix is the block matrix whose (i, j)-th block is the L_i × K_j Hankel matrix T_SSA(X^{(i,j)}) constructed with window length L_i.
Note that the sizes of the blocks may be different; the only requirement is that they match as a "mosaic". The case of mosaic Hankel matrices corresponds to several collections of multidimensional time series (Markovsky and Usevich 2014).
It is easy to construct mosaic Hankel matrices based on the 2D embedding of MSSA. The j-th block column is the transposed matrix T_MSSA for the collection of time series (X^{(1,j)}, …, X^{(s,j)}) and window length L = K_j. Therefore, the mosaic Hankel matrix can be constructed by stacking the shapes (with separators) from Fig. 36, with K replaced by L because of the transposition. The resulting construction is shown in Fig. 38. In this case, the SSA decomposition has a common basis of eigenvectors, as in 2D-SSA. This type of matrices is used in methods for comparison of images by means of 2D-SSA (Rodríguez-Aragón and Zhigljavsky 2010). The same matrix is also used in recent methods of parallel MRI imaging (Uecker et al. 2013).
The matrix for M-2D-SSA can be constructed in a way similar to the case of mosaic Hankel matrices. An array X of size N_x × (sN_y + s − 1) is constructed from the individual arrays with one-element separators; the resulting array is shown in Fig. 39. In general, a similar construction can handle arrays of different sizes, shapes, and shaped windows. Also note that in the extended array the original arrays (both in Fig. 39 and Fig. 38) may be arranged arbitrarily (for example, in a table-like planar arrangement); the only requirement is that they be separated.

Fast computations with Hankel matrices
In this section, we describe the algorithms that are used for the multiplication of Hankel matrices by vectors and for rank-one hankelization. Although algorithms for these operations were already proposed in Korobeynikov (2010), we use another type of circulant, which simplifies the algorithms compared to Korobeynikov (2010). This section also serves as a foundation for Section 6.2 on computations with quasi-Hankel matrices.

Preliminaries
In what follows, A ⊙ B denotes the elementwise product of two vectors A, B ∈ C^N, rev(A) denotes the reversal of the vector A, and conj(A) denotes the elementwise conjugation. We also denote by E_j^{(N)} the j-th unit vector in R^N.
For a complex vector X ∈ C^N we define its discrete Fourier transform as
F_N(X) := F_N X,    (22)
where F_N ∈ C^{N×N} is the Fourier matrix with elements (F_N)_{k,l} = e^{−2πi(k−1)(l−1)/N} (see Korobeynikov (2010)). The inverse Fourier transform is given by F_N^{-1}(X) := (1/N) F_N^* X, where F_N^* is the Hermitian transpose of F_N. The Toeplitz circulant constructed from the vector X = (x_k)_{k=1}^N is, by definition, the N × N matrix C_T(X) with elements (C_T(X))_{k,l} = x_{((k−l) mod N)+1}. The Hankel circulant constructed from the vector X is, by definition, the N × N matrix C_H(X) with elements (C_H(X))_{k,l} = x_{((k+l−2) mod N)+1}. We will also need the following relation between Hankel and Toeplitz circulants.
Lemma 1. For any j = 1, …, N and A ∈ C^N we have
C_H(E_j^{(N)}) A = ((E_j^{(N)})^T C_T(A))^T.
Proof. The vector C_H(E_j^{(N)}) A is the j-th forward circular shift of the vector rev(A). Therefore, it coincides with the j-th row of C_T(A), which is equal to (E_j^{(N)})^T C_T(A). □

Matrix-vector multiplication
It is well known (see Korobeynikov (2010)) that multiplication of the Toeplitz circulant C T (X) by a vector A ∈ C N can be computed using discrete Fourier transform as A similar property holds for the C H (X). Since it is less common, we provide a short proof.
Lemma 2. For X ∈ C N and A ∈ R N we have that Proof. The circulant C 2 def = C T ((x N , x 1 , . . . , x N −1 )) can be obtained from C H (X) by reversion of all rows. Therefore, the product of C H (X) by A is equal to 2 For X ∈ R N and K, L : N = K + L − 1, the Hankel matrix T SSA (X) is an L × K submatrix of C H (X). Therefore, multiplication by T SSA (X) can be performed using the following algorithm.
In Korobeynikov (2010), two different algorithms were used for multiplication by T_SSA(X) and by its transpose. Algorithm 1, however, also suits multiplication by the transpose of T_SSA(X), because the sizes of the matrix are used only in the first step (padding of V by zeros) and in the last step (truncation of the result). Also, the DFT F_N(X) (step 3) can be precomputed; compared to Korobeynikov (2010), this precomputed object does not depend on L.

Rank-one hankelization
Next, we describe the algorithm for hankelization, which coincides with that of Korobeynikov (2010). However, we describe the algorithm in a different way, which serves as a foundation for quasi-hankelization in Section 6.2.
The hankelization operation computes, for a given X ∈ R^{L×K}, the vector X̃ ∈ R^N that minimizes the distance ||T_SSA(X̃) − X||_F^2. It is well known that the hankelization of X is the vector X̃ with elements
x̃_j = ⟨X, T_SSA(E_j^{(N)})⟩_F / ||T_SSA(E_j^{(N)})||_F^2,   j = 1, …, N.    (25)
The denominators w_j = ||T_SSA(E_j^{(N)})||_F^2 are exactly the weights appearing in the w-correlations. For rank-one matrices X = U V^T ∈ R^{L×K}, the numerator in (25) can be expressed more compactly:
⟨U V^T, T_SSA(E_j^{(N)})⟩_F = Ũ^T C_H(E_j^{(N)}) Ṽ = (E_j^{(N)})^T C_T(Ṽ) Ũ,
where Ũ and Ṽ denote U and V padded by zeros to length N, and the last equality follows from Lemma 1.
Then, the hankelization can be computed using the following algorithm.
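A minimal R sketch of this diagonal-averaging computation (the helper name and the closed-form expression for the weights are ours, not package code):
> hankelize1 <- function(U, V) {
+   # diagonal averaging of the rank-one matrix U %*% t(V) into a series of length N
+   L <- length(U); K <- length(V); N <- L + K - 1
+   Upad <- c(U, rep(0, N - L)); Vpad <- c(V, rep(0, N - K))
+   num <- Re(fft(fft(Upad) * fft(Vpad), inverse = TRUE)) / N  # convolution of U and V
+   w <- pmin(1:N, L, K, N:1)                                  # weights w_j (antidiagonal lengths)
+   num / w
+ }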

Preliminaries
For two matrices A, B ∈ C^{N_x×N_y}, we denote by A ⊙ B their elementwise product and by conj(A) the elementwise complex conjugation. For k = 1, …, N_x and l = 1, …, N_y, we define the elementary array E_{k,l} = E_k^{(N_x)} (E_l^{(N_y)})^T, where E_j^{(N)} ∈ R^N is the j-th unit vector. The Kronecker product of two matrices A and C is denoted by A ⊗ C.
For a complex matrix X = (x_{k,l})_{k,l=1}^{N_x,N_y}, its discrete Fourier transform is defined as F_{N_x,N_y}(X) := F_{N_x} X F_{N_y}^T, where F_N is the matrix of the discrete Fourier transform (22). By the properties of the vectorization operator, vec(F_{N_x,N_y}(X)) = (F_{N_y} ⊗ F_{N_x}) vec(X).
The TbT circulant C_TbT(X) constructed from the matrix X is, by definition, the N_x N_y × N_x N_y block circulant whose (p, q)-th block (p, q = 1, …, N_y) equals C_{T,((p−q) mod N_y)+1}, where C_{T,j} := C_T(X_{:,j}) and X_{:,j} denotes the j-th column of X. The HbH circulant C_HbH(X) constructed from the matrix X is, by definition, the block circulant whose (p, q)-th block equals C_{H,((p+q−2) mod N_y)+1}, where C_{H,j} := C_H(X_{:,j}). Note that for a rank-one array X = X_x X_y^T we have
C_TbT(X) = C_T(X_y) ⊗ C_T(X_x)  and  C_HbH(X) = C_H(X_y) ⊗ C_H(X_x).    (26)
We will also need the following relation between HbH and TbT circulants: for any k, l and A ∈ C^{N_x×N_y},
vec(A)^T C_HbH(E_{k,l}) = vec(E_{k,l})^T C_TbT(A).
Proof. Due to linearity, it suffices to prove the relation for rank-one matrices X = X_x X_y^T. Then, by (26),
vec(X)^T C_HbH(E_{k,l}) = (X_y ⊗ X_x)^T (C_H(E_l^{(N_y)}) ⊗ C_H(E_k^{(N_x)})) = ((X_y^T C_H(E_l^{(N_y)})) ⊗ (X_x^T C_H(E_k^{(N_x)}))) = ((E_l^{(N_y)})^T C_T(X_y)) ⊗ ((E_k^{(N_x)})^T C_T(X_x)) = vec(E_{k,l})^T C_TbT(X),
where the third equality follows from Lemma 1. □

Multiplication by a quasi-Hankel matrix
Analogously to (23) and (24), the following equalities hold for X ∈ C^{N_x×N_y} and A ∈ R^{N_x×N_y}:
C_TbT(X) vec(A) = vec(F_{N_x,N_y}^{-1}(F_{N_x,N_y}(X) ⊙ F_{N_x,N_y}(A))),
C_HbH(X) vec(A) = vec(F_{N_x,N_y}^{-1}(F_{N_x,N_y}(X) ⊙ conj(F_{N_x,N_y}(A)))).
Proof. Due to the linearity of the Fourier transform and of the circulant construction, it suffices to prove the statements for rank-one matrices X = X_x X_y^T and A = A_x A_y^T. In this case, by (26) and (23),
C_TbT(X) vec(A) = (C_T(X_y) A_y) ⊗ (C_T(X_x) A_x) = vec(F_{N_x,N_y}^{-1}(F_{N_x,N_y}(X) ⊙ F_{N_x,N_y}(A))).
The proof of the second statement is analogous. □
Now we consider multiplication by a quasi-Hankel matrix. Let L, K, N ⊆ {1, …, N_x} × {1, …, N_y} be such that L ∔ K = N. Then the quasi-Hankel matrix T_ShSSA(X) (defined in (21)) is a submatrix of the HbH circulant C_HbH(X). Let us write this formally and provide an algorithm for the calculation of the matrix-vector product.
Then we have that T_ShSSA(X) = P_L C_HbH(X) P_K, where P_L and P_K are the projection (selection) matrices that pick out the rows and the columns of C_HbH(X) corresponding to the lexicographically ordered elements of the shapes L and K, respectively.
Hence, the matrix-vector multiplication algorithm can be written as follows.
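A minimal R sketch of this computation (the helper and its mask-based interface are ours, not package code; the entries of X outside the shape N are assumed to be zero, and lexicographic ordering is taken column-major, as in vec):
> qhmatmul <- function(X, v, Lmask, Kmask) {
+   # multiply the quasi-Hankel matrix T_ShSSA(X) by a vector v of length sum(Kmask);
+   # Lmask and Kmask are Nx x Ny logical masks of the shapes L and K
+   A <- array(0, dim(X))
+   A[Kmask] <- v                                   # embed v into the full array (the P_K part)
+   full <- Re(fft(fft(X) * Conj(fft(A)), inverse = TRUE)) / length(X)
+   full[Lmask]                                     # select the entries indexed by L (the P_L part)
+ }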

Rank-one quasi-Hankelization
Let L, K, N ⊆ {1, …, N_x} × {1, …, N_y} be such that L ∔ K = N. The quasi-Hankelization operator computes, by definition, for a given X ∈ R^{L×K} the shaped array X̃_N = (x̃_{k,l})_{(k,l)∈N} that minimizes the distance ||T(X̃_N) − X||_F^2. It is well known that quasi-Hankelization can be expressed as averaging; more precisely,
x̃_{k,l} = ⟨X, T(E_{k,l})⟩_F / ||T(E_{k,l})||_F^2,   for (k, l) ∈ N.
The numerator represents summation over the set of positions in X that correspond to the (k, l)-th element of an array in a quasi-Hankel matrix. The denominator is equal to the number of such elements.
In this section, we assume that X is a rank-one matrix, i.e., X = U V^T. (Regarding the computation of the set K mentioned in Section 6.2.4: K corresponds to the maximal set such that all the elements of the corresponding submatrix of T_2D-SSA(I^{(N)}) are equal to 1, where I^{(N)} denotes the indicator array of the shape N.)

Fast vector forecasting algorithm
It is mentioned in (Golyandina and Zhigljavsky 2013, p. 76) that vector forecasting is time-consuming, while recurrent forecasting is fast. However, this is only the case if the algorithm from Section 3.3 is implemented directly. It turns out that vector forecasting can be accelerated considerably; in the current implementation in Rssa, vector forecasting is even slightly faster than recurrent forecasting.
In this section we will use the notation from Section 3.3.

1d-SSA case
Consider forecasting in the subspace L_col given by a basis {P_1, …, P_r}. Denote P = [P_1 : … : P_r]. Each reconstructed vector X̃_k belongs to L_col, and therefore there exist coefficients W_k ∈ R^r such that X̃_k = P W_k. Denote W = [W_1 : … : W_K]. In fact, the input of the algorithm is the minimal decomposition of X̃ into a sum of rank-one elementary matrices in the form X̃ = P Q^T, and W = Q^T.
Note that if we have the singular value decomposition of X̃ = [X̃_1 : … : X̃_K], then the left singular vectors provide the basis of the subspace, while the W_k are determined by the right singular vectors and the singular values: P = [U_1 : … : U_r] and Q = [√λ_1 V_1 : … : √λ_r V_r].
Remark 2. 1. Item 1 in the algorithm is exactly the calculation of the shift matrix from the LS-ESPRIT method for frequency estimation (Roy and Kailath 1989).
2. If {P_i} is an orthonormal system, then D can be computed more efficiently, without the QR decomposition, via a closed-form expression involving π = π(P) and the r × r identity matrix I_r.
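For illustration, a sketch of the shift-matrix computation of Item 1 in our notation (an assumption on the shift relation in the LS-ESPRIT sense, not a quote of the algorithm), where P is an L × r basis matrix and the matrices without the last and the first row play the roles of the shifted pair:
> # shift matrix via least squares (LS-ESPRIT style)
> D <- qr.solve(P[-nrow(P), , drop = FALSE], P[-1, , drop = FALSE])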

Column MSSA-forecast
Column vector MSSA forecasting in the subspace L_col reduces to performing s one-dimensional vector forecasts in the same subspace L_col, one for each of the s series. Note that the shift matrix D needs to be computed only once, since it depends only on P.

Row MSSA-forecast
The row vector forecast differs slightly from the column one, but the idea is the same, except that we forecast in the row subspace L_row = span(Q_1, …, Q_r) and continue the sequence of the row vectors Y_k of X̃. Let Y_k = Q W_k. Algorithm: • Compute the matrix D = Q^† Q using the QR decomposition.

Proof of the algorithm validity
Let us prove that these algorithms construct the vector forecasts described in Section 3.3. For simplicity, we will consider the one-dimensional case, that is, SSA vector forecast, which coincides with the column MSSA forecasting for one-dimensional series.
Since the roots are determined by the structure of the trajectory space, the following proposition can be proved.
Consider a simple example.

Arrays of finite rank
The theory of arrays of finite rank is mainly contained in Golyandina and Usevich (2009). We present here a short summary.
In this section we consider the class of infinite arrays X = (x_{m,n})_{m,n=0}^{∞,∞} given in the parametric form
x_{m,n} = Σ_{k=1}^{r} c_k λ_k^m µ_k^n,    (30)
where (λ_k, µ_k) ∈ C^2 are distinct pairs of complex numbers. It can be shown that for large enough N_x, N_y, L_x, L_y, the rank of the trajectory matrix X is equal to r (equivalently, the arrays of the form (30) are arrays of finite rank). Note that the exponentials can also be represented in the form λ_k = exp(2πi ω_{x,k} + φ_{x,k}), µ_k = exp(2πi ω_{y,k} + φ_{y,k}).
We should note that the class of arrays of finite rank also contains bivariate polynomials and their products with functions of the form (30). An algebraic characterization of the arrays of finite rank can be found in Golyandina and Usevich (2009).
Then the representation (30) becomes a sum of d + s planar modulated sinewaves:
x_{m,n} = Σ_{k=1}^{d+s} b_k ρ_{x,k}^m ρ_{y,k}^n cos(2π(ω_{x,k} m + ω_{y,k} n) + φ_k),
where b_k and φ_k ∈ [0, 2π) are unique real coefficients obtained from the c_k.
Example 4. The product of sines can be uniquely represented as a sum of two planar sines.
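Indeed, by the standard product-to-sum identity (shown here for illustration),
sin(2π ω_x m) sin(2π ω_y n) = (1/2) cos(2π(ω_x m − ω_y n)) − (1/2) cos(2π(ω_x m + ω_y n)),
i.e., a sum of two planar sines with frequency pairs (ω_x, −ω_y) and (ω_x, ω_y).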

Acknowledgement
The paper was partly supported by the project NG13-083 of the Dynasty Foundation in the area of computer science. Konstantin Usevich has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement no. 258581 "Structured low-rank approximation: Theory, algorithms, and applications".