Randomised multichannel singular spectrum analysis of the 20th century climate data

In this article, we introduce a new algorithm called randomised multichannel singular spectrum analysis (RMSSA), which is a generalisation of the traditional multichannel singular spectrum analysis (MSSA) into problems of arbitrarily large dimension. RMSSA consists of (1) a dimension reduction of the original data via random projections, (2) the standard MSSA step and (3) a recovery of the MSSA eigenmodes from the reduced space back to the original space. The RMSSA algorithm is presented in detail and additionally we show how to integrate it with a significance test based on a red noise null-hypothesis by Monte-Carlo simulation. Finally, RMSSA is applied to decompose the 20th century globalmonthlymean near-surface temperature variability into its low-frequency components. The decomposition of a reanalysis data set and two climate model simulations reveals, for instance, that the 2 6 yr variability centred in the Pacific Ocean is captured by all the data sets with some differences in statistical significance and spatial patterns.


Introduction
Our motivation to focus on advanced spatio-temporal data analysis is to better understand the decadal climate variability in the Earth system and illuminate the capabilities of prediction tools to capture the associated signals (Meehl et al., 2014). Inter-and intra-decadal climate variability is inherent to the oceanÁatmosphere system and is further coupled to other Earth system components, such as sea-ice and land surface (Meehl et al., 2009). The variability appears as complex four-dimensional (or spatio-temporal) structures in Earth system variables, such as wind, temperature and precipitation (Solomon et al., 2011).
These structures are embedded in extremely largedimensional data sets gathered and generated in reanalysis of atmospheric and oceanic observations, and in massive simulation endeavours using Earth system models worldwide. Applicability of advanced data analysis tools is severely hampered by the very large dimensionality of the climate data.
Many common analysis methods, such as principal component analysis (PCA; Von Storch and Zwiers, 1999), involve eigen-problems, which become impossible to solve with increasing data dimension. Earlier we illustrated the use of random projections (RP) as a tool to tackle highdimensional problems (Seitola et al., 2014). We demonstrated how PCA can be applied in three-dimensions to problems that are beyond practical computational limits without efficient dimension reduction. PCA is not an ideal tool, however, to extract and illustrate four-dimensional eigen-features in climate data. In this respect, the multichannel singular spectrum analysis (MSSA; Broomhead and King, 1986a, b) is a much more appealing method since the MSSA eigen-problem inherently contains the autocovariance in the lagged copies of the original data vectors. The computational burden is, however, even larger than in PCA. We overcome this burden by a novel randomised version of MSSA, called RMSSA. To our knowledge, this approach has not been suggested before. We note that Oropeza and Sacchi (2011) incorporate a randomising operator into MSSA for noise attenuation in seismic data, but their algorithm is not aimed directly at large-dimensional problems. In RMSSA, RPs are used essentially to enable analysis of extremely large-dimensional data sets. *Corresponding author. email: teija.seitola@fmi.fi Tellus A 2015. # 2015 T. Seitola et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license.
In this article, we present the RMSSA algorithm in detail and also include a test for the statistical significance of the results (Monte-Carlo MSSA; Allen and Robertson, 1996) in the algorithm. We demonstrate the use of RMSSA by decomposing the 20th century global monthly mean near-surface temperature variability into its low-frequency components. The data sources are described in Section 2.3.

Multichannel singular spectrum analysis
MSSA was introduced into the study of dynamical systems by Broomhead and King (1986a, b). The method is equivalent to extended empirical orthogonal function (EEOF) analysis (Weare and Nasstrom, 1982), but there are differences in the choice of some important parameters and in the interpretation of the results (Plaut and Vautard, 1994).
In traditional PCA or EOF analysis (e.g. Rinne and Karhila, 1979), spatial correlations (in case of climatic data sets) are used in determining the patterns that explain most of the variability in the data set, but MSSA differs from this traditional method by also taking into account the temporal correlations. In other words, standard PCA decomposes a spatio-temporal field into spatial PC loading patterns (EOFs) and corresponding PC score time series (PCs), whereas MSSA also adds a temporal dimension to EOFs. MSSA PCs and EOFs are often called space-time PCs (ST-PCs) and spacetime EOFs (ST-EOFs), and we have adopted this notation here. A more detailed description of MSSA is presented in Ghil et al. (2002) and in Appendix A.1 here.
2.1.1. Choice of the lag window. The idea of MSSA, in brief, is to find the patterns that maximise the lagged covariance of the data set X N )L within M lags. In case of a gridded climate data set, N represents the time steps and L is the number of grid points. The columns of the data matrix X are often called channels. The length of the lag window M is a user choice. For example, Elsner and Tsonis (1996) suggest that the results of MSSA do not change significantly with varying M as long as N!!M and they recommend using M 0N/4. Vautard and Ghil (1989) recommend to choose M no larger than approximately N/3. Clearly, if the number of channels L is large in the beginning, choosing large M would result in a very highdimensional data matrix with M)L columns, including lagged copies of each channel in X.
Determining the length of the lag window M is a tradeoff between spectral resolution and statistical significance of the obtained components. The larger M is chosen, the more temporal information can be extracted but at the same time the variance is distributed on a larger set of components. If M is small, the statistical significance of the obtained components is enhanced. In this study, we used several values of M in order to test its effects on the results.
2.1.2. Assessing statistical significance with Monte-Carlo MSSA. ST-PCs/ST-EOFs often appear in pairs ('sinusoidal') that explain approximately the same variance and are p/2 out of phase with each other. These pairs are said to present stationary or propagating oscillatory modes of the data set (Plaut and Vautard, 1994). Modes with period less than or equal to M can be only presented by such pairs. However, existence of such a pair does not guarantee any physical oscillation, and according to Allen and Robertson (1996) such pairs can also be generated by non-oscillatory processes, such as first-order autoregressive noise.
This finding led Allen and Robertson (1996) to formulate a test for the statistical significance of MSSA components. The identified components are tested against a null-hypothesis of the data being generated by independent AR(1) processes (i.e. red noise) with the same variance and lag-1 autocorrelation as the original input time series. This procedure is called Monte-Carlo MSSA (MC-MSSA), and it is described in more detail in the original study of Allen and Robertson (1996) as well as in Appendix A.1 of this article.
2.1.3. Reconstructed components. ST-PCs cannot be compared to the original time series as such; instead, they can be represented in the original coordinate system by their reconstructed components, RCs (Plaut and Vautard, 1994;Ghil et al., 2002). In the reconstruction, the ST-PCs are projected back onto the eigenvectors (ST-EOFs) and each RC is a kind of filtered version of the original multivariate time series. Construction of RCs is illustrated in Fig. 1. Several ST-PCs/ST-EOFs can be used in the reconstructions, and if there is an oscillation that appears as a sinusoidal pair, both of these ST-PCs/ST-EOFs should be included in the reconstruction of that certain oscillatory mode. This is done by summing up the corresponding RCs. No information is lost in the reconstruction, and the original time series is a sum of all individual RCs.

Randomised algorithm for MSSA
As mentioned earlier, the computational burden of MSSA becomes soon prohibitively high if the original data set is high-dimensional and M is chosen to be large. This is typically the situation in studies of low-frequency variability in climate data sets. Traditionally, the dimensionality reduction has been obtained by calculating first a conventional PCA and retaining a set of dominant PCs for the MSSA (see chapter 2.2.3). However, in this article we apply a different approach to dimensionality reduction. That is, we use RP to reduce the dimensionality of the original data set before performing MSSA.
In Halko et al. (2011), it is stated that randomised methods provide a powerful tool for constructing approximate matrix factorisations. Compared with standard deterministic algorithms, the randomised methods are often faster and more robust. Halko et al. (2011) present also numerical evidence that these algorithms succeed for real computational problems.
2.2.1. Description of RMSSA. In our approach, RP is applied to reduce the dimension of the original data matrix X after which the traditional MSSA calculation is performed in the lower-dimensional subspace. Finally, we reconstruct the ST-EOFs and RCs in the original space. We call this algorithm randomised multichannel singular spectrum analysis (RMSSA).
In RP, the original data set is projected onto a matrix R of Gaussian distributed (zero mean and unit variance) random numbers in order to construct a lower-dimensional representation P of the data set: In other words, we are projecting our data set onto k random directions determined by the column vectors of R. From these projections a lower-dimensional representation of the original data set can be constructed. Due to the simplicity of RP, involving only matrix multiplication, it can be applied to a wide range of data sets, even very highdimensional ones. RP has already been applied to climate data in Seitola et al. (2014) and it has been shown to preserve structures of the original data very well. In that article, the theoretical background of RP is presented in more detail with additional references. The projected lower-dimensional data set P can be processed through MSSA where instead of original L channels we have now only k channels. This implies substantial computational savings (see Appendix A.2, algorithm 1). In the literature, there are some estimates of an appropriate value for k (e.g. Frankl and Maehara, 1988;Dasgupta and Gupta, 2003). However, these theoretical lower bounds for k are the worst case estimates and usually much lower values for k still give good results, retaining most of the information of the original data set (see e.g. Bingham and Mannila, 2001;Seitola et al., 2014). In practice, the value for k is usually chosen adaptively keeping the desired size for lowerdimensional approximation in mind.
A final step of the algorithm is to calculate the reconstructed components. This requires the recovery of the MSSA eigenmodes from the reduced space back to the original space, allowing the reconstruction of the original time series. This means that the eigenvectors (ST-EOFs) should be calculated in the original space instead of the reduced one. This part of the algorithm is also presented in Appendix A.2. Furthermore, in Appendix A.4 we explain how RP preserves the lagged covariance structure of the original data set.

2.2.2.
Comparison of RMSSA to previous work. To our knowledge, the proposed RMSSA algorithm is unique. Some published work comes close to our approach but RMSSA has some important differences to the randomised MSSA algorithms used in seismic data processing (Oropeza and Sacchi, 2011;Chiu, 2013). The aim of Chiu (2013) was to introduce a new rank-based-reduction denoising algorithm to perform coherent and random noise filtering concurrently. Chiu (2013) named this algorithm, or rather filter, MSSARD (MSSA in the randomised domain). In MSSARD, the randomising operator randomly rearranges the order of the input data and reorganises the coherent noise into incoherent noise. The most important difference to our algorithm is in the randomising operator: In our case, we are using RP to reduce the dimensionality of the input data whereas Chiu's (2013) approach is to randomly rearrange the input data.
The technique of Oropeza and Sacchi (2011) was to embed a spatial data at a given temporal frequency into a block Hankel matrix after which a randomised singular value decomposition (SVD) was adopted to accelerate the rank reduction stage of the algorithm. Construction of a Hankel matrix corresponds to the construction of an augmented data matrix A in our algorithm (see Appendix A.1). Our algorithm is different in the sense that we apply RP on the original input data before construction of the augmented (or Hankel) matrix. This notably reduces the computational burden of MSSA because we are processing a much smaller data set already in the augmentation phase of the algorithm (see algorithm 1 in Appendix A.2).
In addition to these main differences, the abovementioned seismological applications involve handling a data set where each time/frequency slice of spatial (x-y) data is processed separately through the algorithm. In our case, we are processing the whole timeÁlongitudeÁlatitude data set at once through the RMSSA algorithm.
2.2.3. Enhancing PC-based MSSA. In many studies, where MSSA is used as an analysis method (e.g. Plaut and Vautard, 1994;Moron et al., 2012), the dimension of the original data matrix has been reduced by calculating a conventional PCA of the original data matrix and then limiting MSSA into the dominant PCs. One has to bear in mind that the problem dimension may be prohibitive to contemplate solving even PCA, let alone MSSA. Nevertheless, the number of retained PCs is a somewhat arbitrary choice, but can be estimated by inspecting the eigenvalue spectrum and choosing the PCs that account for the majority of the variance and are separated from the rest of the spectrum. In geophysical datasets, however, the eigenvalue spectrum often decreases monotonically and it is difficult to distinguish the appropriate cut-off point. The aim of the study does also affect the choice of the PCs. For example, if the focus is on large-scale patterns, it might be more convenient to choose the low-frequency PCs for further analysis. Performing the calculations with different number of PCs and comparing the results can also help in finding the appropriate number of PCs. Importantly, RMSSA (Appendix A.2, algorithm 1) does not suffer from this problem because the lower-dimensional data set has essentially the same structure as the original high-dimensional data set.
PCA-based dimensionality reduction is, however, a preferred method if the oscillatory modes identified with MSSA are tested against a red noise null-hypothesis through Monte-Carlo simulation. According to Allen and Robertson (1996) the test is only useful if the channels in the data matrix are orthogonal or at least very nearly so. The PCs fulfil the orthogonality condition exactly. The randomised method can still accelerate Á and in the case of a very-highdimensional data set even enable Á the calculation of the PCs (see Appendix A.2, algorithm 2).
This also raises the question as to whether the projected data set [i.e. matrix P in eq. (1)] could be used directly in MC-MSSA. Like the PCs, RP is also an orthogonal projection and the columns of P are also nearly orthogonal. However, this question is beyond the scope of this study and will not be discussed here any further.

Data
As an illustration of applying the RMSSA algorithm, we analysed the monthly mean near-surface air temperature field from the 20th Century Reanalysis V2 data, hereafter 20CR, provided by the NOAA/OAR/ESRL PSD (Compo et al., 2011). In addition, we repeated the analysis for the historical 20th century simulations by Hadley Global Environment Model 2 Á Earth System HadGEM2-ES (Collins et al., 2011), hereafter HadGEM2, and MPI Earth System Model (ESM) running on a medium resolution grid MPI-ESM-MR (Stevens et al., 2013), hereafter MPI-ESM. We extended the historical simulations (1901Á2005) until 2012 using the rcp45 simulations. The historical and rcp45 simulations were extracted from the CMIP5 data archive and they follow the CMIP5 experimental protocol (Taylor et al., 2012). In the 20th century simulations, the historical record of climate forcing factors such as greenhouse gases, aerosols and natural forcings such as solar and volcanic changes is used. Rcp45 simulations follow the RCP4.5 greenhouse gas scenario. We used a single ensemble member of each model: r2i1p1 in case of HadGEM2 and r1i1p1 in case of MPI-ESM.
The 20CR data set is produced using an ensemble of perturbed reanalyses, and the final data set corresponds to the ensemble mean. Only surface pressure observations are assimilated, and the observed monthly sea-surface temperature and sea-ice distributions are used as boundary conditions to generate full three-dimensional estimates of the state of the troposphere (Compo et al., 2011). The 20CR data set is available from 1871 to 2012 but to be consistent with HadGEM2 and MPI-ESM simulations, the time sequence analysed here is 1901Á2012 (1344 time steps). 20CR has Â2.0 degree horizontal resolution and we have used Gaussian gridded (192 )94) data from 3-hour forecast values. HadGEM2 and MPI-ESM have both a global grid of 144)73 points. Thus, we have original data sets X N )L with N01344, L 018048 (20CR) and L010512 (HadGEM2 and MPI-ESM).
As an illustrative example of the high-dimensionality of the MSSA problem, let's choose a lag window of M0240 (months). In the case of the 20CR data set, this would result in an augmented matrix with M )L04331520 columns. Clearly some kind of dimensionality reduction is needed in order to make the computations more efficient or even make them possible.

Application of RMSSA to climatic data sets
In the previous section, we have introduced the RMSSA algorithm and the data sets to be analysed. Next we will proceed to the applications of the proposed method and discuss the results.
First, the original data sets were mean centred and RMSSA (algorithm 1 in Appendix A.2) was applied with k 0500.
The first 1Á30 ST-PCs of 20CR are shown in Fig. 2. In order to find the most powerful frequencies associated with the ST-PCs, the Multitaper spectral analysis method (Thomson, 1982;Mann and Lees, 1996)  The above analysis was also performed for the HadGEM2 and MPI-ESM data sets (figures not shown). As the annual cycle is too dominating in each data set, the analysis in the following sections will be repeated without the annual cycle. We also integrate a MC-MSSA step in the RMSSA algorithm (Appendix A.2, algorithm 2) in order to study the statistical significance of the obtained components.
3.1.1. Pre-processing the data for Monte-Carlo MSSA. Some pre-processing of the original data sets was crucial in order to assess the statistical significance of the lowfrequency variability using MC-MSSA. First of all, the original data sets were standardised (i.e. the time series of each grid point was mean centred and divided by its standard deviation) in order to avoid overweighting the grid points with higher variance. Furthermore, the annual cycle of the time series of each grid point was estimated by STL (Loess based Seasonal-Trend Decomposition) and removed from the original data set. The STL method is a filtering procedure for decomposing a time series into trend, seasonal and remainder components. It includes some parameter choices controlling, for instance, how rapidly the trend and seasonal components can change. The method is described in detail in Cleveland et al. (1990) and we have followed their guidelines in choosing the related parameters. Without this procedure the annual cycle would dominate the results and starve the lower ranked MSSA components of power when tested against the red noise null-hypothesis. Linear trends were also fitted and removed from the data sets in order to avoid the dominance of the centennial scale trend.
For the sake of comparison, the annual cycle was also estimated by calculating the mean values of each calendar month and those values were subtracted from the data to get monthly anomaly time series. However, determining the base for the anomaly calculation is not that straightforward and the choice of a base period may have severe impacts on the results (Kawale et al., 2011). Furthermore, the average annual cycle is only removed and if the annual cycle varies in the time series, the anomalies still contain a residual annual cycle.
The dimensions of the original data sets were reduced by applying RP with k 0500 to have a lower-dimensional approximation P N )k of each data set. To be able to perform MC-MSSA, we further calculated SVD of P and retained 30 first PCs of each data set, explaining approximately 72% (20CR), 67% (HadGEM2) and 64% (MPI-ESM) of the variance. Those 30 PCs were used as input channels in the MC-MSSA-step.
3.1.2. Decomposition of the pre-processed data sets. The ST-PCs 1Á30 of each data set and their spectra are presented in Figs. 3Á5. These figures show the results after applying the steps 1Á8 of algorithm 2 in Appendix A.2 (note that the annual cycle and linear trend were removed from the original data sets). In 20CR (Fig. 3), the ST-PCs 1Á2 are so-called trend components explaining together almost 9% of the variability of the data set. Pairs of

Identifying significant oscillations
In MC-MSSA step, in total of 1000 realisations of red-noise surrogates were generated and the red-noise basis was used to estimate the 90, 95 and 99% confidence intervals for the eigenvalues generated by the noise model that consists of independent first-order autoregressive processes. Figure 6 shows the results of the Monte-Carlo significance test of 20CR, HadGEM2 and MPI-ESM with a 20 yr lag window (M0240 months). In that figure, the data eigenvalues and 2.5th and 97.5th percentiles of the distribution of the surrogate eigenvalues are plotted against the dominant frequencies of the corresponding red-noise basis vectors (noise ST-EOFs). The dominant frequencies are estimated using fast Fourier transform (FFT). It should be noted, that the estimate of the dominant frequency of the noise ST-EOFs may not be exactly the same as the dominant frequency of the data ST-EOFs which may cause some small inaccuracies in the results. The significant signals (at 5% significance level) in Fig. 6 are those whose data eigenvalues lie above the 97.5th percentiles of the surrogate eigenvalues. According to the test these signals have more variance than would be expected to have from a noise process. According to Plaut and . Therefore we only show the significance test of the periodicities that are covered by the 20 yr lag window used in this example. From Fig. 6, we can see that in 20CR data set there are some significant periodicities (at 5% level) between 1.7 and 5.5 yr. HadGEM2 has somewhat more significant periodicities compared to 20CR, especially on 10 yr time scales, but MPI-ESM has hardly any eigenvalues lying above the 97.5th percentile.
3.2.1. Results with different lag window lengths. As noted earlier, the Monte-Carlo simulations were performed with varying lag window M to estimate its effect on the statistical significance of the oscillations. Spectral resolution increases with lag window length and oscillatory pairs with longer periodicity can be identified. However, at the same time the statistical significance of the identified oscillations may decline. We used the following values of M: 5 yr (M060 months), 10 yr (M0120), 20 yr (M0240), approx. 28 yr [M0340:N/4, following the recommendation of Elsner and Tsonis (1996)] and approx. 38 yr [M 0450:N/3, following Vautard and Ghil (1989)].
The identified periodicities and their significance levels with increasing lag window are presented in Fig. 7. The numbers in Fig. 7 show the dominant periods associated with the oscillations. These dominant periods are estimated using FFT. From Fig. 7 we can see that in 20CR the significant periodicities are consistently found at 3.6, 2.3 and 1.7 yr, depending to some extent on M. Those periods are more or less visible in HadGEM2 and to a lesser extent in MPI-ESM. Significant 5Á6 yr oscillations are identified in all the data sets and especially a Â5.5 yr variability is found consistently.
2Á6 yr oscillations are usually attributed to ENSO which is a globally dominating form of variability on annual to decadal time scales (e.g. Kleeman, 2008). It is a broadband phenomenon with several spectral peaks and the highest peak is around 4 yr. This can also be seen in our analysis of 20CR, HadGEM2 and MPI-ESM data sets because most of the significant oscillations are concentrated on 2Á6 yr time scales. However, the spectra of MPI-ESM (Fig. 5) differs distinctly from the spectra of the other two data sets: the MC-MSSA test of the monthly near-surface temperature variability in 20CR, HadGEM2 and MPI-ESM data sets 1901Á2012. PCs 1Á30 of RP ' PCA (see Appendix A.2, algorithm 2) are used as input channels in the analysis and the lag window length M is 20 yr (240 months). In MC-MSSA, the red-noise basis is used. Red squares show the data eigenvalues plotted against the dominant frequency of the ST-PC corresponding to each eigenvalue. The vertical bars show the 2.5th and 97.5th percentiles of the eigenvalue distribution calculated from 1000 realisations of the red-noise surrogates. The ST-PCs that correspond to eigenvalues rising above the 97.5th percentiles are considered significant at the 5% level. Note the missing power at 1 yr due to the removal of the annual cycle. power on 2Á8 yr time scales is distributed on a large set of components (especially ST-PCs 4Á18) which also decreases the statistical significance of oscillations on those time scales.
In HadGEM2, significant decadal scale oscillations are identified with all lag window lengths. Dominant peak on the decadal time scales has been noted by Collins et al. (2008) and one of the possible reasons for this is in deficiencies of simulation of the ENSO phase-changing process in HadGEM2 (Martin et al., 2010).
There are also significant multi-decadal components in 20CR data set, but their period decreases with increasing lag window M. The time series to be analysed become  1 1.1 1.1 1.1 1.1   1.3 1.3 1.3 1.8  shorter with increasing M and this may have an effect on the identified period length. We did not find significant multi-decadal components in HadGEM2 and MPI-ESM data sets, although 27 yr and 26 yr periods are identified on 10% significance levels, but only with a single lag window length. However, the use of a lag window M typically allows only the distinction of oscillations with periods 5M and thus the interpretation of those multi-decadal components remains uncertain.

Reconstruction of the significant oscillations
The final step of our analysis is to reconstruct the decomposed signals in the original space. As an illustration, we have chosen to reconstruct the signal corresponding to approximately 5.5 yr variability, which was identified in all the data sets. In order to see the time evolution of the Â5.5 yr variability, we have reconstructed the time series in each gridpoint of the original data set with the ST-PCs corresponding to the signal of interest. I.e., in the reconstruction we have projected the original (centred) data set onto ST-PCs (calculated in the reduced space) to obtain ST-EOFs in the original space and then projected the ST-PCs onto those ST-EOFs (see Appendices A.2 and A.4 for more details). In order to see the global effects of the Â5.5 yr cycle, the time series of each grid point has its original variance. The above calculations were completed for each data set using their own Â5.5 yr patterns. ST-PCs 5 and 6 of the 20CR data set (Fig. 3), ST-PCs 6 and 7 of HadGEM2 (Fig. 4) and ST-PCs 4 and 5 of MPI-ESM (Fig. 5) were used in the reconstruction.
Once we have reconstructed the time series in each gridpoint we can plot the anomalies related to the signal month by month. These plots are presented as animations of each data set (20CR, HadGEM2 and MPI-ESM) for a time period of 1901Á2012 (the animations are available at www.youtube.com/channel/UCRjwc6cI-TzbvtShONYZ7cg). In Fig. 8, we also show the global patterns of the Â5.5 yr variability of near-surface temperature anomaly. The patterns are composites of eight cases, when the oscillation is in its positive phase in the equatorial Pacific. Positive events are defined as an average of winter months (NovemberÁMarch).
The temperature anomalies of 20CR have many similarities to global El Nin˜o effects, such as above average temperatures in the central and eastern equatorial Pacific Ocean, in the western and northern parts of North-America and South-America as well as in South-East Asia, Australia and southern Africa. Below average anomalies are found in the south-east parts of North-America, in the north-west and south-west Pacific as well as in northern parts of Eurasia. In 20CR, a typical north-south wave train is also seen, but the east-west patterns are weaker, except for the anomalies at the Amazonas.
HadGEM2 and MPI-ESM show similarities to 20CR, but differences can be seen, for example, in the Pacific forcing patterns. Especially in MPI-ESM the centre of the forcing pattern seems to be more western. In the model simulations, the negative anomaly near the west-coast of North-America extends to the continent, which is not detected in 20CR. The positive anomalies in HadGEM2 and MPI-ESM also extend into the northern Eurasia and there are anomaly patterns in the southern Indian Ocean which are absent in 20CR. MPI-ESM has a stronger positive anomaly in the coast of South-East Asia compared to the other two data sets. In addition, there is a strong anomaly near the Antarctic Peninsula in the Weddel Sea in the 20CR data set which is not detected in the model simulations. The anomaly patterns in the Atlantic Ocean are also weaker in 20CR compared to simulations.
The animations of the Â5.5 yr pattern (available at www.youtube.com/channel/UCRjwc6cI-TzbvtShONYZ7cg) show some more features in addition to the ones seen in Fig. 8. For instance, in 20CR animation there is a quite strong anomaly pattern to the west of Ural Mountains. This pattern is not usually associated with ENSO, and its maximum negative and positive phases seem to occur at different times compared to the ENSO-related anomaly patterns in the Pacific. However, this pattern to the west of Ural might also reflect some other phenomenon, mixed with the ENSO patterns.
The animations also show that the variability has a more propagating character in 20CR data set whereas the anomaly patterns in the model simulations are more stationary. In the northern and southern Pacific Ocean, for example, the anomalies seem to propagate eastward in the 20CR animation.
Compared to 20CR, HadGEM2 and MPI-ESM show a richer structure in Fig. 8 and in the animations. One has to remember that the reanalysis data set is an ensemble mean whereas the analysis of the climate model simulations is conducted on a single ensemble member of each model. This may also contribute to the structure seen in the model simulations. Different, more or less real, phenomena may also be mixed in the variability patterns of the simulations.

Summary and Discussion
We have introduced an RMSSA algorithm, which allows the calculation of MSSA of extremely high-dimensional problems. The RMSSA algorithm first reduces the dimension of the original data set by RP, then decomposes the data set into components of different frequencies by calculating MSSA in a reduced space, and finally reconstructs the components in the original high-dimensional space.
We have applied the RMSSA algorithm to decompose the monthly mean near-surface air temperature of the 20th century reanalysis and the historical 20th century simulations of HadGEM2-ES and MPI-ESM-MR extracted from the CMIP5 data archives. We have also performed Monte-Carlo simulations in order to estimate the significance of the identified low-frequency components. Our analysis shows that 2Á6 yr oscillations are present in all the data sets. Their statistical significance is highest in HadGEM2 while in MPI-ESM the power on those timescales is distributed on a large set of components decreasing their statistical significance.
2Á6 yr oscillations are usually attributed to ENSO which is a globally dominating form of variability on annual to decadal time scales. Our global monthly animations of 5Á6 yr near-surface temperature cycle match quite well with the known temperature anomalies related to ENSO. The reanalysis and the historical simulations have similar anomaly patterns in the central and eastern Pacific Ocean, around the northern part of Indian Ocean as well as in the north-west North-America, but also some notable differences in several areas, such as Eurasia. Also, our animations of the 5Á6 yr cycle reveal a propagating structure in the near-surface temperature anomalies of 20CR, while the variability in HadGEM2 and MPI-ESM data sets is more stationary. The focus of this study was to introduce the RMSSA algorithm and the discussion on the possible causes for the differences in oscillatory patterns of the data sets is limited. However, this would be a subject for a further study with a larger set of climate model data sets included.
RMSSA algorithm is a powerful tool when the dimensions of the data sets become prohibitively large. It allows a computationally efficient way of decomposing a data set into its spatio-temporal patterns. Several climatic state variables can be incorporated in the RMSSA at the same time in order to find the co-varying signals and illustrate their propagation. RMSSA can also be used in studying the oscillations in three dimensions including data from several atmospheric levels in the analysis.

Acknowledgements
We thank Petteri Uotila and Mikko Alestalo from Finnish Meteorological Institute for interesting discussions at earlier phases of the work. We also thank Jouni Ra¨isa¨nen from University of Helsinki for preparing the HadGEM2-ES and MPI-ESM-MR data sets used in this study. In addition, we thank the reviewer for her/his valuable comments and suggestions that helped in improving the manuscript. This research has been funded by the Academy of Finland (project number 140771).
The aim of MSSA is to identify spatially and temporally coherent patterns in a multivariate data set. In MSSA terminology, the columns of the original data matrix X N )L are called channels. In case of gridded data set, N represents the time steps and L is the number of grid points: . . . The next step is to construct an augmented data matrix A, which contains M lagged copies of each channel in X: . . .
x N0;i x N0þ1;i Á Á Á x N;i 2 6 6 6 4 3 7 7 7 5 ; i ¼ 1:::L (A2) and In MSSA, M represents the lag window. A has now ML columns and N 0 ¼ N À M þ 1 rows. The singular value decomposition (SVD) of A can now be calculated: The vectors of U A are the eigenvectors of Z A ¼ 1 ML AA T and V T A contains the eigenvectors of C A ¼ 1 N0 A T A. These vectors are orthogonal and often called space-time principal components (ST-PCs) and space-time empirical orthogonal functions (ST-EOFs), respectively. Diagonal elements of D A are the eigenvalues of C A or Z A .
Optionally the lag-covariance matrix C A (or Z A ) and its eigendecomposition can be calculated to yield eigenvectors V T A (or U A ) and eigenvalues (diagonal elements of matrix , it is more convenient to estimate C A (or Z A ) because it is smaller. See Allen and Robertson (1996) for details.

A.1.2. Monte-Carlo MSSA
The components obtained by MSSA can be tested against a null-hypothesis of the data being generated by independent AR(1) processes (i.e. red noise). The red noise model has the form: u tþ1;s ¼ c s u t;s þ a s w t;s ; (A5) where g s is the lag-1 autocorrelation of channel s (in the original data set), a s ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi c s ð1 À c 2 s Þ p (c s is the variance of channel s) and W t,s is Gaussian white noise. The data set generated by the model in (A5) is called the surrogate data set and it is subjected to MSSA in the same way as the original data set. Large number of surrogates are generated in order to estimate the confidence limits for the MSSA results of the original data set.
In the test of Allen and Robertson (1996), the lagcovariance matrices of the original data set and the rednoise surrogates are projected either onto the data-adaptive basis (i.e. U A or V T A ) or the null-hypothesis basis. The nullhypothesis basis can be calculated from the expected lagcovariance matrix C N of the red-noise surrogates. C N can be estimated analytically by Projection onto the red-noise basis is considered more reliable because the use of the data-adaptive basis assumes the existence of an oscillation even in a case where it is uncertain whether or not the oscillation is significant.
According to Allen and Robertson (1996), the input channels should be uncorrelated (or at least nearly) at zero lag for the test to be useful. In the case of a gridded data set, where all the grid point time series are used as input channels, the decorrelation condition is not valid. The test might still be useful if we are using grid points sufficiently far from each other as the input channels for the test (Ghil et al., 2002).
A.2. Randomised algorithms for MSSA 1: Original MSSA algorithm enhanced by RP (1) construct the original data matrix X N )L (2) (pre-processing of X, if needed) (3) generate k L-dimensional vectors of Gaussian distributed random numbers to get matrix R (and optionally orthogonalise the random vectors) (4) project the original data matrix onto random vectors: P NÂk ¼ 1 ffiffi k p X NÂL R LÂk (5) generate augmented matrix A RP of P (6) calculate SVD: A RP ¼ U RP D 1=2 RP V T RP (or covariance matrix C RP or Z RP and its eigendecomposition) (7) calculate ST-EOFs in the original space: V A : A T U RP ðD 1=2 RP Þ À1 (see Appendix A.4 for an explanation) (8) calculate RCs using ST-EOFs of step 7.

2: PC-based MSSA algorithm enhanced by RP
(1) construct the original data matrix X N )L (2) (pre-processing of X, if needed) (3) generate k L-dimensional vectors of Gaussian distributed random numbers to get matrix R (and optionally orthogonalise the random vectors) (4) project the original data matrix onto random vectors: P NÂk ¼ 1 ffiffi k p X NÂL R LÂk (5) calculate SVD of P (see Appendix A.3 for an explanation of how the covariance is preserved in RP ' SVD) (6) retain e.g. 30 first PCs of P to obtain reduced matrix T (7) generate augmented matrix A PC of T (8) calculate SVD: PC V T PC (or covariance matrix C PC or Z PC and its eigendecomposition) (9) (MC-MSSA step) (10) calculate ST-EOFs in the original space: V A : A T U PC ðD 1=2 PC Þ À1 (see Appendix A.4 for an explanation) (11) calculate RCs using ST-EOFs of step 10.

A.3. RP and SVD
The method to back-project from the reduced space to the original space in the case of RP ' SVD is explained in Seitola et al. (2014) (Appendix A.1) but we also present it briefly here: The SVD of the original data matrix X N )L is: U contains the eigenvectors of Z 0XX T . Random projection (RP) of X is P 0XR, where R L)k is the projection matrix and the row vectors of R are scaled to have unit length. Thus, we can write: In the previous, we have assumed that RR T % I because the row vectors of R are nearly orthonormal. It is also possible to make the vectors of R strictly orthonormal, in which case RR T ¼ I. However, orthogonalisation is often not necessary, because the difference between the orthogona-lised and non-orthogonalised random vectors is very small, especially in high-dimensions. Let's rewrite (A7) as X NÂL ¼ U NÂr D rÂr V T rÂL , where r 0rank(X). Now we can manipulate (A7): Because Z :Z RP we can approximate In the previous, we have defined U RP as N )k and D RP as a k )k matrix, where k is the rank of matrix P N )k .