Surrogate data for hypothesis testing of physical systems

The availability of time series of the evolution of the properties of physical systems is increasing, stimulating the development of many novel methods for the extraction of information about their behaviour over time, including whether or not they arise from deterministic or stochastic dynamical systems. Surrogate data testing is an essential part of many of these methods, as it enables robust statistical evaluations to ensure that the results observed are not obtained by chance, but are a true characteristic of the underlying system. The surrogate data technique is based on the comparison of a particular property of the data (a discriminating statistic) with the distribution of the same property calculated in a set of constructed signals (surrogates) which match the original data set but do not possess the property that is being tested. Fourier transform based surrogates remain the most popular, yet many more options have since been developed to test increasingly varied null hypotheses while characterizing the dynamics of complex systems, including uncorrelated and correlated noise, coupling between systems, and synchronization. Here, we provide a detailed overview of a wide range of surrogate types, discuss their practical applications and demonstrate their use in both numerically simulated and real experimental systems. We also compare the performance of various surrogate types for the detection of nonlinearity, synchronization and coherence, coupling strength between systems, and the nature of coupling. A MatLab toolbox for many of the surrogate methods is provided. © 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Motivation
The analysis of time series originating from dynamical systems often requires estimation of the value of some measure to characterize the behaviour of the underlying system. The natural variability of dynamical systems and the methods used to observe and analyse them means that without appropriate statistical testing, we cannot know the reliability of a single value obtained using these measures. Ideally, we would estimate the value of the same measure on a large number of realizations of the system dynamics, and obtain an accurate estimate from the distribution of values. The increasing availability of powerful computing facilities has made this possible for numerical simulations [1]. However, limited data availability in real experimental scenarios rules out this approach. When working with these limited data sets, confidence intervals can be estimated using the technique of bootstrapping [2], or specific hypotheses can be tested using another method known as surrogate data [3].
The method of surrogate data for hypothesis testing grew in popularity following the work of Theiler et al. [3], where it was presented as a test for nonlinearity. In this case, it involves creating multiple versions of an original time series with the same statistical properties, but which can only be described by a linear model, i.e. with any nonlinearities removed. When a nonlinear measure, or discriminating statistic, is applied to the original time series and the range of surrogates, if the result from the time series deviates significantly from the surrogate distribution, then we can conclude that the time series is unlikely to have a linear origin. The concept of surrogate data testing has now been expanded and used to test not only for nonlinearity [4], but more specifically for chaos [5], for synchronization between two or many systems [6], for coupling in nonlinear systems [7], and for oscillations in noisy signals [8], to name just a few of the wide range of applications.
Surrogate testing has formed the basis of new discoveries in many fields of research, including astrophysics [9][10][11], geophysics [12], applied mathematics [13], engineering [14], climate studies [15,16], ecology [17,18], finance [19,20], digital signal processing [21], neuroscience [22][23][24] and biomedicine [25][26][27][28]. The need for surrogate data testing arises from the increasing availability of time series from a wide variety of dynamical systems, i.e. systems whose evolution in time can be described by a function, for example ẋ = f(x). In order to understand the underlying dynamics of the systems from which we record these time series, we ideally need to somehow determine the function f, which will describe the physical principles of the system. There are two possible approaches. The first approach attempts to fit pre-existing models to the data, and checks how closely the data fit to the model. Another group of methods, the so-called inverse approach, seeks to learn from the data, with the observed characteristics then shaping the description of the dynamics [29,30]. This approach involves the extraction of information about the properties of the time series, how they evolve in time, and how they interact with other systems. Surrogate testing is invaluable in the validation of the results obtained in both cases.
One area in which surrogates have proven to be very useful is biological systems. The application of time series analysis methods to physiological data sets is growing in popularity, as new theories and models to describe the dynamics of living systems are proposed. For example, the human body comprises a huge number of interacting dynamical systems [23,[31][32][33], all of which change their characteristics over time in order to meet energy demands and thrive in their environment [34]. Therefore, signals recorded from these systems can appear to be very complex, hard to characterize, and very difficult to repeat exactly. It can also be quite difficult in some of these systems to obtain sufficiently long recordings for the application of certain analysis methods. It is these properties that make the application of surrogate hypothesis testing to biomedical data so appealing, ensuring that conclusions are sufficiently robust to provide new information about these systems, rather than discounting them as uninformative fluctuations or noise.
Biomedical data may take many forms. Electroencephalography (EEG) provides information about the electrical activity of the neurons in the brain. Electrocardiography (ECG) does the same for the pacemaker cells of the heart. The characteristics of the circulating blood, such as pressure, velocity and oxygenation can all be measured noninvasively, and can be seen to continuously fluctuate as they are influenced by adjacent dynamical systems. These are just a few of many possibilities; any parameter which can be measured from the systems of the body over time can provide insights into their functions. Simultaneously measured signals will provide yet further opportunities. The surrogate data approach has provided huge amounts of new information, for example on the nature of brain behaviour in health and disease [22,[35][36][37], cardiorespiratory interactions [38] and neurophysiological mechanisms of cardiovascular regulation [39].
Although an essential tool for biomedical and neurological studies, surrogate data testing is just as applicable to any signal arising from any physical system. Surrogates aid in the robust characterization of the physical laws and behaviours of these systems, to determine the function of any dynamical units that may make up the system, how they influence each other, and how they are affected by other systems, including higher-dimensional environments that may act as noise. They answer questions about the underlying system such as whether the data are nonlinear, or whether they contain significant oscillations, by providing examples of what the results of a particular analysis would look like if these properties were definitely not present. Answering these fundamental questions using surrogate data opens the door for further analysis on the origin of these observed effects. If more than one data set is available, we can also ask further questions, such as whether their underlying systems are synchronized, and, if so, what the nature of the coupling between them is. Considering the physics of these systems is the key to understanding their past and future evolution, and surrogate data testing is a crucial tool to ensure the reliability of the results.

Outline
The development of many surrogate types over the years means that it may be difficult to know which is the optimal type to use when designing a study and analysing results. The aim of this paper is to provide a clear, up-to-date introduction to the topic and present scenarios in which surrogates have been used to great effect.
The first part of the review describes in detail the most popular surrogates in use today, for testing for nonlinearity and testing for interdependence, and provides examples, algorithms and applications. Examples of some lesser known surrogates which may be promising for future applications are also discussed.
The second part of the paper presents a comparison of the performance of different surrogate types for the detection of nonlinearity, the detection of synchronization in two coupled oscillators, both simple phase oscillators and chaotic oscillators, and detection of coupling in simulated oscillators and real experimental data using dynamical Bayesian inference.
We also provide step-by-step algorithms and a MatLab toolbox for the implementation of many of the surrogates that are discussed.

What are surrogates and why do we need them?
Before we begin to describe the wide range of surrogates that have been developed, we will discuss a well known example in which their use is crucial. Consider a research study in which a time series has been recorded from a system, and we have no prior information about the underlying dynamics. The overall aim of the study is to determine whether the system in question is chaotic. Popular measures, or discriminating statistics, for whether a system is chaotic include the correlation dimension [40] and Lyapunov exponents [41], although the implementation of the latter can be difficult in finite noisy data sets. Applying both measures to the data will yield finite results, supposedly providing information on whether the system is chaotic or not, and we may reach a conclusion based on this analysis. However, when applying these techniques to data that appear complex, one implicitly assumes that the system dynamics comes from deterministic nonlinearity rather than stochasticity. It follows that if the system is in fact linear, the results of these tests are unlikely to be pointing to chaos in the system. It is clear that before nonlinear techniques can be applied, the presence of nonlinearity must first be established. This question is the focus of a large portion of the available surrogate methods.
Hence, to proceed the question must be simplified: Do the data demonstrate evidence of nonlinearity? To answer, we formulate a null hypothesis to test against, i.e. the case that remains if the answer to our question is no, that the data can be fully described by a linear system. Once we have established the correct null hypothesis for our purpose, the goal is then to create a large number of surrogate data sets which correspond to this null hypothesis. In this case, each surrogate should be a random copy of the system, with all linear properties maintained, but with any nonlinearity destroyed. The discriminating statistics can then be applied to the original data, and every surrogate data set. It is crucial in any surrogate application to treat the original data and each surrogate in exactly the same way, including any preprocessing, filtering or embedding, and to ensure that properties such as sampling frequency and data length are the same in the surrogates as in the original data. The results are then compared, and if the result in the original data is significantly different to the results obtained for the surrogates the null hypothesis can be rejected with a confidence level based on the statistical threshold chosen. A result that is not significantly different to the surrogate distribution means that we cannot consider the data as having a nonlinear origin, or the test used is insufficient to reveal any nonlinearity present in the system. As chaos is necessarily a nonlinear phenomenon, except in the special case of infinite dimensional systems where linear chaos can be observed, we cannot make any statistically significant conclusions about chaos in the system.
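The procedure just outlined can be summarized in a short sketch. The following Python fragment is illustrative only (it is not part of the accompanying MatLab toolbox), and the two user-supplied ingredients shown are placeholder choices of our own: random shuffling as the surrogate generator and lag-1 autocorrelation as the discriminating statistic.

```python
import numpy as np

def surrogate_test(data, make_surrogate, statistic, n_surr=100, alpha=0.05):
    """Generic one-sided surrogate test using the rank-order approach.
    The original data and every surrogate are treated identically."""
    t_data = statistic(data)
    t_surr = np.array([statistic(make_surrogate(data)) for _ in range(n_surr)])
    # Reject the null hypothesis if the statistic for the original data
    # exceeds the (1 - alpha) percentile of the surrogate distribution.
    threshold = np.percentile(t_surr, 100 * (1 - alpha))
    return t_data > threshold, t_data, t_surr

# Example: an oscillatory signal tested against shuffled (IID) surrogates
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 0.01 * np.arange(1000)) + 0.1 * rng.standard_normal(1000)
autocorr = lambda x: np.corrcoef(x[:-1], x[1:])[0, 1]
rejected, t0, ts = surrogate_test(signal, lambda x: rng.permutation(x), autocorr)
```

Here the strong temporal structure of the oscillation produces a lag-1 autocorrelation far above anything the shuffled surrogates can achieve, so the null hypothesis of temporally unstructured data is rejected.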

Typical vs. constrained realizations
Within the surrogate family, there are two approaches: typical realizations and constrained realizations. Typical realizations fit a model to the data, for example by estimating the coefficients of an autoregressive (AR) model and then producing Monte Carlo realizations of this model to compare with the data (see Section 4.4). This has the disadvantage that the user must know the model and its parameters beforehand.
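As a sketch of the typical-realization approach (in Python rather than the paper's MatLab, with hypothetical function names of our own), one can fit AR(p) coefficients to the data by least squares and then draw Monte Carlo realizations of the fitted model:

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of AR(p) coefficients (illustrative only).
    Column k of the regressor matrix holds the lag-(k+1) values."""
    X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
    a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    resid = x[p:] - X @ a
    return a, resid.std()

def ar_realization(a, sigma, n, rng):
    """One Monte Carlo realization of the fitted AR model,
    driven by Gaussian noise of standard deviation sigma."""
    p = len(a)
    y = np.zeros(n + p)
    for t in range(p, n + p):
        y[t] = y[t - p:t][::-1] @ a + sigma * rng.standard_normal()
    return y[p:]

# Usage (illustrative): generate a realization of a known AR(1) model,
# then re-fit its coefficient from the data
rng = np.random.default_rng(1)
x = ar_realization(np.array([0.8]), 1.0, 5000, rng)
a_hat, sigma_hat = fit_ar(x, 1)
```

In a typical-realization test, many such realizations of the fitted model would be generated and the discriminating statistic compared between data and realizations; the disadvantage noted above remains, namely that the model class and order p must be chosen beforehand.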
In contrast, constrained realizations are calculated directly from the data, producing surrogates that match all properties of the original data other than the property being tested. The differences between these two approaches have been thoroughly discussed by Theiler & Prichard [42]. Constrained realizations are more widely used, as they can be used with almost all discriminating statistics, whilst the applicability of the typical realization approach is dependent on whether the discriminating statistic is ''pivotal'', i.e. whether its distribution is the same for all processes consistent with the null hypothesis [42]. Whilst the constrained realization approach is more popular, being what is usually referred to as surrogate data, it should be noted that the constrained realization approach is designed only for hypothesis testing, and thus, in contrast to typical realizations, cannot be used for the estimation of confidence intervals [42].

The null hypothesis
It is very important when using surrogate data to formulate the correct null hypothesis. When rejecting a null hypothesis, we can say that this underlying hypothesis is unlikely to describe the origin of the data [43]. Conversely, failure to reject the null hypothesis does not necessarily mean that the null hypothesis is true. All surrogates test only, and exactly, the null hypothesis specified for them. Therefore, it is extremely important to choose the correct surrogate for the purpose. It is also important, if possible, to test generated surrogates before use to ensure that they are suitable, i.e. that they reject the null hypothesis only in expected scenarios, and not just as a result of the inaccuracy or inapplicability of the surrogates, or because there are not enough data available to reach a reliable conclusion about the dynamical property in question. For example, Paluš [44,45] discusses the importance of identifying problems in surrogate data by using linear and nonlinear redundancies. If, when testing for nonlinearity, linear redundancy is rejected in the linear surrogates, then there is likely a problem in the surrogate data which may result in spurious nonlinear statistics.

Significance testing
In many applications of surrogates, it has been assumed that the distribution of the discriminating statistics calculated in a set of surrogates is Gaussian, and thus the significance level has been set as a number of standard deviations above or below the mean, usually two. However, the distribution of these surrogates is not always Gaussian, and so this assumption should be checked before taking this approach. Alternatively, and more robustly, the nonparametric rank-order test can be used, in which the significance level is determined as, for example, the 95th percentile of the surrogates [3,4].
The number of surrogates required will depend on whether a one or two-sided test is to be performed. A one-sided test means that the null hypothesis can be rejected only if the distribution of the surrogate values of the discriminating statistic deviates from those calculated in the original data in one specified direction, i.e. it is only lower or only higher. For example, deterministic signals should have better predictability than noise, thus, constructed surrogates that are realizations of noise will indicate rejection only if the original data are more predictable than most of the surrogates, whilst no conclusion is drawn if they are less predictable. A two-sided test allows testing for the possibility of the discriminating statistic being either higher or lower in the surrogates than in the original data.
The number of surrogates required will also depend heavily on the characteristics of the data and the specific surrogates being used. A good starting point is to use 100 surrogates, and test how much the spread of values of the discriminating statistic varies with increasing numbers of surrogates. If the range of surrogate statistics is still increasing when reaching 100, then more will be required. For some applications, it may not be necessary to calculate 100 surrogates in order to calculate the significance level. If computational effort is a limiting factor, and the spread of the surrogate data is not too large, for a one-sided test, M = K/α − 1 surrogates can be generated, where K is a positive integer and α is the probability of false rejection corresponding to a level of significance (1 − α) × 100%. For a two-sided test, M = 2K/α − 1 surrogates should be generated. Therefore, if K = 1, 19 or 39 surrogate time series are required for one- and two-sided tests, respectively, at a significance level of 95% [4]. In this case, the maximum/minimum of the surrogates should be taken as the significance level, rather than setting a percentile threshold. However, these assumptions should be carefully tested before use, and are unlikely to be sufficient in many applications.
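In code, the one- and two-sided surrogate counts follow directly from this formula; a minimal Python helper (the function name is our own):

```python
def surrogate_count(alpha, K=1, two_sided=False):
    """Minimum number of surrogates, M = K/alpha - 1 (one-sided) or
    M = 2K/alpha - 1 (two-sided), for a rank-based test at the
    (1 - alpha) * 100% significance level [4]."""
    M = (2 * K if two_sided else K) / alpha - 1
    return int(round(M))

# At the 95% level (alpha = 0.05) with K = 1:
m_one = surrogate_count(0.05)                  # one-sided
m_two = surrogate_count(0.05, two_sided=True)  # two-sided
```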
If multiple statistical tests are performed, it becomes increasingly likely that significant results will be obtained by chance. Therefore, in these cases, the significance level must be updated accordingly to prevent false rejections of the null hypothesis (type 1 error). The Bonferroni correction can be used to compensate for this by altering the overall significance level to α/m where α is the overall significance level and m is the number of hypotheses. In the context of surrogates, this would alter the percentile above which the result is considered significant. The examples presented in the current work each include one test only, therefore this correction has not been applied.

Testing against noise
The very first question when embarking on the analysis of a signal, from what we suspect is a dynamical system, is usually whether the dynamics of the system results in purely random noise, or exhibits some deterministic features. This is a question which is not trivial, especially in real systems where we can regularly face a very low signal-to-noise ratio. Surrogate methods have been employed to provide answers to this question, but the assumptions made about the underlying null hypothesis are extremely important as they will significantly affect the outcome. If the background noise against which the data are tested is not estimated correctly, false rejections of the null hypothesis are likely to occur. At the same time, when testing against noise, it is important to remember that whilst rejection of the null hypothesis means that the data are unlikely to be the particular type of noise against which we are testing, a failure to reject does not mean that the data are definitely just noise. While this is a possibility, it is also possible that deterministic dynamics is present in the data, but cannot be observed with the applied methods, for example due to a low signal-to-noise ratio, or the use of an inappropriate discriminating statistic. Keeping this in mind, we discuss various methods with null hypotheses of either white (uncorrelated) or coloured (correlated) noise. The methods described here mostly take the typical realization approach, i.e. a model is estimated, and many realizations of this model are generated in order to test the deviation of the discriminating statistics calculated for the original data from the distribution of those calculated for the surrogates.

Uncorrelated noise
When considering the complexity and time-variability of the signals arising from dynamical systems, it is clear that it is very unlikely that the data will be perfectly random noise, and it may be very easy to reject this null hypothesis. However, random surrogates can still be very useful for initial testing of whether a data set is suitable for further analysis.

White noise surrogates
Arguably the simplest surrogate method, white Gaussian noise surrogates can be used to estimate significance levels when signal filtering is used. White Gaussian noise signals are generated with the same mean and standard deviation as the data to be analysed, and identical analysis is applied. They test the null hypothesis that the data are uncorrelated Gaussian noise by providing a threshold of discriminating statistics that would be obtained if the data were purely white Gaussian noise. Bandrivskyy et al. [46] used white Gaussian noise signals to demonstrate the low frequency bias in wavelet phase coherence calculations. Wavelet phase coherence was calculated between skin temperature and blood flow, revealing two frequency ranges apparently involved in temperature regulation. 30 pairs of uncorrelated white Gaussian noise signals were used to demonstrate the bias in wavelet phase coherence, resulting from the finite length of the data sets used, and to test the null hypothesis that there is no phase coherence between skin temperature and blood flow.
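A minimal Python sketch of this surrogate type (the function name is our own, and the MatLab toolbox implementation may differ):

```python
import numpy as np

def white_noise_surrogate(x, rng=None):
    """White Gaussian noise with the same mean, standard deviation and
    length as the data: realizations of the null hypothesis that the
    data are uncorrelated Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    return x.mean() + x.std() * rng.standard_normal(len(x))

# Usage: a surrogate for a strongly correlated random-walk signal
rng = np.random.default_rng(0)
data = np.cumsum(rng.standard_normal(10000))
surr = white_noise_surrogate(data, rng)
```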

Random permutation (RP) surrogates
To check the simple case of whether there is any temporal structure in the data, or the data are just uncorrelated noise, random permutation (RP) surrogates can be used. RP surrogates possess the same mean, variance, and histogram distribution as the original signal, but any temporal structure is destroyed [25]. These surrogates are also known as independent identically distributed (IID), scrambled or shuffled surrogates.
Null hypothesis. The observed data are fully described by independent and identically distributed (IID) random variables [3].
Algorithm. Randomly shuffle the original time series.
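In Python, the shuffling step is a one-liner (a minimal sketch of our own, not the toolbox routine):

```python
import numpy as np

def rp_surrogate(x, rng=None):
    """Random permutation (RP) surrogate: the amplitude distribution (and
    hence mean, variance and histogram) is preserved exactly, while all
    temporal ordering is destroyed."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.permutation(x)

# Usage: shuffle a deterministic oscillation
rng = np.random.default_rng(0)
x = np.sin(0.1 * np.arange(500))
s = rp_surrogate(x, rng)
```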
Issues. An example of a simulated oscillatory signal and its RP surrogate is shown in Fig. 1. Rejection of the null hypothesis in this case means that the data possess temporal structure, which includes correlated noise. Any correlations present in the data will lead to this rejection. In this case, just by comparing the Fourier transforms, it is clear that the difference in power spectra between the original signal and the surrogate warrants the rejection of the null hypothesis, i.e. there is some temporal structure in the data. This is the only conclusion which can be made from this test; we cannot conclude anything further about the nature of this structure without further tests. Whilst in the majority of cases data can be verified as not being uncorrelated noise by visual inspection, RP surrogates can be very useful in the further calculation of discriminating statistics which are strongly affected by variations in parameters, for example to check if the length of a time series is too short, causing almost any signal to have the same value of a particular discriminating statistic.
Applications. RP surrogates are more widely applicable than white noise surrogates, as they allow for a non-Gaussian distribution. Lachaux et al. [47] presented for the first time a practical method for the quantification of frequency specific synchronization between two neuroelectric signals (phase locking statistics (PLS)). RP surrogates were used to validate synchrony values against background fluctuations [47]. Kenwright et al. [48] used RP surrogates to investigate the effect of low-frequency oscillations on cardiorespiratory synchronization during rest and exercise. 100 shuffled instantaneous cardiac and respiratory frequency surrogates were calculated for each subject in order to test them for cardiorespiratory synchronization. This revealed higher order, shorter duration cardiorespiratory synchronization during exercise than that found at rest [48].

Correlated noise
In many real world applications, it is highly unlikely that the background noise in recorded data is purely uncorrelated noise, and thus RP and white Gaussian noise surrogates cannot be expected to provide much new information. The study of geophysics [12] and climate [15,16] are fields in which the null hypothesis of a red or Brownian noise (with power spectral density of the form 1/f^2) background is more appropriate. One example is the persistence of the weather, i.e. if one day is sunny and warm, the next day will have a high tendency of being similar, which results in a red noise background in the power spectra of long-time meteorological records [49]. Many real biological systems are difficult to distinguish from Ornstein-Uhlenbeck processes [50], thus surrogates are crucial when searching for dynamics in observations resembling them [51].
While the form of the background spectra alone is an important property of the system, failure to take it into account may hinder the search for any other, less obvious dynamics in the data. The presence of strong background noise in data may lead to the conclusion that there are no interesting features within this data beyond its power spectrum, without full consideration of the possibility that less obvious dynamics may be buried.
It is very important to consider that the 'noise' itself may simply be a manifestation of one or more complex processes, for example multiple interacting dynamical systems, and should be fully understood before being considered inherent to the system. Various manifestations of correlated noise can be studied in their own right using surrogates. If 1/f type behaviour is observed, it can be tested to see if it represents a monofractal or multifractal process [52], demonstrating that the noise should not be completely dismissed. The properties of noise are also crucial in chronotaxic systems, which are dynamical systems that possess a moving point attractor [53,54]. The presence of this point attractor can be identified in real data by characterizing the phase fluctuations, or noise, in the system, with more uncorrelated fluctuations indicating stronger chronotaxicity [55,56].
Once the background noise has been characterized and explained, the search can then widen to try to extract information about other dynamics that may be observed in the system.

Surrogates for nonlinearity testing
The surrogate method for testing for nonlinearity gained huge popularity following the work of Theiler et al. [3]. The surrogates reviewed in this section were developed in response to the need to test for nonlinearity in data, once it was emphasized that the application of nonlinear analysis methods must be justified by first finding nonlinearity in the time series [57]. Applying discriminating statistics designed to infer information from nonlinear systems to a linear system may still provide a finite result, but that result may not be meaningful. As discussed above, white noise or RP surrogates present an inadequate test for any system with correlated variables, and thus new surrogates are required which maintain the same serial correlations, i.e. the same autocorrelation function as the original data, but with the nonlinearity removed, providing the null hypothesis of a linear system. It has been shown to be more useful to test for a specific nonlinear property, than to test for nonlinearity generally [45], though the presence or absence of nonlinearity already provides valuable information about the underlying system.
The following surrogate types (with the exception of autoregressive surrogates) are designed to test the null hypothesis that the data were generated by a noisy linear process, and can be fully characterized by linear characteristics, namely mean, standard deviation, and autocorrelation for all lags. The Wiener-Khinchin theorem ([58], explained in detail in [14]) demonstrates that autocorrelation can be preserved by preserving the power spectrum, which is implemented in the majority of the surrogates in this section. Preserving the power spectrum while randomizing the Fourier phases of the data provides a surrogate in which any nonlinear structure is destroyed.
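The phase-randomization idea can be sketched in a few lines of Python (a minimal illustration of the principle, with a function name of our own): the amplitudes of the Fourier coefficients are kept, while the phases are drawn uniformly at random.

```python
import numpy as np

def ft_surrogate(x, rng=None):
    """Fourier transform (FT) surrogate: preserves the power spectrum
    (and hence, by the Wiener-Khinchin theorem, the autocorrelation)
    while randomizing the Fourier phases, destroying any nonlinear
    structure in the data."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, len(X))
    X_s = np.abs(X) * np.exp(1j * phases)
    X_s[0] = X[0]                  # keep the mean (zero-frequency term)
    if n % 2 == 0:
        X_s[-1] = np.abs(X[-1])    # the Nyquist component must stay real
    return np.fft.irfft(X_s, n)

# Usage: the surrogate shares the power spectrum of the original signal
rng = np.random.default_rng(0)
x = np.sin(0.2 * np.arange(256)) + 0.5 * rng.standard_normal(256)
s = ft_surrogate(x, rng)
```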
Here, we describe the most commonly used surrogate types for testing for nonlinearity in data: Fourier transform (FT), amplitude adjusted Fourier transform (AAFT) and iterative amplitude adjusted Fourier transform (IAAFT) surrogates. FT, AAFT and IAAFT have been extensively studied and used in a wide range of applications [3,4,[59][60][61]. Therefore, they are relatively safe to use, as all of their pitfalls have been studied in detail. The reader is encouraged to consult [4] for an excellent review. Although these surrogates are very well known and widely used, it is still extremely important to choose the correct surrogate depending on the application.
We also consider another Fourier based surrogate method introduced by Dolan et al. [62,63] named iterative digitally filtered surrogates (IDFS), which deliberately do not preserve the power spectrum exactly for all surrogate realizations, in order to allow for the naturally expected variation in a system. Wavelet based surrogates for nonlinearity testing are also discussed.
One area where these surrogates for nonlinearity testing have been of great importance is neuroscience, where they have been used to scrutinize some previous conclusions of nonlinearity and chaos in brain signals [64]. Soon after the introduction of the concept of chaos, it became extremely popular as a possible explanation for previously seemingly unexplainable behaviour. For the first time, it provided a solution in which a system can be completely deterministic, yet appear random, due to its extremely sensitive dependence on initial conditions. It was a particularly popular concept in the analysis of EEG data, which was concluded to be chaotic in many studies, usually using correlation dimension and Lyapunov exponents as discriminating statistics. Further investigation and increased scrutiny using newly developed surrogate data testing [65] revealed that these conclusions may have been premature [66], with researchers struggling in some cases to consistently find even nonlinearity in normal EEG time series. However, nonlinear time series analysis methods were shown to be crucial in the identification of the focal hemisphere in epilepsy patients, with a much greater accuracy achieved with nonlinear methods combined with surrogates, than in either the application of linear methods, or the application of nonlinear methods without surrogates [67]. This was also shown to be the case for bivariate analysis techniques for the investigation of interdependence [36].
Whilst chaos cannot be directly proven in a time series [68], nonlinear dynamics can. In the vast majority of cases, the absence of nonlinear dynamics means that chaos can be excluded. However, it is more difficult to look specifically for chaos in a time series which has been shown to be nonlinear using surrogate data. An irregular time series arising from a nonlinear deterministic system is likely in most cases to be either chaotic or pseudo-periodic (defined as a representative of a periodic orbit perturbed by dynamical noise, contaminated with observational noise, or a combination of both, with states within one cycle being largely independent of those within previous cycles (see Section 5.4)) [69]. Thus being able to exclude the possibility that the time series is pseudo-periodic would mean that the time series is more likely to be chaotic, or vice versa.
As it is generally difficult to distinguish between the two scenarios, various surrogate data tests with null hypotheses beyond nonlinearity alone have been developed for this purpose [69].

Autoregressive moving average (ARMA) processes
Rejection of the null hypothesis using the surrogates in this section and some nonlinearity measure suggests that the data possess nonlinear structure. If the null hypothesis cannot be rejected, the data are either linear, i.e. generated by a linear set of equations driven by noise, or the chosen test parameter is not sufficient to reveal the nonlinearity in the data. This could be because the nonlinearity is weak, or the discriminating statistic is testing for the wrong type of nonlinearity. For the discussed FT based surrogates (FT, AAFT and IAAFT) the underlying process is assumed to be ''stationary linear Gaussian''. Considering each word separately, it is clear that this test for nonlinearity, whilst rejecting the null hypothesis for a nonlinear signal, would also reject the null hypothesis if the underlying process was linear Gaussian but nonstationary or nonautonomous, linear but non-Gaussian, or any combination of these. Failure to reject the null hypothesis does not necessarily mean that the underlying process is stationary linear Gaussian. This highlights the importance of also considering the effects of nonstationarity and time-variability in the data before assuming nonlinearity. It should also be kept in mind that underlying dynamics can be masked by noise or the use of incorrect discriminating statistics, or the time series may not fully represent the system being studied.
When considering how strict these conditions are, the most important question is: what is a stationary linear Gaussian process? The fundamental question when using these surrogates is whether the system demonstrates nonlinear phenomena or not, i.e. must it be described by nonlinear equations, or can it also be described by linear equations with additive noise? This is what we want to test for.
In discrete time, the most general stationary Gaussian model with finite correlation time is the autoregressive moving average (ARMA(M,N)) model [70]:

x_n = \sum_{i=1}^{M} a_i x_{n-i} + \sum_{i=0}^{N} b_i \eta_{n-i},   (1)

where η_i are independent Gaussian increments with zero mean and unit variance, and a_i, b_i are coefficients. It should be noted that the ARMA model does not include all possibilities of linear equations, and as a result the corresponding surrogate tests may reject the null hypothesis for linear processes of the form (1) but with:
1. Time-dependent parameters (nonautonomous systems). However, this can be regarded as nonlinearity.
2. Any (including Gaussian) multiplicative noise in parameters.
3. Types of noise which cannot be represented as uncorrelated Gaussian or correlated noise with a fixed timescale.
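To make the ARMA model (1) concrete, a minimal Python sketch of simulating one realization is given below (using NumPy; the coefficient values are illustrative assumptions, not taken from the text, and this is not the MatLab toolbox implementation):

```python
import numpy as np

# Sketch of simulating an ARMA(M, N) realization,
#   x_n = sum_{i=1}^{M} a_i x_{n-i} + sum_{j=0}^{N-1} b_j eta_{n-j},
# with independent standard Gaussian increments eta.
def simulate_arma(a, b, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    M, N = len(a), len(b)
    pad = max(M, N)                       # warm-up region, discarded at the end
    eta = rng.standard_normal(n_samples + pad)
    x = np.zeros(n_samples + pad)
    for n in range(pad, len(x)):
        x[n] = (sum(a[i] * x[n - 1 - i] for i in range(M))
                + sum(b[j] * eta[n - j] for j in range(N)))
    return x[pad:]

# Illustrative (stable) coefficients -- an assumption, not from the text.
x = simulate_arma(a=[0.6, -0.2], b=[1.0, 0.4], n_samples=1000)
```

With stable coefficients such as these, the output is a stationary linear Gaussian time series of the kind the FT-based surrogate tests assume.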

Nonstationarity
Surrogates that test against the null hypothesis of a linear, Gaussian, stationary stochastic process may not only provide a rejection in the case of a nonlinear system, but also if the system is linear but nonstationary. This was demonstrated by Timmer [71] using the correlation dimension and Fourier transform surrogates. It is therefore very important to ensure that the data are indeed stationary, i.e. their statistical properties are invariant, before applying any surrogate technique with this assumption in their null hypothesis. Borgnat and Flandrin [72] discuss in detail how to utilize the stationarization which is a natural property of Fourier transform surrogates, and how to test for nonstationarity using surrogates.
Another approach to the problem is to update the null hypothesis and the surrogate algorithm to match any nonstationarity in the data, so that it cannot be the source of significant differences in the result. The method proposed by Lucio et al. [73] achieves this by detrending and retrending the data and applying a technique introduced by Nakamura et al. [74], who presented a method for investigating nonlinearity in irregular fluctuations, or the short-term variability of time series, even if the data exhibit long-term trends. The method, named the truncated Fourier transform surrogate (TFTS) method, is an extension of FT surrogates in which a frequency threshold is determined, above which the phases of the original signal are randomized and below which they remain the same. This randomizes the short-term variability in a time series whilst preserving long-term trends. As a surrogate method based on the Fourier transform, the accuracy of this method will suffer in the presence of end-to-end mismatch of the data (see Section 4.3), so the authors propose to symmetrize the data to correct this before applying the algorithm, resulting in symmetrized TFTS (STFTS) surrogates [74]. It should be noted that the method discussed in [73] is only applicable when it can be assumed that the nonlinearity in the static transformation in AAFT and IAAFT is weak, as it is very difficult to separate simultaneously occurring nonstationarity and nonlinearity.
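A minimal Python sketch of the TFTS phase-randomization step (without the symmetrization of STFTS) might look as follows; the cutoff frequency and the test signal are illustrative assumptions:

```python
import numpy as np

def tfts_surrogate(x, f_cut, fs, seed=0):
    """Randomize phases above f_cut (Hz); keep phases (hence trends) below it."""
    rng = np.random.default_rng(seed)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    hi = freqs > f_cut
    if len(x) % 2 == 0:
        hi[-1] = False                      # keep the Nyquist component real
    X[hi] *= np.exp(1j * rng.uniform(0, 2 * np.pi, hi.sum()))
    return np.fft.irfft(X, n=len(x))

fs = 100.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.sin(2 * np.pi * 5.0 * t)
s = tfts_surrogate(x, f_cut=1.0, fs=fs)     # the slow 0.1 Hz component survives
```

The slow component below the threshold is untouched, while the fast component keeps its power but loses its phase structure.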
Wavelet based surrogates, which preserve the local mean and variance of the data, were introduced by Keylock [75,76]. The fact that wavelet analysis does not assume stationarity makes it particularly useful in the study of ecological time series [17]. Wavelet surrogates have since been used directly to test for stationarity [77]. Wavelet surrogates are discussed further in Section 5.1.
It is emphasized that windowing is usually preferable when dealing with nonstationary time series. Surrogate tests can then be performed for different windows of data, and nonlinearity can even be tested at different scales, to determine exactly at which times and scales it appears. In contrast, for stationary signals, as long a signal as possible is optimal, to ensure that transient interactions are not missed [78].

Preprocessing
Once possible nonstationarity in the data has been addressed, the data should be preprocessed to avoid spurious results introduced solely by the surrogate generation method. The preprocessing should be performed in the following order:
1. Remove any trends in the data which are not important or relevant to the results, so that they are not incorporated into the surrogates.
2. Correct any mismatch between start and end points and the corresponding first derivatives. This step arises from the assumptions made when calculating any of the Fourier transform based surrogates; it is very important and must not be overlooked.
3. Subtract the mean to ensure that results are not affected by significantly different mean values.
The goal of most FT-based surrogates is to preserve the autocorrelation of the signal, and preserving the Fourier spectrum provides an easy way to achieve this. The Wiener-Khinchin theorem states that there is a one-to-one correspondence between the Fourier amplitudes and the periodic autocorrelation. As the word ''periodic'' suggests, this correspondence is only exact when the data can be considered as containing an integer number of periods of oscillation. If this is not the case, the correspondence breaks down, and the autocorrelation is no longer preserved. If there is a finite mismatch between the beginning and end of the data, spurious results will be obtained [4], especially in periodic signals. Fig. 2 demonstrates this using a sine wave and its Fourier transform surrogate. A 'beating' phenomenon can be seen in the surrogate calculated from the nonpreprocessed signal. This is a purely numerical artifact, due to the leakage of the dominant frequency peak into other frequency bins. It demonstrates the importance of preprocessing, as this behaviour would very likely cause rejection of the null hypothesis of a linear process.
To prevent problems arising from the issues described above, the start and end points of the data should be matched as closely as possible. The first derivatives should also be matched, to ensure that a non-integer number of cycles is not permitted, which would alter the power spectrum and again cause the system to appear nonlinear. The matching of start and end points can be achieved by choosing the closest pair of values from the first K_1 and last K_2 points, and truncating the data before and after these points, respectively. For a periodic signal, matching the first derivatives is also important, and can be achieved by not only matching points, but minimizing the mismatch between consecutive points at the beginning and the end of the signal as follows:

(k_{start}, k_{end}) = \arg\min_{k_1, k_2} \sum_{j=0}^{p} (s_{k_1+j} - s_{k_2+j})^2,   (2)

where k_1 = 1 . . . K_1 and k_2 = L − K_2 − p . . . L − p are indices of points at the beginning and end of the signal of length L. Then truncate the time series up to k_start at the beginning and from k_end to the end, so that the new ''processed'' time series is s_{n=k_start...k_end}. The variable p determines smoothing, and should be greater for noisy systems. Stam et al. [68] reported that p = 10 is appropriate in most cases. However, this value will depend on the signal, including its shape, the period of oscillations, the sampling frequency, and the presence of noise. It is recommended that p be chosen as a fraction of the period of oscillation T, from T/10 to T/2, depending on the data. The latter case would be used, for example, for a square wave, as a smaller value may mean matching at any point along a flat part of the signal.
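The end-matching procedure described above can be sketched as a brute-force search over candidate start and end points; the parameter values in this Python version are illustrative:

```python
import numpy as np

def match_ends(s, K1, K2, p):
    """Truncate s so that p consecutive points at the start and end match."""
    L = len(s)
    best_cost, k_start, k_end = np.inf, 0, L - 1
    for k1 in range(K1):                              # candidate start points
        for k2 in range(L - K2 - p, L - p):           # candidate end points
            cost = np.sum((s[k1:k1 + p] - s[k2:k2 + p]) ** 2)
            if cost < best_cost:
                best_cost, k_start, k_end = cost, k1, k2
    return s[k_start:k_end + 1]

# A sine with ~10.3 periods, i.e. deliberately mismatched ends (illustrative).
t = np.linspace(0, 10.3, 1031)
s = np.sin(2 * np.pi * t)
s_matched = match_ends(s, K1=50, K2=50, p=10)
```

After truncation, the mismatch between the first and last values (and slopes) is greatly reduced, removing the spurious 'beating' in the FT surrogates.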
This truncation process and the surrogates that it generates have been considered as surrogates in their own right, as so-called length-adjusted discrete Fourier transform (LA-DFT) surrogates [68], but preprocessing should be considered an essential step for all FT based surrogates, and also time-shifted surrogates (see Section 7.4). One interesting case in which the effects of not preprocessing are obvious is a sine wave. If the ends of a sine wave are not matched, surrogate testing coupled with a discriminating statistic may consider it as nonlinear, whilst if preprocessing is performed correctly it will be regarded as linear.

Autoregressive (AR) surrogates
Autoregressive surrogates are used to estimate the background noise spectrum of the observed data, and are an example of typical realization surrogates. An autoregressive process is a discrete representation of a random process. It assumes that a variable depends linearly on its previous values and a stochastic term:

x_t = c + \sum_{i} \rho_i x_{t-i} + \sigma \varepsilon_t,   (3)

where x_t is the variable, ρ_i are parameters of the model, c is a constant (assumed to be zero for simplicity), ε_t is a random variable drawn from a Gaussian distribution with zero mean and unit variance, and σ is the standard deviation of the noise. Higher-order AR processes can actually support oscillations themselves, thus great care must be taken when selecting a null hypothesis and during parameter estimation.
Null hypothesis. The observed data can be fully described by an autoregressive noise model.
Algorithm. In the case of the assumption of AR(1) noise, surrogates can be simulated by estimating the correlation coefficient ρ from the time series using lag-0 and lag-1 covariances. The model (3) is then simulated with this estimated value for each surrogate.
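A minimal Python sketch of this AR(1) typical-realization procedure is given below; the variance-matching choice for σ and the test data are illustrative assumptions:

```python
import numpy as np

def ar1_surrogate(x, seed=0):
    """Fit rho from lag-0 and lag-1 covariances, then simulate model (3)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float) - np.mean(x)
    rho = np.dot(x[1:], x[:-1]) / np.dot(x, x)       # lag-1 / lag-0 covariance
    # Choose sigma so the surrogate variance matches the data (an assumption).
    sigma = np.std(x) * np.sqrt(max(1.0 - rho ** 2, 1e-12))
    s = np.zeros(len(x))
    eps = rng.standard_normal(len(x))
    for t in range(1, len(s)):
        s[t] = rho * s[t - 1] + sigma * eps[t]
    return s

# Illustrative AR(1)-like data to fit.
rng = np.random.default_rng(1)
data = np.zeros(2000)
for t in range(1, 2000):
    data[t] = 0.8 * data[t - 1] + rng.standard_normal()
surr = ar1_surrogate(data)
```

Each call with a different seed gives an independent typical realization of the fitted model, from which the distribution of the discriminating statistic under the null hypothesis can be built.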
Issues. As with any surrogate method, to obtain meaningful results, the null hypothesis must be carefully considered. As previously discussed, it has been shown that a large proportion of climate data has a red noise (Brownian noise, 1/f 2 ) background. Red noise can be perfectly described by the AR(1) model with ρ = 1. A comparison of frequency spectra for different values of ρ is given in Fig. 3(A). It becomes clear that although red noise can be well represented by this model, it may be more difficult to estimate the spectrum of signals for which a value of ρ other than 1 or 0 is estimated from the data.
How well the AR(1) model matches the background noise of the signal may depend upon the number of points in the signal, with a plateau appearing in the spectra at lower frequencies (see Fig. 3(A)).
One example of background noise which cannot be estimated using the AR(1) model is pink noise (1/f ), with energy that is inversely proportional to the frequency of the signal. Pink noise-like spectra have been observed in many physical and biological systems [79], and may be considered when searching for dynamics in noisy data arising from nature. More generally, the noise spectrum may be described by 1/f α where α = 1 for pink noise, 2 for red noise and 0 for white noise. Fig. 3(B) shows the 1/f α spectra for different values of α. Surrogates can also be generated from this model, if this better fits the characteristics of the data. When using this model for the underlying surrogates, the parameter α will need to be estimated from the data [80]. Examples of the use of AR and 1/f surrogates are shown in Section 9.1.3.
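Surrogates with a 1/f^α background can be generated by shaping the Fourier amplitudes of white noise; a minimal Python sketch follows (the value of α and the length are illustrative):

```python
import numpy as np

def powerlaw_noise(alpha, n, seed=0):
    """Gaussian noise with power spectrum ~ 1/f^alpha (0 white, 1 pink, 2 red)."""
    rng = np.random.default_rng(seed)
    W = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                        # avoid division by zero at DC
    W *= f ** (-alpha / 2.0)           # power ~ 1/f^alpha => amplitude ~ f^(-alpha/2)
    y = np.fft.irfft(W, n=n)
    return (y - y.mean()) / y.std()    # normalize to zero mean, unit variance

pink = powerlaw_noise(1.0, 1024)
```

In practice α would first be estimated from the data [80], and each surrogate generated with a different seed.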

Fourier transform (FT) surrogates
Fourier transform (FT) surrogates use a phase randomization process to test for nonlinear structure in a time series. Despite some drawbacks, discussed below, they are widely used and computationally very cheap.
Null hypothesis. The data were generated by a stationary linear Gaussian process. Equivalently, the null hypothesis can be formulated as that the data do not contain any nonlinear structure. The surrogates satisfy the null hypothesis if they have the same power spectrum as the original data.
Algorithm. Take the Fourier transform of the data, replace the phases of the Fourier coefficients with uniformly distributed random numbers whilst keeping the amplitudes unchanged (preserving the antisymmetry of the phases so that the inverse transform is real), and apply the inverse Fourier transform. This procedure is known as phase randomization, and serves to preserve linear behaviour, i.e. the power spectrum/autocorrelation, but destroy any nonlinear behaviour. An example FT surrogate of a noisy sinusoid is shown in Fig. 4.
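A minimal Python sketch of the phase randomization (the noisy sinusoid used as a test signal is illustrative):

```python
import numpy as np

def ft_surrogate(x, seed=0):
    """FT surrogate: keep the Fourier amplitudes, draw new random phases."""
    rng = np.random.default_rng(seed)
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, len(X))
    phases[0] = np.angle(X[0])            # keep the DC component unchanged
    if len(x) % 2 == 0:
        phases[-1] = np.angle(X[-1])      # Nyquist component must remain real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 0.05 * np.arange(1000)) + 0.2 * rng.standard_normal(1000)
s = ft_surrogate(x)
```

The surrogate has exactly the same power spectrum as the data but a different waveform, since all phase relationships have been destroyed.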

Issues.
As with all surrogate tests, if the data fail to meet any part of the null hypothesis, it will be rejected. In the case of FT surrogates, the null hypothesis, in addition to being rejected if the data are indeed nonlinear, will also be rejected if the data are linear but cannot be described by an ARMA process, i.e. if the process is linear but nonstationary/nonautonomous, the data are linear but non-Gaussian, or the data are linear but contain multiplicative noise.
As discussed in Section 4.3, the constraints of the Wiener-Khinchin theorem require that the data be truncated to include only an integer number of periods before application of the FT algorithm. This also applies to the amplitude adjusted Fourier transform (AAFT) and iterative amplitude adjusted Fourier transform (IAAFT) surrogates discussed below. It should also be noted that FT surrogates tend to indicate nonlinearity for linear time series with long coherence times, as thoroughly discussed in [59].
Applications. Contrary to studies that have failed to find significant nonlinearity in the electroencephalogram (EEG), Stam et al. [35] found evidence for nonlinearity in normal and abnormal EEG during the awake/eyes-closed state, and concluded that it is the spatial structure of the EEG that exhibits much of the nonlinear structure. The nonlinear EEG measures presented showed more or less specific patterns of dysfunction in dementia and Parkinson's disease. The surrogates used in this case were FT surrogates with linear correlations maintained [35] (see below for further discussion of multivariate surrogates). The null hypothesis for the surrogates used was that all structures in the EEG, in both the time and the spatial domain, can be adequately described by a linear model.
Using FT surrogates with linear redundancy, Paluš [22] found that there is nonlinearity in vigilant and sleep EEG recordings, but that it was not consistent with the hypothesis of low-dimensional chaos, as assumed many times previously before the use of surrogate data. Faes et al. [25] compared surrogate types, including IID, FT and AR, using simulations and real cardiovascular data, and found that when testing coherence significance levels, surrogates which preserve the autocorrelation in the data (FT and AR) are superior to those which only preserve the amplitude distribution (IID). FT surrogates have also been used in synchronization analysis between HRV and EEG in epilepsy [81]. Yamamoto et al. [82] found evidence of deterministic chaos in HRV at extreme altitude based on surrogate testing for nonlinearity with correlation dimension and FT surrogates.
Multivariate FT surrogates. Fourier transform surrogates, and any other surrogate type in which the phases are randomized (i.e. AAFT, IAAFT), can be used for the investigation of discriminating statistics and nonlinear dependencies in multivariate data by preserving the cross spectrum between signals in addition to preserving their power spectrum [60]. This is achieved by adding the same random phases to each FT surrogate. The corresponding null hypothesis would then be that all signals are stationary linear Gaussian processes, with only linear dependence between them. Equivalently, the null hypothesis is that the data are a realization of a multivariate Gaussian process. Rombouts et al. [83] used multivariate surrogates to investigate nonlinearity in EEG signals. Breakspear and Terry [84] combined AAFT and multivariate surrogate data techniques to search for nonlinear interdependence in EEG data.
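A minimal Python sketch of multivariate FT surrogates, in which the same random phases are added to each channel so that the cross-spectra are preserved (the two linearly dependent channels are an illustrative example):

```python
import numpy as np

def multivariate_ft_surrogates(X, seed=0):
    """Add the SAME random phases to every channel of X (channels x samples),
    preserving each power spectrum and all cross-spectra between channels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    F = np.fft.rfft(X, axis=1)
    phases = rng.uniform(0, 2 * np.pi, F.shape[1])
    phases[0] = 0.0                     # keep DC components unchanged in phase
    if n % 2 == 0:
        phases[-1] = 0.0                # Nyquist components must remain real
    return np.fft.irfft(F * np.exp(1j * phases), n=n, axis=1)

rng = np.random.default_rng(2)
a = rng.standard_normal(512)
b = 0.7 * a + 0.3 * rng.standard_normal(512)    # linearly dependent channels
S = multivariate_ft_surrogates(np.vstack([a, b]))
```

Because every channel is rotated by the same phase factor, the products F_a(f) F_b*(f) (and hence linear coherence) are unchanged, while any nonlinear dependence is destroyed.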
Multivariate surrogates have been used to test for nonlinear dependence between signals, but care must be taken, as the null hypothesis requires that the signals are linear. Therefore, a test with these surrogates may reject the null hypothesis in the case of linearly coupled nonlinear systems, providing a misleading result. This means that these surrogates are not suitable for testing the nature of coupling, where no prior information is available.

Amplitude adjusted Fourier transform (AAFT) surrogates
FT surrogates usually have a Gaussian distribution of values, which is often not the case in real data; this can lead to false rejections of the null hypothesis based purely on differences in the amplitude distribution. Care must be taken to ensure that a conclusion of nonlinearity does not arise through the presence of a static nonlinearity, i.e. one introduced into the system during measurement and therefore not a result of the underlying dynamics of the system [6]. To counter this, a histogram transformation can be applied, e.g. to transform the data into a Gaussian distribution [44,85], or the distribution of the surrogates can be adjusted to match the original time series, as in the amplitude adjusted Fourier transform surrogates described below [3]. These surrogates are also known as histogram adjusted isospectral (HAFT) surrogates [86].
Null hypothesis. The data represent a rescaled linear Gaussian process, which means that the data were generated by a stationary linear Gaussian process x_n, measured through an invertible, time-independent, instantaneous (i.e. no time delays) measurement function h(·):

s_n = h(x_n),   (4)

where the restriction that h(·) is invertible (i.e. x_n = h^{−1}(s_n)), time-independent (stationary/autonomous) and instantaneous is very severe and essential [4].
Algorithm. In AAFT surrogates, as well as trying to preserve the power spectrum as in FT surrogates, the amplitude distribution is also preserved. First we should find the possible underlying process x_n from the measured data s_n, phase randomize it, and transform back to recover the original distribution of s_n. Since the underlying linear process x_n should have a Gaussian distribution of values, to find it one should rescale the original values s_n to conform with a Gaussian distribution. This is usually done by rank ordering, which attempts to empirically invert the measurement function. The procedure for AAFT surrogates is as follows [3]:
1. Sort the values of the original signal s in ascending order, and assign a rank to each value of s in a vector ranks1.
2. Generate a set of Gaussian-distributed random numbers of the same length as s, and sort them in ascending order.
3. Reorder the sorted Gaussian values according to ranks1, to obtain a Gaussian-rescaled version x of the original signal.
4. Calculate the FT surrogate of x (as described in Section 4.5), denoted xs.
5. Sort xs in ascending order, and again assign a rank to these values in a new vector ranks2.
6. Reorder the sorted values of the original signal according to ranks2 to obtain the surrogate.
An example AAFT surrogate of a noisy sinusoid is shown in Fig. 5.
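The AAFT procedure can be sketched in Python as follows (the cubed-noise test signal is an illustrative non-Gaussian example):

```python
import numpy as np

def _ft_surrogate(x, rng):
    """Phase-randomize x whilst keeping its Fourier amplitudes."""
    X = np.fft.rfft(x)
    ph = rng.uniform(0, 2 * np.pi, len(X))
    ph[0] = np.angle(X[0])
    if len(x) % 2 == 0:
        ph[-1] = np.angle(X[-1])          # keep DC and Nyquist components real
    return np.fft.irfft(np.abs(X) * np.exp(1j * ph), n=len(x))

def aaft_surrogate(s, seed=0):
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    ranks1 = np.argsort(np.argsort(s))            # rank of each value of s
    x = np.sort(rng.standard_normal(len(s)))[ranks1]   # rescale to Gaussian
    xs = _ft_surrogate(x, rng)                    # FT surrogate of rescaled data
    ranks2 = np.argsort(np.argsort(xs))           # ranks of the FT surrogate
    return np.sort(s)[ranks2]                     # restore original distribution

data = np.random.default_rng(3).standard_normal(500) ** 3   # non-Gaussian data
surr = aaft_surrogate(data)
```

By construction the surrogate contains exactly the same set of values as the data, so the amplitude distribution is preserved perfectly, at the cost of the spectral distortions discussed below.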
Issues. Kugiumtzis [87] highlighted the importance of ensuring that surrogate data are suitable for the application before use, by demonstrating some problems with the AAFT algorithm. It is stressed that AAFT surrogates are only applicable if the static transform h is invertible. The invertibility, i.e. a clear one-to-one correspondence between the underlying process x n and the observed data s n , is especially crucial. This restricts the types of possible rescalings which can be tested by AAFT surrogates. An example of the performance of AAFT surrogates when the data are subject to an invertible ((s n ) 3 ) vs. a noninvertible ((s n ) 2 ) transformation is given in Section 9.1.1.
AAFT surrogates do not perfectly preserve all specified constraints, as the amplitude adjustment part of the algorithm introduces some changes in the power spectrum. Consequently, the surrogates no longer perfectly match the original time series. This is shown in Figs. 5 and 9(A), where there is strange behaviour in the surrogate time series and the spectrum clearly does not match that of the original signal, with spurious harmonics being introduced. This is due to the fact that AAFT surrogates assume Gaussian distribution of the underlying process, whilst the sinusoids have highly non-Gaussian distributions. Thus, AAFT surrogates will regard sinusoids as a nonlinear process using most discriminating statistics. The corruption of the power spectrum makes AAFT surrogates inappropriate for testing any type of background frequency content introduced by the calculation of a discriminating statistic. Even when all requirements are met, AAFT surrogates still have a 'whitened' power spectrum when compared to the original time series due to the finite number of samples.
Applications. AAFT surrogates have been used in many applications. Paluš & Novotná [86] investigated amplitude frequency correlations (AFC) in sunspot index data using 150,000 FT and AAFT surrogates leading to evidence of a nonlinear oscillator underlying the dynamics of the sunspot cycle. Bernjak et al. [88] used wavelet phase coherence combined with AAFT surrogates to investigate common fluctuations between simultaneously recorded microvascular blood flow and blood oxygen saturation signals, finding significant coherence between these parameters in shallow skin recording locations but not in deeper recordings. The same methods were also used to investigate recordings of sympathetic nerve activity with two needles inserted into nerve tracts supplying skin and muscles, showing continuously varying and coherently coupled human skin and muscle sympathetic nerve oscillations over time [89].
Musizza et al. [90] investigated EEG signals in rats during deep and shallow anaesthesia. By combining the method of conditional mutual information [91] and AAFT surrogates, they found that the power of individual brain waves, as well as brain connectivity, clearly differs between deep and shallow anaesthesia, and between different types of anaesthetic. Stankovski et al. [39] combined time-frequency methods and voluntary ramped-frequency breathing to explore human neurophysiological mechanisms by calculating wavelet phase coherence and directionality of coupling, testing their significance with AAFT surrogates.
Fig. 6 (caption): In IAAFT-2 surrogates the amplitude distribution is not perfectly preserved, while the power spectrum (D) is perfectly preserved. In IAAFT-1 surrogates (not shown), the amplitude distribution is extremely close to the original and can be regarded as statistically the same, but the power spectrum may differ slightly. In all panels, black lines represent the original signal and orange lines represent the surrogate.
Kvandal et al. [92] also applied wavelet phase coherence analysis to intracranial pressure (ICP) and arterial blood pressure (ABP) signals with significance levels determined by AAFT surrogates. They found that a phase shift between the ICP and ABP signals in the interval 0.07-0.14 Hz indicates normal cerebrovascular activity, while a phase shift in the interval 0.006-0.07 Hz indicates an altered cerebrovascular reactivity. Sheppard et al. [93] also used AAFT surrogates in the analysis on ICP and also tested for time-localized coherence. Ehlers et al. [94] investigated the effects of ethanol on the brain using AAFT and nonlinearity measures.

Iterative amplitude adjusted Fourier transform (IAAFT) surrogates
Iterative amplitude adjusted Fourier transform (IAAFT) surrogates were introduced by Schreiber & Schmitz [61] to overcome the flatness bias observed in the power spectrum of AAFT surrogates. They are constructed by iterative replacement of Fourier amplitudes with the correct values and rescaling the distribution to achieve a closer match between both the distribution and the power spectrum in the original data and the surrogates.
Null hypothesis. The data represent a stationary linear Gaussian process, measured through an invertible, time-independent instantaneous measurement function.
Algorithm. The power spectrum and the amplitude distribution of the original time series should be preserved in the surrogates. The iterative nature of the algorithm for IAAFT surrogates increases the computation time depending on the number of iterations required for convergence.
1. Generate a random shuffle of the data, y^(0)_n, and store the sorted values of the original data together with the moduli of its Fourier transform (the target amplitude spectrum).
2. Fourier transform the current iteration y^(i)_n, replace the Fourier amplitudes with those of the original data whilst keeping the phases, and inverse Fourier transform to obtain z_n. This enforces the correct power spectrum.
3. Rank-order z_n to the sorted values of the original data, restoring the correct amplitude distribution and giving the next iteration y^(i+1)_n.
4. Repeat steps 2 and 3 until y^(i+1)_n = y^(i)_n for all n. This indicates that the procedure has converged, and that the power spectrum and distribution are as close to the original data as possible.
5. As the final surrogate signal, either take the last z_n of step 2 to exactly preserve the power spectrum (IAAFT-2), or the last iteration of step 3 to exactly preserve the amplitude distribution (IAAFT-1).
An example IAAFT-2 surrogate of a noisy sinusoid is shown in Fig. 6.
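The IAAFT iteration can be sketched in Python as below (the iteration cap and test signal are illustrative assumptions; this sketch returns the IAAFT-2 variant):

```python
import numpy as np

def iaaft_surrogate(x, max_iter=200, seed=0):
    """Iterate spectrum replacement and rank-ordering (Schreiber & Schmitz)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    amp = np.abs(np.fft.rfft(x))        # target Fourier amplitudes
    sorted_x = np.sort(x)               # target amplitude distribution
    y = rng.permutation(x)              # step 1: random shuffle
    for _ in range(max_iter):
        Y = np.fft.rfft(y)
        z = np.fft.irfft(amp * np.exp(1j * np.angle(Y)), n=len(x))  # step 2
        y_new = sorted_x[np.argsort(np.argsort(z))]                 # step 3
        if np.array_equal(y_new, y):    # step 4: converged
            break
        y = y_new
    return z   # IAAFT-2 (exact spectrum); return y instead for IAAFT-1

x = np.random.default_rng(4).standard_normal(512) ** 3
s = iaaft_surrogate(x)
```

Returning z gives an exact match to the original power spectrum, while returning y would give an exact match to the amplitude distribution instead.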
Issues. The iteration involved in the IAAFT algorithm cannot preserve both the amplitude distribution and the power spectrum exactly; the user must decide which is most important for the application. IAAFT surrogates in which the amplitude distribution is preserved exactly are known as IAAFT-1 surrogates, whilst those in which the power spectrum is perfectly preserved are known as IAAFT-2. The latter are recommended in cases where specific features of the amplitude distribution are not being studied, due to false rejections which may occur as a result of the autocorrelation not being preserved [87]. Typically, IAAFT-2 surrogates are used, due to the fact that most discriminating statistics for nonlinearity depend on the spectrum. Therefore, in all further discussions we will use IAAFT-2 surrogates, and refer to them only as IAAFT.
Fig. 7 (caption; surrogates of the signal in Fig. 4): The Fourier transform phases are plotted against their delayed versions, as detailed in [95], in this case with a delay of 4. Rather than having no phase correlations and being completely random, as in the FT surrogate, the AAFT and IAAFT surrogates contain some correlations in phase, thus indicating the presence of nonlinearity.
IAAFT surrogates have a very small spread of values due to the very accurate preservation of the specified constraints; this should be carefully considered during testing, as very small deviations may affect the outcome of the test. It leads to a slightly lower power of the test compared to AAFT, meaning that IAAFT may more often fail to reject the null hypothesis when it is false; however, the number of false rejections of the null hypothesis is much smaller than for AAFT [61,87], which is a great advantage. The small spread also has the advantage that fewer surrogates are required to determine significance levels.
The lower power of the test has also been attributed to possible phase correlations introduced during the adjustment and iteration process. The original creators of the IAAFT algorithm warn that the constraints introduced mean that ''it cannot be excluded that the iterative scheme is able to converge only by also adjusting the phases of the Fourier transform in a nontrivial way'' [61]. Indeed, Räth et al. have since clearly shown phase correlations in both AAFT and IAAFT surrogates when there should be none [95]. An example of these phase correlations is shown in Fig. 7. This introduction of nonlinearity by the algorithm could lead to an incorrect failure to reject the null hypothesis of a linear Gaussian process in data from a nonlinear system, purely because both the data and the surrogates contain nonlinearities. This important issue should be kept in mind when deciding which surrogates are optimal.
It should be noted that IAAFT surrogates often perform well even in the case when h(·) is noninvertible, whereas AAFT completely fail in this case [87]. This is a huge advantage over AAFT, as in real data there is no way to distinguish between invertible and noninvertible transformations.
Applications. IAAFT surrogates were used with a nonlinear prediction statistic to investigate the dynamics of the human alpha rhythm [96]. Nonlinearity was detected in only 1.25% of alpha rhythm time series. The authors highlight the variations in results of previous studies when looking for nonlinearity in EEG.
Andrzejak et al. [97] combined the nonlinear prediction error and an estimate of an effective correlation dimension with the method of IAAFT surrogates to compare dynamical properties of brain activity from different recording regions and from different physiological and pathological brain states. The strongest indications of nonlinear deterministic dynamics were found during seizure activity.
In astrophysics, Gan et al. [10] tested solar radiation time series for nonlinearity using kurtosis of the residuals following analysis of deviations from the best linear approximation of the time series and IAAFT surrogates. They found that the 5-min, hourly and daily global solar radiation time series exhibit nonlinearity while the monthly time series does not. IAAFT surrogates have also been used in the study of self-similar processes [13].

Iterative digitally filtered shuffled (IDFS) surrogates
The problem with FT and IAAFT surrogates is that they exactly preserve the power spectrum of the original signal by design. Therefore, the null hypothesis must be considered very carefully. If the particular statistic used is sensitive to variations in the power spectrum, then clearly any surrogate that does not allow for variations in the power spectrum is unsuitable, as the variance of the statistic would be much smaller for the surrogates than in true repetitions of the original measured process.
Iterative digitally filtered shuffled (IDFS) surrogates stand out as a possible improvement to IAAFT, due to the constraint that IAAFT surrogates preserve the spectrum of a single realization of the process exactly. This can lead to a very small spread of values of discriminating statistics. In reality we would expect some variation in the power spectra of repeated processes [62,63]. AAFT surrogates naturally have some variation in their power spectrum, but this variation can be  unpredictable, and thus cannot be considered as being equivalent to the variations that would be seen for independent realizations of the system. The IDFS method introduces a controlled variation into the power spectrum by creating each surrogate from statistically independent permutations of the original data filtered with a response function to maintain spectral properties. An example IDFS surrogate is shown in Fig. 8. The power spectra of 100 AAFT, IAAFT and IDFS surrogates are compared in Fig. 9.
Null hypothesis. The data are the result of a linear stochastic process, possibly influenced by a nonlinear measurement transformation.
Algorithm. The aim of IDFS is to produce surrogates with the same amplitude distribution as the original data but with a variance in the spectra of the surrogate population consistent with independent realizations of the underlying process.
1. Using overlapping windowed Fourier transforms (see Appendix A.5.1), estimate the power spectrum of the original time series x_n and find the average power spectrum S(f).
2. Calculate the inverse Fourier transform of the square root of the power spectrum, √S(f), to give a response function R(f).
3. Create a random permutation of the data x_n (i.e. an RP surrogate, see Section 3.1.2), denoted y_n.
4. Convolve the random permutation y_n with the response function R(f). According to the convolution theorem, the Fourier transform of the convolution is equal to the product of the individual Fourier transforms [98]. Thus, to calculate the convolution C_n, first calculate the Fourier transforms of R(f) and y_n, then calculate the inverse Fourier transform of their product.
5. Rescale C_n back to the original data distribution (in the same way as in the AAFT surrogate algorithm), to produce a signal z_n with the same distribution as the original time series x_n but a slightly different power spectrum.
6. To correct the power spectrum, Fourier transform z_n, replace the Fourier amplitudes (but not phases) with those from the Fourier transform of C_n, and inverse Fourier transform to obtain a signal with the same power spectrum as C_n.
7. As in the IAAFT procedure, iterate steps 5 and 6 until the power spectrum and distribution are as close as possible to those of C_n and x_n, respectively.
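As a minimal sketch of how steps 1–7 can be implemented (assuming numpy only; the function name, windowing details and fixed iteration count are our own illustrative choices, not those of the toolbox accompanying this review):

```python
import numpy as np

def idfs_surrogate(x, n_windows=8, n_iter=50, rng=None):
    """Illustrative IDFS surrogate following steps 1-7 above."""
    rng = np.random.default_rng(rng)
    N = len(x)
    # 1. Average power spectrum from 50%-overlapping Hann-windowed segments.
    seg = N // n_windows * 2
    step = seg // 2
    win = np.hanning(seg)
    spectra = [np.abs(np.fft.rfft(win * x[i:i + seg]))**2
               for i in range(0, N - seg + 1, step)]
    S = np.mean(spectra, axis=0)
    # 2. Response function: inverse FT of the square root of the spectrum.
    R = np.fft.irfft(np.sqrt(S), seg)
    # 3. Random permutation (RP surrogate) of the data.
    y = rng.permutation(x)
    # 4. Convolve y with R via the convolution theorem (zero-padded FFTs).
    M = N + seg
    C = np.fft.irfft(np.fft.rfft(y, M) * np.fft.rfft(R, M), M)[:N]
    target_amp = np.abs(np.fft.rfft(C))   # amplitudes restored in step 6
    x_sorted = np.sort(x)
    z = C
    for _ in range(n_iter):
        # 5. Rescale to the original distribution (rank-order remapping).
        z = x_sorted[np.argsort(np.argsort(z))]
        # 6. Restore the Fourier amplitudes of C, keeping the phases of z.
        Z = np.fft.rfft(z)
        z = np.fft.irfft(target_amp * np.exp(1j * np.angle(Z)), N)
    # 7. Finish on the distribution step so the amplitudes match x exactly.
    return x_sorted[np.argsort(np.argsort(z))]
```

The rank-order remapping in step 5 is the same amplitude adjustment used in AAFT: the surrogate's values are replaced by the sorted original values in the surrogate's rank order.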
Issues. The rescaling in step 5 introduces a whitening of the spectrum, as also seen in AAFT surrogates. Therefore, these surrogates are iterated to bring the distribution closer to the correct values, similar to the procedure used in IAAFT surrogates.
In the case of IDFS surrogates, the Fourier amplitudes should be repeatedly adjusted and rescaled. Note that the Fourier amplitudes to adjust to are not those of the original data, but those of the surrogate just before the first rescaling [62]. The IDFS algorithm has a computational cost and complexity similar to those of IAAFT surrogates.
Applications. The non-iterated version of these surrogates (DFS) was compared with IAAFT in [14] for the assessment of structural damage, combined with the discriminating statistics of nonlinear prediction error (NPE), mutual information and bicoherence. Similar results were observed when comparing DFS, IAAFT-1 and IAAFT-2 for NPE and bicoherence, but for mutual information all surrogates performed poorly, with IAAFT-1 being the most successful of the three [14].

Comparing FT based surrogates
The first surrogates to be discussed, random permutation (RP), Fourier transform (FT) and amplitude adjusted Fourier transform (AAFT), are also known as algorithms 0, 1 and 2, respectively. Many reviews go into further detail, discussing and comparing these 'traditional' surrogate types [3,4,45,99-102]. Generally, it is concluded that IAAFT surrogates are superior to AAFT for most applications [61,103], due to the lack of conservation of the autocorrelation in the latter, but it should be emphasized that no surrogate method should be blindly applied without careful consideration of the system being studied and the assumed null hypothesis [85]. Attention should also be paid to differences in methodology between these comparisons: when considering the various investigations that have compared AAFT and IAAFT, care must be taken in their interpretation. If IAAFT surrogates with exact distribution rather than exact spectrum were used, or the data were not preprocessed, completely different results may be obtained (see the discussion in Section 4.3).
There have been many other variations on FT surrogates besides those discussed here. Statistically transformed autoregressive process (STAP) surrogates aim to tackle the problem of possible non-invertibility of the measurement function. They were first introduced by Kugiumtzis as corrected AAFT (CAAFT) surrogates in [104], and then generalized and renamed STAP [100]. Comparisons of AAFT, IAAFT and STAP are presented in [105]. The author concludes that STAP surrogates, being a typical realization approach, have less power, i.e. may be more conservative, than the constrained realization approaches of AAFT and IAAFT when a non-pivotal test statistic is used [105]. However, this difference between the methods becomes smaller as the time series length increases.

Surrogates for specific nonlinearity testing
Fourier transform based surrogates are still the most widely used for the detection of nonlinearity in a time series. However, in cases in which their null hypotheses (i.e. nonlinearly scaled linearly filtered noise) can be easily rejected, we may wish to make the null hypothesis more specific. We can then begin to ask questions about the nature of these oscillations themselves, such as whether the individual cycles are correlated, i.e. whether there is any long term determinism, or whether there is any correlation between different frequency scales.

Wavelet iterative amplitude adjusted Fourier transform (WIAAFT) surrogates
The wavelet transform is a time-frequency domain analysis method that is ideal for characterizing systems which have time-varying properties. Wavelet surrogates rely on the use of the inverse wavelet transform following some manipulation of the wavelet transform coefficients.
Most of the existing wavelet surrogates are based on the discrete wavelet transform (DWT). The DWT decomposes a signal into coefficients at different scales. The advantage of the DWT for this application is that the original signal can be reconstructed exactly from the resultant wavelet coefficients [106]. However, the DWT usually downsamples the data by a factor of 2 at each scale; thus the maximal overlap discrete wavelet transform (MODWT) is used instead, which ensures that each scale is represented by the same number of coefficients. A comprehensive introduction to the discrete wavelet transform and the MODWT used in many of these techniques is provided in [107], and is accompanied by a MatLab toolbox.
Breakspear et al. [108] introduced an alternative technique to randomizing phases in the Fourier domain to create surrogates, which exploits the fact that wavelets can better represent spatio-temporal patterns than Fourier modes. The technique is based on the resampling of wavelet coefficients, and exploits correlations between scales that exist within nonlinear data but which are absent or weak in stochastic data. The performance is comparable to phase randomization in terms of the preservation of linear properties, removal of nonlinear structure, and, importantly, computational demands. Three different coefficient resampling techniques were considered: (1) shuffling of wavelet coefficients within each scale, (2) cyclic rotation of coefficients within each scale (time-shifting), and (3) block resampling of wavelet coefficients within each scale, in which blocks of coefficients are shuffled in time. The latter two were introduced in [108] for the first time, whilst the first was used in [109].

[Fig. 10: Calculation steps for a wavelet iterative amplitude adjusted Fourier transform (WIAAFT) surrogate using the method described in [75]. The first step is decomposition of the preprocessed time-varying noisy sinusoid sin(2π(t + 0.5 sin(2πt/10))/3), shown in (A), using the maximal overlap discrete wavelet transform (MODWT); the decomposed scales are shown in (C). An IAAFT surrogate is calculated for the coefficients at each scale, and then the surrogate or its mirror image is matched as closely as possible to the original coefficients; the resulting coefficients are shown in (D). Finally, the inverse wavelet transform is calculated on the new coefficients and the original scaling function (not shown), giving the WIAAFT surrogate shown in (B).]
However, Keylock [75] argues that the randomization of the coefficients on the time axis employed in the previous method [108] provides no clear advantage over IAAFT for univariate time series. This problem was addressed by applying the IAAFT to each level of MODWT coefficients, to ensure that their power spectrum and amplitude distribution are retained. This method was named the wavelet iterative amplitude adjusted Fourier transform (WIAAFT) method [75], and is described below.
Null hypothesis. The data were generated by a linear Gaussian stochastic process, possibly with a nonlinear measurement transformation. Stationarity is not assumed.

Algorithm.
1. Calculate the maximal overlap discrete wavelet transform (MODWT) of the time series x_n. This will separate x_n into multiple time series of coefficients (as shown in Fig. 10(C)).
2. For each scale, calculate an IAAFT surrogate of the wavelet coefficients.
3. Take the IAAFT surrogate produced at each level, and compute its mirror image. Compare both the IAAFT surrogate and the mirror image with the original coefficients, and ascertain which version most closely matches the original coefficients using a least squares algorithm. The most closely matching time series becomes the surrogate coefficients for the current scale. An example of the resulting surrogate coefficients is shown in Fig. 10(D).
4. Calculate the inverse MODWT of the matched surrogate coefficients. This will reconstruct a surrogate time series (see Fig. 10(B)) from the new coefficients.
5. Repeat steps 2-4 for each surrogate required.
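The mirror-matching in step 3 is a simple least-squares choice between each scale's surrogate and its time reversal. A minimal sketch of just this step, assuming numpy (the function name is ours):

```python
import numpy as np

def match_or_mirror(orig, surr):
    """Step 3 sketch: return surr or its time-reversed mirror image,
    whichever is closer (in the least-squares sense) to the original
    wavelet coefficients for that scale."""
    err_fwd = np.sum((surr - orig) ** 2)
    err_rev = np.sum((surr[::-1] - orig) ** 2)
    return surr if err_fwd <= err_rev else surr[::-1]
```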
Once the inverse MODWT is calculated from these surrogate coefficients, a signal is produced that 'looks like' the original data in terms of its temporal evolution. However, like other surrogate methods, this process alters the distribution of the  values in the resulting time series when compared to the original data. Thus, an amplitude adjustment step, identical to that used in AAFT surrogates, is required. In turn, this then degrades the power spectrum, and thus the final iterative part of the IAAFT algorithm is applied to recover the correct values. An example WIAAFT surrogate is shown in Fig. 11. WIAAFT surrogates are compared to IAAFT surrogates in Fig. 12.
Issues. For long data sets, the WIAAFT algorithm is much more computationally expensive than Breakspear's shuffling technique, and also than IAAFT alone, since the IAAFT procedure has to be performed once per scale for each surrogate. The most computationally intensive step is the matching of the surrogate coefficients to the original data in step 3. However, WIAAFT surrogates do provide an advantage in any application in which the nonstationarity in the data must be preserved while the nonlinear aspects of the signal are randomized. In an extension to this work [75], Keylock introduced further constraints into the WIAAFT method [76], providing further discussion of the difficulties with the IAAFT method when dealing with pseudo-periodic data, and the nonstationarity that is introduced, as discussed in [19]. Keylock suggests that pseudo-periodic surrogates (PPS) (see Section 5.4) are not fully constrained, and introduces pinned wavelet iterative amplitude adjusted Fourier transform (PWIAAFT) surrogates to address this [76]. Rather than using the optimal matching procedure in [75], this method introduces a threshold for the wavelet coefficients, above which they are ''pinned'' in place during the generation of the surrogate data. This results in the surrogates being approximately aligned with the data, and should significantly reduce computation time. Using this method reduces the number of possible surrogate realizations, to a degree which depends on the threshold, the choice of which is discussed in [76]. In the examples presented in this paper, we use WIAAFT surrogates rather than PWIAAFT to avoid the issues introduced by the choice of this threshold.
Applications. WIAAFT can be used to test hypotheses which cannot be addressed with standard IAAFT. The example given in [75] is the changing Hurst exponent of fractional Brownian motion, and it is suggested to be useful in any application in which one is concerned with changes in the properties of a signal through time. These methods also form the basis of the gradual wavelet reconstruction (GWR) technique, also introduced by Keylock [110,111], which allows systematic generation of surrogates with an increasing proportion of the properties of the original signal.
Other wavelet surrogates. Gur et al. [112] recently introduced a wavelet based surrogate generation method comprising wavelet transforms and sampling statistically equivalent realizations, and applied it to the simulation of chemical reactions. Wavelet surrogates based on the continuous wavelet transform have also been used to test for stationarity in data [77]. The use of the CWT is possible in this case because inverse transformation is not required [106]. The performance of wavelet surrogates in the identification of coupling in wavelet bispectrum analysis is investigated in Section 9.3.3.

Multifractal surrogates
The discrete wavelet transform has also been used as the basis of multifractal surrogates, which aim to enable testing of nonlinear interdependence within, between or among time series obtained from multifractal processes [113]. Multifractal surrogates preserve the multifractal properties of the input data, i.e. interactions among scales and nonlinear dependence structures [113]. This approach was developed further by Keylock [114] and named iterated amplitude adjusted wavelet transform (IAAWT) surrogates. IAAWT surrogates are designed to preserve both the multifractal characteristics of the time series and the distribution of its values, using the iterative approach of IAAFT surrogates. Multifractal surrogates have been used to test cross-scale interactions in atmospheric dynamics [115].

Cycle shuffled surrogates (CSS)
Since the introduction of surrogate data testing, surrogates have been used in many studies to test conclusions on the presence of chaos. Theiler [65] introduced cycle shuffled surrogates as a means to guide the judgement required in the search for chaos. Unlike many of the previously discussed surrogates, this is not a test for nonlinearity in the time series, so nonlinearity should be established beforehand if unknown.
Null hypothesis. The signal is generated by a periodic oscillator with no dynamical correlation between cycles, i.e. the evolution of cycles is not deterministic.

Algorithm.
1. Use regular peaks or troughs in the data to separate each cycle in the time series.
2. Randomly shuffle the order of the individual cycles.
3. Concatenate the shuffled cycles to form the surrogate time series.

Issues. The cycle-shuffled method has the disadvantage that if the peaks and troughs of each cycle do not always occur at the same amplitude, aligning them vertically may introduce nonstationarity and jumps into the surrogate data which are not present in the original data [43], thereby increasing the risk of incorrectly rejecting the null hypothesis. The algorithm also requires that the individual cycles of oscillation can be clearly distinguished, for example by clear, regular events such as the R-peak in an ECG (see Fig. 14), or the firing of a neuron in a membrane potential recording.
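A minimal numpy sketch of cycle shuffling, assuming cycles can be delimited at local maxima (a crude stand-in for regular, well-defined events such as ECG R-peaks; the function name and peak criterion are our own):

```python
import numpy as np

def cycle_shuffled_surrogate(x, rng=None):
    """Illustrative cycle shuffled surrogate: split at local maxima,
    keep each cycle intact, and randomize the order of the cycles."""
    rng = np.random.default_rng(rng)
    # Indices of local maxima: x[i-1] < x[i] >= x[i+1].
    peaks = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])) + 1
    cycles = np.split(x, peaks)
    if len(cycles) < 3:
        return x.copy()                  # too few cycles to shuffle
    inner = cycles[1:-1]                 # keep the partial end segments in place
    rng.shuffle(inner)
    return np.concatenate([cycles[0]] + inner + [cycles[-1]])
```

Because the cycles are only reordered, the surrogate contains exactly the same values as the original signal.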
Applications. Theiler [65] introduced and applied this method to a nearly periodic EEG time series recorded from a patient undergoing an epileptic seizure. Although it had previously been reported to be chaotic [116], the author was not able to draw the same conclusion when using these surrogates coupled with the calculation of the Lyapunov exponents and correlation dimension [65]. Ivanov et al. [117] found a loss of multifractality in heartbeat dynamics with congestive heart failure using cycle shuffled surrogates. Also applying CSS to cardiac dynamics, Musizza et al. [23] investigated interactions between cardiac, respiratory and EEG oscillations in rats. CSS were also used to analyse X-ray light curves, where evidence of chaos could not be concluded using these methods [11].

Pseudo-periodic surrogates (PPS)
Introduced by Small et al. in [5], and further developed in [19], pseudo-periodic surrogates (PPS) test the null hypothesis that the system dynamics can be reduced to a periodic orbit with uncorrelated noise, or in other words that there is no determinism other than the periodic behaviour in the observed data. PPS combined with the correlation dimension were shown to be able to distinguish between chaotic and periodic behaviour in the Rössler system [5]. The surrogates can also distinguish between periodic systems driven by white and coloured noise [5], demonstrating that the null hypothesis includes only uncorrelated noise in the intercycle dynamics. If coloured noise is to be part of the null hypothesis, the updated algorithm discussed in [69] should be used.
Null hypothesis. Data are a realization of a periodic orbit with uncorrelated noise, i.e. no determinism is present in the system other than the periodic behaviour.
Algorithm. The basic idea of PPS is the addition of uncorrelated noise into the system in the form of small random shifts to nearby points in the phase space to investigate how this affects the dynamics. If there is a stable periodic cycle, all points will still approach it, and the observed dynamics will remain almost unchanged. However, if the dynamics is more complicated, for example in a chaotic system, then the dynamics will change, but the shape of the signal will be preserved, providing an advantage over cycle-shuffled surrogates.
1. Reconstruct the original time series in phase space as vectors x⃗_i, using time-delay embedding (see Appendix A.6).
2. Find the index q of the nearest neighbour, from the first half of the embedded signal, to its last value x⃗_L. This applies periodic boundary conditions to avoid cycling near the last value, for example when the nearest value to the last value is the previous one and the noise level is small.
3. Choose the noise level, which is the free parameter ρ.
4. As the first value of the generated surrogate, choose x⃗_i for random i, equivalent to a random time shift.
5. Add an m-dimensional vector of random Gaussian noise with variance ρ to the current surrogate value to produce a perturbed value.
6. Find the location j of the nearest point of the original phase space trajectory to the perturbed value. If j is the last index, i.e. j = L, set j = q.
7. Take x⃗_{j+1} as the next value of the surrogate and repeat from step 5 until the surrogate reaches the length of the original time series; the surrogate time series is read off from the first components of these vectors.

An example PPS surrogate is shown in Fig. 15.
Parameters. PPS requires an estimation of the embedding dimension [118]. It also requires selection of the free parameter ρ. No general formula has been proposed for this noise level. When ρ → 0 the surrogates will recover the original signal, whereas if ρ is very large, the surrogates will resemble random permutation surrogates. It was suggested by Small et al. [5] that the optimal ρ should maximize the expected number of short segments (of length 2) that are the same for the original time series and the surrogate. However, this approach is computationally expensive. Here, we propose that the choice of the noise level ρ be based on the average distance between nearest neighbours in the reconstructed phase space,

ρ = ⟨ min_{j≠i} ∥x⃗_i − x⃗_j∥ ⟩,

where ⟨· · ·⟩ denotes averaging over i. While this choice depends on the distribution, it will usually establish a large number of short segments and many jumps at the same time, since the probability of white noise with variance ρ exceeding 0.7ρ is ≈52%. Note that this applies for the maximum distance choice, and thus remains valid for any embedding dimension.
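A minimal numpy sketch of the PPS algorithm above, with a brute-force nearest-neighbour search; the function name is ours, and for simplicity ρ defaults to the mean nearest-neighbour distance (max-norm) and is used directly as the noise scale:

```python
import numpy as np

def pps_surrogate(x, dim=3, tau=1, rho=None, rng=None):
    """Illustrative pseudo-periodic surrogate (PPS)."""
    rng = np.random.default_rng(rng)
    # 1. Time-delay embedding.
    L = len(x) - (dim - 1) * tau
    emb = np.column_stack([x[i * tau:i * tau + L] for i in range(dim)])
    # Max-norm distance matrix (self-distances excluded).
    d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=-1)
    np.fill_diagonal(d, np.inf)
    # 2. Nearest neighbour of the last point within the first half.
    q = int(np.argmin(d[L - 1, :L // 2]))
    # 3. Default noise level: mean nearest-neighbour distance heuristic.
    if rho is None:
        rho = float(np.mean(np.min(d, axis=1)))
    # 4. Random starting point (random time shift).
    j = int(rng.integers(L))
    surr = [emb[j, 0]]
    while len(surr) < len(x):
        # 5. Perturb the current point with dim-dimensional Gaussian noise.
        pert = emb[j] + rng.normal(0.0, rho, dim)
        # 6. Nearest point of the original trajectory to the perturbation.
        j = int(np.argmin(np.max(np.abs(emb - pert), axis=1)))
        if j >= L - 1:       # periodic boundary condition
            j = q
        # 7. The successor's first component is the next surrogate value.
        j += 1
        surr.append(emb[j, 0])
    return np.array(surr)
```

Every surrogate value is copied from the original trajectory, so the waveform shape of individual cycles is preserved while intercycle determinism is destroyed.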
Issues. PPS are sensitive to the addition of observational noise, and can even amplify it, so care must be taken in very noisy systems. Unlike the similar surrogates suggested by Small and Judd [119], PPS have a more intuitive method for the specification of the noise parameter. The computational cost of PPS is very high. The PPS algorithm in [5,19] works well even with very large amounts of dynamical noise, but may incorrectly reject the null hypothesis if the intercycle dynamics of the pseudo-periodic orbit have a linear stochastic dependence induced by coloured additive observational noise. The updated algorithms for pseudo-periodic surrogates introduced in [69,120] also allow the rejection of the coloured noise case, without the requirement for embedding. This null hypothesis covers a wider scope, and its rejection would mean that the time series is even more likely to be chaotic. However, some problems with Luo's method have been highlighted by Shiro et al. [121], who introduced the idea of using Poincaré sections combined with random shuffle surrogates to search for chaos. More recently, neural networks have been used to increase the accuracy of the distinction between pseudo-periodic and chaotic time series [122]. These multiple approaches to the problem of identifying chaos in data demonstrate that it is not a trivial task, especially in the presence of noise. While introducing twin surrogates (see below), Thiel et al. [123] state that PPS are not appropriate to test for phase synchronization, because they are not capable of mimicking chaotic oscillators. Keylock discusses the similarities of this approach with the work of Moeckel & Murray [124,125].
Applications. Small & Tse [19] combined PPS and AAFT surrogates in the analysis of financial data sets and found that the AAFT surrogates were clearly rejected, yet the PPS surrogates were not, suggesting that the financial data exhibits short term nonlinear deterministic dynamics, but no long term determinism. Luo et al. [69] applied a modified PPS surrogate algorithm to an ECG recording and found no evidence that it was chaotic.

Discriminating statistics for nonlinearity testing
The surrogates discussed above are usually used in combination with one or more discriminating statistics. Many statistics have been proposed for the detection of nonlinearity. Here we review the most widely used. A variety of discriminating statistics are also reviewed in [126] and software for their calculation is provided by the authors. Some of the following statistics and surrogates for nonlinearity testing require time-delay embedding (see Appendix A.6), which attempts to recover the true multi-dimensional dynamics of the underlying system from a univariate time series.

Correlation dimension
Since its introduction by Grassberger and Procaccia in 1983 [40], the correlation dimension, ν, has become very popular since, among other uses, it provides the possibility of distinguishing between deterministic behaviour such as chaos, and random behaviour. The correlation dimension aims to measure the actual dimension of fractal objects. In the original paper the correlation dimension of different objects is compared to other measures of fractal dimension and demonstrated to be closely related [40]. Calculation of the correlation dimension requires embedding (see Appendix A.6) and is based on the correlation integral

C(ϵ, L) = [2/(L(L − 1))] Σ_{i<j} Θ(ϵ − r_ij),    (7)

where Θ is the Heaviside step function, r_ij = ∥x⃗_i − x⃗_j∥ is the maximum (recommended) or Euclidean distance between x⃗_i and x⃗_j, x⃗_i is the embedded signal, and L is the length of the embedded signal. The correlation integral is in fact the fraction of intervector distances less than ϵ, and ranges from 0 to 1. The correlation dimension ν is then given as

ν = lim_{ϵ→0} lim_{L→∞} d ln C(ϵ, L)/d ln ϵ.    (8)

It is clear that in real situations and computations we are not able to take such limits. Usually the graph of log C(ϵ, L) vs. log ϵ is constructed, regular linear areas are identified, and from them the correlation dimension is calculated. However, the graph needs to be plotted and analysed each time, which can be inconvenient. Thus, the most popular estimation of the correlation dimension is the so-called Takens' best estimate [127,128]:

ν = C(ϵ_max, L) / ∫₀^{ϵ_max} [C(ϵ, L)/ϵ] dϵ,    (9)

where ϵ_max is some upper cutoff on ϵ in (7), which should be neither too large nor too small. Alternatively, Takens' best estimate (9) [127,129] can be rewritten in the form

ν = −N_{r_ij<ϵ_max} / Σ_{r_ij<ϵ_max} ln(r_ij/ϵ_max),    (10)

which is more convenient for numerical implementation. In (10) the sum is taken over all unique pairs i, j for which ∥x⃗_i − x⃗_j∥ < ϵ_max, and N_{r_ij<ϵ_max} is the number of such pairs.
One can of course try to simply find ν as the average slope of log C (ϵ, L) over log ϵ, but we found this estimation to be much more unstable than (9) and (10), especially when the embedding dimension is not known and needs to be estimated, for example by the false nearest neighbours method. Direct use of the correlation integral C (ϵ, L) has also been suggested as a discriminating statistic [3].
In the course of our investigations, we found that in general, the use of the maximum distance in (7) appears superior to the use of the Euclidean distance. However, we have not found any better choice of ϵ than 1/4 of the root mean square of the signal, as originally used in [3]. This choice performs well even when the embedding dimension is highly overestimated. However, the performance of this choice is still not perfect in many cases, and the correlation dimension is quite often underestimated and sometimes overestimated. Thus, in all further investigations, we use the maximum distance in (7) and ϵ = σ/4 in (9), where σ is the root mean square of the signal (before embedding, or of the first component after embedding). There are some possible modifications to the calculation of the correlation integral (7) and, subsequently, the correlation dimension, as discussed in [42]. However, their effect is considerable only for small data lengths. If calculated straightforwardly, i.e. by calculating distances between each pair of points, the computational cost of estimating the correlation dimension is O(N²), but there are alternative algorithms which make this faster. We employed the modification of the algorithm suggested in [129], with computational cost usually between O(N log N) and O(N^{3/2}). Examples of the use of the correlation dimension and different surrogate types to demonstrate nonlinearity in time series can be found in Section 9.1.1.
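Takens' best estimate (10), with the maximum-distance and ϵ = σ/4 choices above, can be sketched as follows (numpy only; brute-force O(N²) over all pairs, and the function name is ours):

```python
import numpy as np

def takens_correlation_dimension(x, dim=2, tau=1):
    """Illustrative Takens' best estimate of the correlation dimension,
    using max-norm distances and eps_max = sigma/4 (sigma = RMS of x)."""
    L = len(x) - (dim - 1) * tau
    emb = np.column_stack([x[i * tau:i * tau + L] for i in range(dim)])
    eps_max = np.sqrt(np.mean(x ** 2)) / 4
    # All unique pairwise maximum-norm distances.
    iu, ju = np.triu_indices(L, k=1)
    r = np.max(np.abs(emb[iu] - emb[ju]), axis=-1)
    r = r[(r < eps_max) & (r > 0)]        # pairs entering Eq. (10)
    return -len(r) / np.sum(np.log(r / eps_max))
```

For independent noise embedded in d dimensions the estimate should approach d, while for a low-dimensional deterministic attractor it saturates at the attractor dimension.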

Nonlinear prediction error
Nonlinear prediction error characterizes the predictability of the system based on its reconstructed phase space trajectory. Deterministic systems should be much more predictable (lower prediction error) than stochastic ones. The most popular prediction error is of the form [4,130]:

E = √⟨ (s_{n+1} − F(x⃗_n, ϵ))² ⟩_n,    (11)

where s_n are the values of the scalar time series, x⃗_n = {s_n, s_{n+τ}, . . . , s_{n+(d−1)τ}} is its phase space embedding, and d and τ are the embedding dimension and time delay, respectively. The nonlinear predictor F(x⃗_n, ϵ) tries to predict the next value s_{n+1} (in fact the first component of x⃗_{n+1}) based on the previous point in the phase space, x⃗_n. It is given by averaging over the future values of all neighbouring delay vectors closer than ϵ in d dimensions:

F(x⃗_n, ϵ) = (1/N_ϵ) Σ_{∥x⃗_k − x⃗_n∥<ϵ} s_{k+1},    (12)

where N_ϵ is the number of such neighbours. There are also other choices of both the form of the prediction error (11) and the nonlinear predictor (12). These include prediction over several time steps, and prediction based on both past and future values, etc. However, the form (11), (12) is the most popular.
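A minimal numpy sketch of Eqs. (11)-(12) (the function name is ours, and ϵ = σ/4 is one plausible default, not prescribed by the text):

```python
import numpy as np

def nonlinear_prediction_error(s, dim=2, tau=1, eps=None):
    """Illustrative nonlinear prediction error, Eqs. (11)-(12)."""
    if eps is None:
        eps = np.std(s) / 4
    L = len(s) - (dim - 1) * tau - 1       # each point needs a successor
    emb = np.column_stack([s[i * tau:i * tau + L] for i in range(dim)])
    succ = s[1:1 + L]                      # s_{n+1}: first comp. of x_{n+1}
    err = []
    for n in range(L):
        # Neighbours of x_n within eps (max-norm), excluding x_n itself.
        d = np.max(np.abs(emb - emb[n]), axis=1)
        nbr = (d < eps) & (np.arange(L) != n)
        if nbr.any():
            err.append((succ[n] - succ[nbr].mean()) ** 2)  # Eq. (12)
    return np.sqrt(np.mean(err))           # Eq. (11)
```

For a deterministic map the prediction error should be clearly lower than for a random permutation of the same values.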

Volterra polynomials
Barahona & Poon [131] first proposed the use of Volterra polynomials for the detection of nonlinearity. This approach was then successfully used to investigate the performance of different surrogate types, revealing some important issues with AAFT and IAAFT [87]. Volterra polynomials are polynomials in the m previous values of the signal, with total degree up to n. One can fit such a polynomial to the data, expressing each value as

x_{i+1} ≈ F^(m)_n(x_i, . . . , x_{i−(m−1)}) = a_0 + a_1 x_i + · · · + a_m x_{i−(m−1)} + a_{m+1} x_i² + a_{m+2} x_i x_{i−1} + · · · ,    (13)

and choosing the coefficients a_k to minimize the mean squared error. Nonlinear systems should be better fitted by nonlinear polynomials, while for linear systems only the linear terms in (13) are needed. Thus, one can fit a signal and its surrogates to polynomials (13) and observe how the goodness of fit increases for the signal and its surrogates with the appearance of nonlinear terms in the fitting polynomials. The goodness of fit K^(m)_n can be calculated as

K^(m)_n = 1 − Σ_i (x_{i+1} − F^(m)_n(x_i, . . . , x_{i−(m−1)}))² / Σ_i (x_{i+1} − x̄)²,    (14)

where x̄ is the mean value of the signal. Since predicting the mean is the worst possible fit, the goodness of fit K^(m)_n ranges from zero to unity. If, with the addition of nonlinear terms, the goodness of fit for the signal increases considerably more than for the surrogates, this indicates nonlinearity.
Ideally, one would have discriminating statistics as single numbers. Thus, the difference between the goodness of fit with linear and nonlinear polynomials, for example f_linear(x_i, . . . , x_{i−9}) = a_0 + a_1 x_i + a_2 x_{i−1} + · · · + a_10 x_{i−9} and f_nonlinear, which additionally contains nonlinear terms, can be used. Although it is usually enough to observe the increase in goodness of fit with the addition of the first nonlinear term (e.g. x_i²), this is not always the case (see e.g. Fig. 7 in [87]). For nonlinear signals, this difference has to be larger than for linear signals (e.g. surrogates), so it should be used as a one-sided test. It is simple and fast, does not require embedding, and yields good results (especially for discrete signals). There may be some nonlinear signals for which the goodness of fit will considerably increase only with the addition of higher-order terms, but these are extremely rare. One should be very careful when applying this technique to continuous systems, since the goodness of fit greatly increases for such systems with the addition of linear terms even when the system is highly nonlinear. Care should also be taken in the implementation of this technique, as the results vary with sampling frequency. Section 9.1.2 presents an example of the use of Volterra polynomials and various surrogate types to search for nonlinearity in data.
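The linear-versus-nonlinear comparison above can be sketched with an ordinary least-squares fit (numpy only; the function name and the restriction to quadratic cross terms are our own illustrative choices):

```python
import numpy as np

def volterra_goodness_of_fit(x, m=5, nonlinear=True):
    """Illustrative goodness of fit K, Eq. (14): least-squares fit of
    x_{i+1} on the m previous values, optionally with all quadratic
    cross terms (the first nonlinear terms of Eq. (13)) added."""
    N = len(x)
    # Lagged values: column k holds x_{i-k} for i = m-1, ..., N-2.
    lags = np.column_stack([x[m - 1 - k:N - 1 - k] for k in range(m)])
    cols = [np.ones(len(lags))] + [lags[:, k] for k in range(m)]
    if nonlinear:
        cols += [lags[:, a] * lags[:, b]
                 for a in range(m) for b in range(a, m)]
    A = np.column_stack(cols)
    y = x[m:]                                  # targets x_{i+1}
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)
```

The one-sided statistic is then the difference K_nonlinear − K_linear, compared between the signal and its surrogates.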

Time-irreversibility
This measure is called time-irreversibility and, as suggested by its name, characterizes the invariance of the signal with respect to inversion of time. A simple and widely used form is the mean cubed increment,

a(τ) = ⟨ (s_{n+τ} − s_n)³ ⟩.    (16)

It is a very powerful statistic, as was shown in [130], while also being computationally very cheap and requiring no embedding. As all linear systems are time-reversible, time-irreversibility indicates nonlinearity. It can attain both positive and negative values, while for time-reversible signals it should be near zero. Thus, contrary to the previous statistics, it is used as a two-sided test: the surrogate test rejects the null hypothesis if (16) for the original signal is either higher or lower than for its surrogates. It should be emphasized that time-irreversibility is a sufficient condition for nonlinearity, but not a necessary one. Many nonlinear systems are time-reversible, and in this quite large class of cases time-irreversibility (16) used as a discriminating statistic has zero power. For example, with this statistic one is not able to distinguish nonlinearity in the Lorenz system [130]. Thus, conclusions should only be drawn if the null hypothesis of time-reversibility is rejected and nonlinearity is indicated; otherwise the test indicates only that the system is possibly time-reversible.
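A minimal numpy sketch of a third-order time-irreversibility statistic (the normalization by the increment variance, which makes the statistic scale-free, is our own choice):

```python
import numpy as np

def time_irreversibility(s, tau=1):
    """Illustrative time-irreversibility statistic: mean cubed increment
    over lag tau, normalized by the increment variance to be scale-free."""
    d = s[tau:] - s[:-tau]
    return np.mean(d**3) / np.mean(d**2)**1.5
```

For a time-reversible signal (e.g. a sinusoid over whole periods) the statistic is near zero, while an asymmetric waveform such as a sawtooth (slow rise, fast fall) gives a clearly nonzero value.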
Time-irreversibility, also known as temporal asymmetry, was used in [22] to demonstrate the presence of temporal asymmetry in EEG data.

Monte Carlo singular spectrum analysis (MCSSA)
Allen & Smith [8] introduced a statistical test which combines singular spectrum analysis (SSA) with surrogate data testing in order to test for oscillations against an arbitrary 'coloured noise' hypothesis, largely focused on noise which can be described by an autoregressive (AR) model. This method, known as Monte Carlo SSA (MCSSA), tests for the presence of modulated oscillations against an arbitrary null hypothesis [8]. To date, the technique has largely been applied to first order autoregressive (AR(1)) noise [8,132], in particular to climate data with red noise backgrounds, but the generalization to higher order AR and moving average (MA) processes is straightforward [8]. Paluš and Novotná [133] extended the method to test the dynamical properties of significant modes and demonstrated significant oscillations in historical European temperature records [132]. Vejmelka et al. [134,135] further extended this approach to higher orders using the chaotic nonlinear Lorenz system with additive 5th order autoregressive noise, and found evidence of nonlinear oscillations only in the specific frequency band where the Lorenz oscillations manifested. The same authors also applied this method to EEG data and found nonlinearity during sleep stage 2, in which spindle-like oscillatory phenomena in the sigma band (11-17 Hz) are observed [134,135].
Here we consider the first order autoregressive (AR(1)) model

x_n = a x_{n−1} + c + σξ_n

(assuming c = 0), where ξ_n is Gaussian white noise. MCSSA is implemented as follows:

1. Decide on the null hypothesis, i.e. which noise model is to be tested against (as discussed in Section 4.4).
2. Embed the time series [127,136] using a window of length L, to obtain, from the original time series (x_1, . . . , x_N) of length N, a set of delayed vectors x⃗_i = (x_i, x_{i+1}, . . . , x_{i+L−1}), i = 1, . . . , N − L + 1.
3. Calculate the time lag-covariance matrix of the time series,

C = [1/(N − L + 1)] Σ_{i=1}^{N−L+1} x⃗_i x⃗_i^T.

4. Diagonalize C:

C = E Λ E^T,

where Λ is the diagonal matrix of eigenvalues and the columns of E correspond to consecutive eigenvectors (e_1, e_2, . . . , e_L).
5. Sort eigenvalues Λ into decreasing order and thus obtain the eigenspectrum. 6. Apply the same procedure to surrogate data, using the eigenvectors from the original data, and compare the eigenspectra for the time series and the distribution of surrogates. Oscillations will be represented as an almost equal pair of eigenvalues above the surrogate threshold, which will also have almost equal associated frequencies, as determined from the power spectrum of the eigenvectors.
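Steps 2-6 can be sketched as follows (numpy only; the function names are ours, and the surrogate projection simply measures the variance captured by each of the data's eigenvectors):

```python
import numpy as np

def ssa_eigenspectrum(x, L):
    """Steps 2-5 sketch: embed with window L, form the lag-covariance
    matrix, diagonalize, and sort the eigenspectrum in decreasing order."""
    K = len(x) - L + 1
    traj = np.column_stack([x[i:i + K] for i in range(L)])   # rows are x_i
    C = traj.T @ traj / K                                    # lag-covariance
    lam, E = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    return lam[order], E[:, order]

def surrogate_eigenvalues(surr, E, L):
    """Step 6 sketch: project a surrogate onto the data eigenvectors and
    return the variance captured by each, for comparison with the data."""
    K = len(surr) - L + 1
    traj = np.column_stack([surr[i:i + K] for i in range(L)])
    return np.var(traj @ E, axis=0)
```

For a noisy sinusoid, the oscillation shows up as a near-equal pair of leading eigenvalues standing above the surrogate distribution.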
The method used here to decompose a single time series into a matrix and obtain its eigenvalues and eigenvectors is known as Karhunen–Loève expansion or principal component analysis [29,137,138], discussed in [139], and has some parameters which need to be chosen: the time delay and the embedding dimension. Time delay embedding, and the choice of these parameters, is discussed in Appendix A.6.
If evidence of oscillations is found in the time series based on the discovery of significant eigenvectors, they can be reconstructed using diagonal averaging of these eigenvectors.
The length of the signal will affect the results obtained using MCSSA. The longer the signal, the smaller the fluctuations in the autocorrelation and the narrower the distribution of the eigenvalues, ultimately leading to more precise results. It should be noted that this method assumes that the oscillations are stationary, so too much time-variation may lead to an artificially large number of modes being required to reconstruct the time series.
In Section 9.1.3, we test the performance of MCSSA on data containing simulated oscillations with additive autoregressive noise and real experimental EEG data.

Lyapunov exponents
Lyapunov exponents quantify the rate of separation of infinitesimally close trajectories. The largest Lyapunov exponent is a particularly useful quantity to calculate, since a positive value indicates not simply nonlinearity, but chaos [140-142]. However, the other Lyapunov exponents are only meaningful if a positive exponent exists, and if chaos is not present in the system the statistical error of the calculation is larger, because the separation between trajectories in phase space remains relatively small. Apart from the difficulties of estimating the largest exponent when the equations of motion are unknown, which is usually the case in reality, its accurate calculation may also be difficult in the presence of noise [3].
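When the equations of motion are known, the largest exponent can be obtained directly as the average logarithmic stretching rate along a trajectory. A minimal illustration for the logistic map (our own example; for r = 4 the exponent is ln 2 ≈ 0.693). For measured data, where the map is unknown, the exponent must instead be estimated from the divergence of nearby trajectories in a reconstructed phase space:

```python
import numpy as np

def logistic_lyapunov(r, x0=0.2, n=100_000, discard=1000):
    """Largest Lyapunov exponent of the logistic map x -> r*x*(1-x),
    estimated as the orbit average of ln|f'(x)| = ln|r*(1 - 2*x)|.
    This shortcut is only available because the map is known exactly."""
    x = x0
    total = 0.0
    for i in range(n):
        x = r * x * (1 - x)
        if i >= discard:               # skip the transient
            total += np.log(abs(r * (1 - 2 * x)))
    return total / (n - discard)

lam = logistic_lyapunov(4.0)
print(lam)  # positive, close to ln(2): the map is chaotic at r = 4
```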

Mutual information
Multidimensional mutual information (or redundancies) of time series and their delayed versions has been used to identify nonlinearity in data, in combination with surrogates [44,143]. The authors showed that the functional of the series probability distribution densities can reflect nonlinearities in the data, whilst the series covariance matrix is sensitive to linear relationships only [143]. This approach has also been extended to multivariate time series [144]. Mutual information can also be used to test for coupling, see Section 8.5.
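As a minimal illustration (our own sketch; the plug-in histogram estimator shown here is the crudest option, and bias-corrected or nearest-neighbour estimators are preferable for short series):

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in histogram estimate of I(X;Y) in nats: the KL divergence
    between the empirical joint and the product of its marginals."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    mask = pxy > 0                       # 0 * log 0 = 0 by convention
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask])))

rng = np.random.default_rng(1)
x, y = rng.standard_normal(5000), rng.standard_normal(5000)
mi_self = mutual_information(x, x)   # fully dependent: large
mi_ind = mutual_information(x, y)    # independent: near zero (small positive bias)
print(mi_self, mi_ind)
```

In a surrogate test, the value for the original data would be compared against the distribution of the same estimator over surrogate realizations, which absorbs the estimator bias into the null distribution.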

Identifying nonlinear oscillators by harmonic extraction
Many oscillations arising from real systems are not strictly periodic, and may contain many linear and nonlinear components. Nonlinearities often manifest as higher harmonics within a signal, and investigation into the properties of these harmonics can provide valuable information about the underlying dynamics of the system. However, the identification and separation of these components can be difficult. Combining mutual information, surrogate testing, and the wavelet transform, Sheppard et al. extracted multiple harmonics from human skin blood flow data and reconstructed the nonlinear oscillations [145]. More recently, a method known as nonlinear mode decomposition has been developed, which also utilizes surrogate testing and time-frequency analysis to extract and reconstruct time-varying nonlinear modes from time series [21].

Other statistics
Using the discussed methods to estimate background noise, the significance of regions of interest in the wavelet transform can also be found using Monte Carlo methods. This can be quite computationally expensive, since the wavelet transforms of all surrogate realizations must be held in memory at once in order to calculate significance levels. An alternative is to maintain the mean and second moment of each wavelet coefficient, updating them as each new surrogate transform is calculated and then discarding that transform, and to apply a Gaussian approximation; this is less accurate, but more memory efficient. Grinsted et al. [12] applied the cross wavelet transform and wavelet coherence to the analysis of geophysical time series. Monte Carlo methods were used to assess the statistical significance against red noise backgrounds using surrogate data set pairs with the same AR(1) coefficients as the original data, and their wavelet coherence calculated. A similar approach is also used in [146] to confirm associations in ecological time series.
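The memory-efficient alternative described above can be sketched with an online moment accumulator (illustrative only; the random arrays below stand in for the wavelet-coefficient magnitudes of successive surrogate realizations):

```python
import numpy as np

class RunningMoments:
    """Accumulate the mean and variance of surrogate (wavelet) coefficients
    one realization at a time (Welford's online algorithm), so that only one
    transform needs to be in memory at once."""
    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)

    def update(self, w):
        self.n += 1
        delta = w - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (w - self.mean)   # numerically stable update

    def var(self):
        return self.m2 / (self.n - 1)

rng = np.random.default_rng(2)
acc = RunningMoments((4, 100))               # e.g. 4 scales x 100 time points
for _ in range(200):                          # one surrogate transform at a time
    acc.update(np.abs(rng.standard_normal((4, 100))))
# Gaussian-approximation significance level, e.g. mean + 2 standard deviations
threshold = acc.mean + 2 * np.sqrt(acc.var())
print(threshold.shape)
```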
Many other discriminating statistics are available for the investigation of nonlinearity. However, they are not as popular as those discussed above. Some other statistics, for example higher order autocovariance, were also shown to be less useful than those discussed [4]. For more examples of nonlinear discriminating statistics, we refer the reader to [4,130].

Surrogates for independence testing
In addition to being extremely useful in the search for underlying dynamics in univariate time series, surrogates can also be crucial in the investigation of interactions between systems, when given two or more time series. Although equally applicable to networks, here we mainly consider the bivariate case, i.e. how to infer from two given data sets whether their underlying systems are interacting, the nature of these interactions, if present, and whether their coupling is sufficient to lead to synchronization.
The FT based surrogates discussed so far for nonlinearity testing have also been used in the search for dependence between systems, by ensuring that the same phase randomization is applied in all surrogate realizations. Stam et al. [147] proposed a novel measure for generalized synchronization in multivariate data sets to overcome the bias depending on degrees of freedom. Pereda et al. [148] review multivariate analysis specifically for neurophysiological time series, and include an in-depth discussion of surrogates in this context. They illustrate the use of multivariate surrogate data testing for the assessment of the strength and the type of interdependence between neurophysiological signals. In an application to ecology, Sheppard et al. [18] investigated spatial synchrony in aphid pests. Spatial coherence was tested for significance using Fourier transform surrogates with their cross-spectrum and cross-correlation conserved [18].
There are many types of surrogates other than the FT based ones already discussed, and the most useful for the purpose of testing for independence between signals are discussed in this section. Some of these surrogates require some extra steps before they can be used. Cyclic phase permutation (CPP) surrogates require extraction of the phase of the oscillations in a time series, whilst twin surrogates (TS) require time-delay embedding as previously discussed for pseudo-periodic surrogates (see Section 5.4). The onset of phase synchronization between two interacting oscillators with increasing coupling strength has been shown to precede the appearance of amplitude covariations [149], and thus can be detected at weaker coupling strengths and in noisy systems by only considering phase dynamics. Therefore, here we discuss the surrogate methods which have been used to investigate phase synchronization only.
Surrogates can be used to test for phase synchronization and coupling between oscillators using various discriminating statistics, including phase coherence, the phase synchronization index (PSI) [150], mutual information [151], dynamical Bayesian inference [152] and synchronization likelihood [147] among many others (some of these measures are discussed in Section 8). However, great care is required in their application. Many methods used for this purpose rely on various representations and quantifications of the phase difference between two oscillators. It is very important to remember that a constant phase difference alone, whilst a requirement for phase synchronization, is not sufficient [153].
We previously discussed Fourier transform based surrogates for testing for nonlinearity in the time series, but they have also been used successfully in bivariate data to search for phase synchronization. Paluš & Hoyer [6] demonstrated the use of FT surrogates for the investigation of cardiorespiratory synchronization in newborn piglets. They showed that the null hypotheses of random permutation surrogates, with and without their cross-dependence maintained, are not sufficient to provide an answer to whether the system is synchronized. Instead they showed, using mutual information as a discriminating statistic, that FT surrogates can be used to test for significant phase synchronization, and FT surrogates with the cross-spectrum of the original data maintained, as detailed in [60], can be used to test whether it is nonlinear [6]. Using these methods, the authors showed significant nonlinearity and phase synchronization between the respiratory movements and the heart rate fluctuations of the piglets.
Schäfer et al. [31] highlight some problems with the approach of using surrogates for the detection of synchronization, arguing that Fourier phase randomization alone may be inappropriate, because of the mixing of phase and amplitude information. Hurtado et al. [78] developed a variant on the FT surrogate which works only with phase and preserves the power spectrum of the instantaneous frequency rather than that of the original data.
Sun et al. [154] also compared surrogate tests for phase synchronization in order to ascertain which surrogate method is most appropriate for phase synchronization analysis. Two widely used phase synchronization indices, one based on mean phase coherence and one on entropy, were compared in combination with five surrogate types. In contrast to [78], the authors concluded that RP and FT surrogates of the instantaneous frequency are not suitable for phase synchronization analysis in EEG signals, as the thresholds derived from them correlate highly with the original phase synchronization index (PSI) values. It was also shown that the PSI based on mean phase coherence was less affected by the sampling rate of the data than the entropy-based PSI, suggesting that the former is more widely applicable to real data.
This study concluded that FT surrogates of the original time series provide the closest result to the true result provided by intersubject surrogates (which were used as the standard against which the performance of all other tests were measured), and are thus recommended in the absence of the latter [154]. However, this study only included RP and FT surrogates, so it is possible that other surrogates could offer improvements on these results. Intersubject surrogates are discussed further in Section 7.1.
Mezeiova & Paluš [155] concluded that phase synchronization analysis is not superior to coherence analysis, stating that both techniques lead to similar results. However, this comparison used mean phase coherence as the measure of synchronization, which may demonstrate the need for measures which focus more on the interactions between the systems over time rather than their averaged coherence, such as dynamical Bayesian inference.
It is clear from these mixed results that Fourier based surrogate tests for synchronization have their disadvantages. To counter this, other surrogate types have since been presented in attempts to address these problems [78,123,156] and are discussed in the following sections.

Intersubject surrogates
Intersubject surrogates [27], also known as mismatched [157], or natural [123] surrogates, are truly independent realizations of a system being studied. This technique was introduced by Toledo et al. [158] for the investigation of cardiorespiratory synchronization, and has since been used to investigate maternal and foetal heart rates [159], ageing [27,160], and brain-heart-lung interactions under general anaesthesia [161].
As an example, consider the scenario in which the interactions between the breathing and the heart rate are being studied in a large group of subjects. There will be data available for each subject, measured on different days and at different times, and it is clear that if it is mismatched, i.e. the breathing of subject one is compared to the heart rate of subject two, we should not expect any dependence at all as there can be no causal relations. This method has been used as a standard against which other surrogates are tested in the search for synchronization [154].
Null hypothesis. There is no dependence between the two systems.
Algorithm. Create surrogate pairs by finding all possible mismatched pairs available in the data set.
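For n subjects this enumeration is trivial (a sketch; in practice each index pair selects, for example, the respiration record of one subject and the heart rate record of another):

```python
from itertools import permutations

def mismatched_pairs(n_subjects):
    """All (i, j) pairs with i != j: signal 1 of subject i paired with
    signal 2 of subject j. Each pair is one intersubject surrogate."""
    return [(i, j) for i, j in permutations(range(n_subjects), 2)]

pairs = mismatched_pairs(4)
print(len(pairs))  # n*(n-1) = 12 mismatched surrogate pairs
```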
Issues. There are some drawbacks in the use of these surrogates. It is possible that the dynamics of the underlying systems may be different, for example the natural frequency may be slightly different [123], leading to surrogate realizations which are not fully constrained. Subject-specific features of the signals (like particular frequencies or waveforms) could lead to higher variability in the results. In some systems, it is also possible that enough data collection to provide sufficient natural mismatched pairs may be too expensive both in time and cost.
Applications. Sheppard et al. [157] applied wavelet-based time-localized phase coherence and intersubject surrogates to investigate the relationship between blood flow and skin temperature, and blood flow and instantaneous heart rate (IHR) during vasoconstriction and vasodilation provoked by local cooling or heating of the skin. The distribution of actual phase coherence values at each frequency was compared with the surrogate distribution using the Kolmogorov-Smirnov test.
Intersubject surrogates were also used by Iatsenko et al. [27] with the synchrosqueezed wavelet transform and Bayesian inference to investigate the cardiorespiratory coupling in terms of synchronization, coupling functions and contributions of different mechanisms. The authors found that intersubject and IAAFT surrogates gave the same significance threshold for wavelet phase coherence.
Donner & Thiel [162] used historic solar activity data with similar dynamics as data from the past 130 years as surrogates for their tests on sun-spot activity between hemispheres.

Cyclic phase permutation (CPP) surrogates
Cyclic phase permutation surrogates are phase surrogates. They are similar to cycle shuffled surrogates (see Section 5.3), but do not possess their drawbacks as they work directly with the phase of the signal. These surrogates are designed to test for interdependence between systems and are constructed by rearranging the cycles within the extracted phase. If the phase dynamics of two systems are not independent, then phase evolution over time of one system will be dependent on the phase evolution of the other system. Rearranging the cycles destroys this dependence, whilst preserving the general form of the phase dynamics of each system. This simple approach is very powerful for testing significance levels of measures which are expressed through the phases of systems.
Null hypothesis. Two systems have independent phase dynamics.

Algorithm.
1. Extract the instantaneous phase from the time series using either the Hilbert transform or extraction from the wavelet transform (see Appendix A.3 for more details).
2. If the extracted phase is unwrapped (not bounded), wrap the phases between 0 and 2π.
3. Divide the signal into individual cycles by selecting points of discontinuity, i.e. places in the wrapped phase signal where the phase suddenly drops by approximately 2π.
4. At the beginning and end of the time series there will be incomplete cycles; these should remain in place for all surrogates.
5. Reconstruct a new phase series with these beginning and end points, and randomly permute all other phase cycles between them.
6. Repeat for each surrogate.
The calculation of CPP surrogates is very fast. An example CPP surrogate is shown in Fig. 16.
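The steps above can be sketched as follows (a minimal sketch, assuming the unwrapped phase has already been extracted; function name and example are ours):

```python
import numpy as np

def cpp_surrogate(phi, rng):
    """Cyclic phase permutation surrogate from an unwrapped phase series
    (steps 2-5 above). phi is assumed monotonically increasing, e.g. as
    obtained from the Hilbert transform."""
    wrapped = np.mod(phi, 2 * np.pi)
    # Cycle boundaries: points where the wrapped phase drops by ~2*pi
    jumps = np.where(np.diff(wrapped) < -np.pi)[0] + 1
    if len(jumps) < 3:
        return wrapped.copy()                 # too few cycles to permute
    head = wrapped[:jumps[0]]                 # incomplete first cycle
    tail = wrapped[jumps[-1]:]                # incomplete last cycle
    cycles = [wrapped[jumps[k]:jumps[k + 1]] for k in range(len(jumps) - 1)]
    order = rng.permutation(len(cycles))      # randomly permute whole cycles
    return np.concatenate([head] + [cycles[k] for k in order] + [tail])

rng = np.random.default_rng(3)
t = np.linspace(0, 60, 6000)
phi = 2 * np.pi * t + 0.3 * np.sin(2 * np.pi * 0.1 * t)  # modulated phase
surr = cpp_surrogate(phi, rng)
print(len(surr) == len(phi))  # True: same samples, cycles reordered
```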
Issues. CPP surrogates are capable of testing whether the phase dynamics of periodic signals are really synchronized or whether they merely share the same rhythm, i.e. they can distinguish between coherence and synchronization. This is because they seek more sophisticated expressions of synchronization than mere coincidence of rhythms, such as the mutual adjustment of cycle periods between the two systems. However, it may happen that, for example, both the phase dynamics and the frequency modulation are independent but coincide, in which case this test will reject the null hypothesis; in real systems this is very unlikely. The null hypothesis will also be rejected if the noise in one system is correlated with the noise in the other. This can still be regarded as dependence, since in real systems noise can be just a complicated realization of high-dimensional phase dynamics, so that such systems are indirectly coupled through other elements.

Applications.
Jamšek et al. [7] combined conditional mutual information and cyclic phase permutation (CPP) surrogates in the investigation of the nature, strength, and direction of coupling between two or more time-varying oscillators. CPP surrogates are used in [163] with different methods of conditional mutual information estimation to investigate the problem of inferring direction of coupling from time series, but it is shown that any current surrogate method may introduce bias into the results. CPP surrogates are also used in [155] in the comparison of coherence and phase synchronization analysis.

Twin surrogates (TS)
Unlike PPS, which by definition cannot mimic chaotic oscillators, twin surrogates were developed to test for complex synchronization by Thiel et al. [123] and further developed in [156,164]. They are based on the popular recurrence plot analysis discussed in [165]. Twin surrogates mimic the dynamical behaviour of the system, reproducing both linear and nonlinear properties. They correspond to a trajectory of the underlying system, but starting at different random initial conditions.

Null hypothesis. Two systems are independent.
Algorithm. Twin surrogates, like PPS, require embedding before calculation.
1. Embed the original time series to produce a d-dimensional representation with embedded vectors x_i. 2. Find the index q of the nearest neighbour to the last value x_L among the first half of the embedded signal: ∥x_q − x_L∥ = min, q ∈ [1, L/2]. This allows periodic boundary conditions to be applied, to avoid cycling near the last value, for example when the last value has no twins and the nearest value to it is the previous one.

3. Choose the parameter δ (see below).
4. Calculate the recurrence matrix
R_ij = H(δ − ∥x_i − x_j∥), (20)
where H is the Heaviside step function, H(x) = 1 for x > 0 and H(x) = 0 for x < 0. We recommend the use of the maximum norm, ∥x_i − x_j∥ = max_k |x_i^(k) − x_j^(k)|. 5. For each i, find the set of j's corresponding to identical columns of the recurrence matrix: R_ik = R_jk for all k. The points x_j then have the same neighbourhood as x_i and are called twins. Denote these sets S[i], and include in each set the trivial element i itself, so that it always contains at least one element. It should be noted that, almost always and for any δ, there are identical columns in the recurrence matrix.
6. As the first value of the surrogate signal, set S_1 = x_i for a random i ∈ [1, L]. 7. Given the current iteration value S_i = x_k, randomly select an index j from the set of twins of x_k: j = rand(S[k]). If j is the last index, j = L, set j = q. Set the next iteration value to the next point of the chosen twin's trajectory, S_{i+1} = x_{j+1}. 8. Repeat step 7 to generate S of length L, and take its first component as the scalar surrogate signal. To generate more surrogates, repeat steps 6-8.

An example TS surrogate is shown in Fig. 17.
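A compact sketch of the whole algorithm, including the quantile-based choice of δ described under Parameters below (illustrative only; parameter values and the noisy-sine example are ours):

```python
import numpy as np

def twin_surrogate(x, d=2, tau=1, alpha=0.1, rng=None):
    """Sketch of the twin surrogate algorithm. d, tau: embedding dimension
    and delay; alpha: desired proportion of unit entries in the recurrence
    matrix (used to choose delta)."""
    rng = rng or np.random.default_rng()
    # Step 1: delay embedding
    L = len(x) - (d - 1) * tau
    emb = np.column_stack([x[k * tau:k * tau + L] for k in range(d)])
    # Step 2: nearest neighbour of the last point within the first half
    q = int(np.argmin(np.max(np.abs(emb[:L // 2] - emb[-1]), axis=1)))
    # Steps 3-4: maximum-norm distances, delta from the alpha-quantile,
    # recurrence matrix R_ij = H(delta - D_ij)
    D = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
    delta = np.quantile(D, alpha)
    R = D <= delta
    # Step 5: twins share identical columns of R (each point is its own twin)
    twins = [np.where((R == R[:, [i]]).all(axis=0))[0] for i in range(L)]
    # Steps 6-8: start at a random point, jump between twins, step forward
    i = int(rng.integers(L))
    surr = np.empty(L)
    for k in range(L):
        surr[k] = emb[i, 0]
        j = int(rng.choice(twins[i]))
        i = q if j == L - 1 else j + 1   # periodic boundary handled via q
    return surr

rng = np.random.default_rng(4)
x = np.sin(np.linspace(0, 40 * np.pi, 600)) + 0.05 * rng.standard_normal(600)
s = twin_surrogate(x, d=2, tau=5, alpha=0.1, rng=rng)
print(len(s))  # one value per embedded point
```

Note that the full L × L distance and recurrence matrices are held in memory here, which illustrates the computational cost discussed under Issues below.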
Parameters. The only parameter, aside from those for the initial embedding, is δ, and it was shown that the results are not hugely influenced by its value in reasonable limits [156]. However, each interval δ should include at least one point [166]. It is stated in [123] that the appropriate choice of δ should correspond to 5-20% of unit entries in the recurrence matrix R ij .
To find the δ which gives a proportion α of unit entries in R_ij (so that there are αN² unities overall), one should first calculate the distances between points in phase space and then set δ to the αN²th smallest distance. The procedure is as follows: (1) Choose the desired proportion α of unities in the recurrence matrix. According to [123], an appropriate choice is α ∈ [0.05, 0.2]. In our algorithms we use α = 0.1 (as the authors in [123] also usually do). (2) Calculate the matrix of distances
D_ij = ∥x_i − x_j∥. (21)
(3) Sort the D_ij in ascending order and set δ to the αN²th smallest value: δ = sort(D_ij)_{αN²}. This choice gives a proportion α of unit entries in R_ij. This is done in step 3 of the algorithm. To avoid slowing down the computations, the D_ij calculated in (21) are then substituted into (20) in step 4 to calculate the recurrence matrix, R_ij = H(δ − D_ij).
Other choices also exist ( [165] pp. 11-12), but in general the one discussed here is sufficient. It was also shown that the performance of twin surrogates does not depend strongly on the choice of embedding parameters [156].
Issues. The computational cost for TS is extremely high. The most computationally expensive procedure is the seeking of twins, since for each i = 1, . . . , L it needs to compare L − 1 columns in the recurrence matrix with L elements each. Next, the recurrence matrix of L 2 elements needs to be saved in the memory, which may slow down computation or be impossible for very large data sets. Once twins have been found, which only needs to be done once, surrogates can be calculated very fast, making the computational speed almost independent of the number of surrogates, which can be seen in Fig. 34(B).
Applications. Thiel et al. [164] applied twin surrogates to study eye movements, and found that the fixational movements of the left and right eye are synchronized. Twin surrogates are also used in a climate context in [167], where they were used to help uncover novel pathways of global energy and dynamical information flow in the climate system, and were shown to have a higher test power than shuffled time series or Fourier surrogates for this purpose. Romano et al. [168] proposed indices for the detection of asymmetry of coupling between interacting systems, and used twin surrogates to generate significance levels. TS were used in [169] to investigate synchronization between maternal and foetal heart rates with paced maternal breathing.

Time-shifted surrogates
It was first proposed in [170] to generate surrogates by time-shifting the original data. This approach was then further developed in [171]. Time-shifted surrogates have the same state space trajectory as the original data, and thus fully preserve all of its properties. They test the null hypothesis of two processes with arbitrary linear or nonlinear structure but without nonlinear interdependence and without significant linear cross-correlation. Two variants have been proposed. The first is to shift the time series using periodic boundary conditions, i.e. wrapping the end of the shifted time series around to the beginning of the new time series. The second is to cut a window of samples from one signal and to use time-shifted windows from the second signal as surrogates. The latter was shown to be safer, because using periodic boundary conditions can introduce a phase slip [171]. However, this problem is solved by preprocessing the data as discussed in Section 4.3. Thus, we propose the use of the former with preprocessing, to avoid unnecessary data loss, which is less than ideal when studying dynamical systems for which limited data are available.
It should be emphasized that in [170,171] time-shifted surrogates are not used as surrogates in the usual sense, but by plotting the dependence of some nonlinear measure on the relative time shift between two time series, i.e. between one signal and a time-shifted surrogate of the other. If there is a clear peak at zero shift (which represents the original signal) in this dependence, it indicates rejection of the null hypothesis, since time-shifting decreases the interdependence. Observing a peak at a different time-shift may also indicate rejection of the null hypothesis, but it is unclear if this is always the case, and may depend on the discrimination statistic used. It was shown in [171] that this technique is useful for indicating nonlinear interdependence. These surrogates have also been used in a direct way using random time delays, and are discussed in the applications section below. Null hypothesis. Two processes do not have any nonlinear interdependence or significant linear cross-correlation. The self structure of each signal can be arbitrary.

Algorithm.
1. Preprocess the signal according to Section 4.3. Preprocessing is crucial here, as a mismatch between the beginning and end of the signal may lead to discontinuities.
2. Choose a random time shift independently for each surrogate.
3. Shift the signal in time, wrapping its end to the beginning.
Time-shifted surrogates are very fast to generate. Only one signal needs to be shifted; it does not matter which one. An example time-shifted surrogate is shown in Fig. 18.
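The algorithm reduces to a single array rotation (a minimal sketch, assuming the signal has already been preprocessed as in Section 4.3; the function name and the minimum-shift guard are ours):

```python
import numpy as np

def time_shifted_surrogate(x, rng, min_shift=None):
    """Time-shifted surrogate with periodic boundary conditions: the signal
    is rotated by a random number of samples, wrapping the end to the
    beginning. Shifts very close to zero (or to the full length) are
    avoided, since they would barely decorrelate the pair."""
    n = len(x)
    min_shift = min_shift or n // 10
    shift = int(rng.integers(min_shift, n - min_shift))
    return np.roll(x, shift)

rng = np.random.default_rng(5)
x = np.sin(np.linspace(0, 20 * np.pi, 1000, endpoint=False)) \
    + 0.1 * rng.standard_normal(1000)
s = time_shifted_surrogate(x, rng)
# The distribution (and spectrum) are preserved exactly
print(np.allclose(np.sort(s), np.sort(x)))  # True
```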
Issues. Time-shifted surrogates can be viewed as an independent copy of the system which preserves not only intracycle behaviour, but also intercycle behaviour (except for the first and last cycles). This makes them more constrained than TS, PPS and CPP surrogates. Time-shifting changes the cross-moments between the two systems, but they may not be randomized completely, i.e. the full set of possible realizations consistent with the null hypothesis may not be realized. A larger number of time-shifted surrogates will therefore generally be required than for IAAFT surrogates, for example. The spectrum and the distribution are preserved exactly.
Applications. Canolty et al. [172] used time-shifting to demonstrate that an observed coupling strength between theta and high gamma oscillations in the human brain falls to much lower levels at large time lags, demonstrating that this coupling was significant. Petrock et al. [173] calculated cross wavelet transforms between cardiac and respiratory rhythms to identify any correlations between the signals. The significance level was found using time-shifted surrogates, which in this case were also reversed in time following the time shift. The recently introduced method of nonlinear mode decomposition (NMD), which extracts oscillatory modes from time series [21], uses time-shifted surrogates to determine whether the harmonics of an oscillation in a time-frequency representation of the signal are significant, based on the null hypothesis of independence between the first harmonic and the extracted harmonic candidate. Netoff & Schiff [174] investigated synchrony in neural networks using time-shifted surrogates and mutual information.
As discussed previously, time-shifting as a method for generating surrogates has also been used as part of wavelet surrogate generation algorithms, in which the wavelet coefficients obtained from the discrete wavelet transform are time-shifted at each scale [108]. Time-shifted surrogates have also proven useful in the assessment of the performance of coupling measures [175,176].

Discriminating statistics for interdependence testing
Although surrogates were originally introduced to test for nonlinearity, they were soon also employed to search for interdependence between dynamical systems. As for nonlinearity, there are many discriminating statistics that can be used for this purpose, in bivariate and multivariate data, but here we focus mainly on synchronization and coupling between two systems.
The dynamical systems found in nature are rarely isolated. Instead, they constantly interact with each other. The investigation of coupling has developed into a very active and rapidly evolving field [177]. However, the question of the nature of coupling between interacting systems still remains highly non-trivial to answer.

Phase coherence
Phase coherence provides a measure of how stable the phase difference between oscillators is over time, and is the main measure of synchronization used in this paper. Phase coherence is defined as [29,157]
PC(ω) = |(1/N) Σ_{n=1}^{N} e^{i[φ_1(ω,t_n) − φ_2(ω,t_n)]}|,
where φ_1(ω, t_n) − φ_2(ω, t_n) is the phase difference between the oscillatory components of the same frequency from the two time series at time t_n. In an ideal scenario, a value of 1 would be returned for perfect synchronization, and a value of zero for complete independence. It is highly unlikely in real systems to obtain these values. Synchronization measures, coupled with existing phase extraction measures, may not always indicate true synchronization in systems, due to the effects of finite data length, similar power spectra, or other factors. Therefore, in order to ascertain whether an obtained synchronization measure value indicates significance, we need to compare it against the same measure calculated in a case in which we know that there is no synchronization. This is where surrogates become essential.
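The definition amounts to averaging unit vectors of the phase difference on the unit circle (a sketch with synthetic phases; in practice φ_1 and φ_2 would be extracted from data via the Hilbert or wavelet transform, and the value tested against surrogates):

```python
import numpy as np

def phase_coherence(phi1, phi2):
    """|<exp(i*(phi1 - phi2))>|: 1 for a constant phase difference,
    near 0 for unrelated phases."""
    return float(np.abs(np.mean(np.exp(1j * (phi1 - phi2)))))

rng = np.random.default_rng(6)
t = np.linspace(0, 100, 10_000)
phi1 = 2 * np.pi * 1.0 * t + 0.1 * rng.standard_normal(len(t))  # noisy phase
phi2 = 2 * np.pi * 1.0 * t + 0.5     # same frequency, constant offset
phi3 = 2 * np.pi * 1.37 * t          # unrelated frequency
pc_locked = phase_coherence(phi1, phi2)
pc_free = phase_coherence(phi1, phi3)
print(pc_locked, pc_free)  # close to 1; close to 0
```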
It should be noted that the notions of phase coherence and phase synchronization are not interchangeable. Consider two sinusoids with perfectly matching frequencies. Calculation of the phase coherence of these two signals will yield a value of one, i.e. they will be perfectly phase coherent. But how do we know if they are synchronized? Unfortunately, if there is no time-variability in the phase dynamics of the two signals, we cannot know if they are synchronized, or whether they are coherent by chance, because we cannot observe how each oscillator may adjust the dynamics of the other over time. It is important to note that, assuming correct preprocessing as discussed in Section 4.3, FT-based surrogates of a perfect sine wave (which is an edge case of a linear process) can only be random in the relative phase shift, so all surrogate realizations will also be perfect sine waves, resulting in the acceptance of the corresponding null hypothesis. When testing for synchronization, it is crucial to remember that synchronization is not a state, but a process of adjustment of rhythms due to interaction [31] and thus we must also consider the evolution of the system and its parameters over time.
Phase synchronization is usually defined as the locking of the phases φ_{1,2},
nφ_1 − mφ_2 = const,
for integers n and m. Rosenblum et al. [178] described phase synchronization in coupled chaotic systems, for which the phase locking condition is relaxed to
|nφ_1 − mφ_2| < const.
In both cases, the amplitudes can be completely independent.

Synchrogram
A synchrogram is a visual representation of synchronization [29] between two oscillators, in which marked events of one oscillator are compared to the wrapped (bounded between 0 and 2π) continuous phase of the other. Recording the phase of oscillator 2 each time an event, such as a peak, occurs in oscillator 1, and plotting this phase over time, provides information about the phase difference over time. If the oscillators are synchronized, the marked events will occur at the same point within the cycle of the second oscillator over many cycles, and horizontal lines will emerge. The number of horizontal lines determines the type of synchronization: for example, for 1:1 synchronization there would be one horizontal line, and for 2:1, two horizontal lines. If the oscillators are not synchronized, a random scattering of phases will be observed. If there is a preferred phase difference in the system, horizontal lines will become visible as shown in Fig. 19(B). If anything more than 1:1 synchronization is present in the data, the number of cycles over which the phase is wrapped needs to be increased, but this has to be done manually, making this method cumbersome for complex synchronization. In addition, it is very difficult to quantify synchronization using only the synchrogram, so other measures have been introduced.

Phase synchronization index
Introduced by Tass et al. [150], the phase synchronization index (PSI) involves dividing the cycle of one oscillator into N bins and integrating the complex phase of the other oscillator over each bin [29], in order to assess whether the section of the phase of each oscillator coincides within each division of the cycle. The average over the bins then gives the PSI. The PSI suffers from the same problem as the synchrogram: anything other than 1:1 synchronization has to be found by trying different combinations of n and m until n : m synchronization is eventually found. For perfect synchronization the PSI would be 1, but in practice this is very rare, and surrogates are required to investigate whether the observed PSI is significant. It should be noted that PSI values depend strongly on the length of the signals, especially if the synchronization is intermittent or variable. Sun et al. [154] suggest that 3-16 or 5-12 wave cycles are optimal for the PSI, at least in EEG.

Fig. 19. It is well known that the instantaneous heart rate (IHR) and breathing (respiration) are coupled. This behaviour is called respiratory sinus arrhythmia (RSA), and can be seen in panel (A). An example synchrogram presents the phase of one oscillator relative to the other over time; if perfect phase synchronization is present, horizontal lines will be observed. A clustering towards a preferred value can be seen here, but it is difficult to say from the synchrogram alone whether this signifies synchronization.
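The binned formulation of Tass et al. requires some bookkeeping; a commonly used stroboscopic variant, the magnitude of the mean phase-difference phasor, conveys the same idea and is sketched below. The function name and test signals are illustrative, not taken from the text:

```python
import numpy as np

def psi(phi1, phi2, n=1, m=1):
    """n:m phase synchronization index in [0, 1] (phase-locking-value form)."""
    return np.abs(np.mean(np.exp(1j * (n * phi1 - m * phi2))))

t = np.linspace(0, 100, 10000)
phi_a = 2 * np.pi * 1.0 * t                       # 1 Hz phase
phi_b = 2 * np.pi * 2.0 * t + 0.3                 # 2 Hz phase, constant offset
rng = np.random.default_rng(0)
phi_c = np.cumsum(rng.normal(0.6, 0.3, t.size))   # noisy, unrelated phase
```

Here `psi(phi_a, phi_b, 2, 1)` is close to 1 (2:1 locking), while the 1:1 index and the index against the unrelated phase are both small, mimicking the n : m search described above.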

Dynamical inference
A number of methods that have been proposed for the study of interactions perform dynamical inference, i.e. they reconstruct a model of differential equations from data recorded from the interacting systems. These methods inevitably require surrogate testing. The reconstruction within these methods is based on a variety of inference techniques, e.g. least squares and kernel smoothing fits [153,179], dynamical Bayesian inference [180], maximum likelihood (multiple-shooting) methods [181] and phase resetting [182,183].
An example of such methods is Dynamical Bayesian inference (DBI), which can detect time-varying synchronization, directionality of coupling and time-evolving coupling functions simultaneously from the phase evolution of multiple oscillators [180,184]. Following the extraction of the phases of two time series, or two oscillations within the same time series, their dynamics is assumed to be described by [55,180,184]

φ̇ᵢ = ωᵢ + fᵢ(φᵢ) + gᵢ(φᵢ, φⱼ) + ξᵢ(t),

where ωᵢ is the natural frequency of the oscillation of each oscillator, fᵢ(φᵢ) are the self-dynamics of the phase, gᵢ(φᵢ, φⱼ) are the cross-couplings, and ξ(t) is a two-dimensional white Gaussian noise, ⟨ξᵢ(t)ξⱼ(τ)⟩ = δ(t − τ)Eᵢⱼ, included to represent noise processes in a real system.
The inference of the parameters utilizes Bayes' theorem [185],

p_X(M|X) = ℓ(X|M) p_prior(M) / ∫ ℓ(X|M) p_prior(M) dM,

where p_X(M|X) is the posterior probability distribution, ℓ(X|M) is the likelihood function for the values of the model parameters M given the data X, and p_prior(M) is the prior distribution. The inferred parameters of the coupling functions [177] can be used to determine whether the system is synchronized. The coupling strength obtained from this method should be verified by using surrogates in which there is definitely no coupling to provide a statistical threshold for significance. See [186] for further details and an in depth tutorial on dynamical Bayesian inference and its implementation.
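Full dynamical Bayesian inference is beyond a short example, but the simpler least-squares route mentioned above ([153,179]) can be sketched: fit the right-hand side of an assumed phase model to finite-difference estimates of the phase velocity. All parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, N = 0.01, 20000
omega1, eps = 2 * np.pi, 0.4                     # illustrative "true" parameters
phi1, phi2 = np.zeros(N), np.zeros(N)
phi2[0] = 1.0
for k in range(N - 1):                           # Euler-Maruyama integration
    phi1[k + 1] = (phi1[k] + dt * (omega1 + eps * np.sin(phi2[k] - phi1[k]))
                   + 0.05 * np.sqrt(dt) * rng.normal())
    phi2[k + 1] = (phi2[k] + dt * (2 * np.pi * 1.1)
                   + 0.05 * np.sqrt(dt) * rng.normal())

# Fit dphi1/dt ~ c0 + c1 sin(phi2 - phi1) + c2 cos(phi2 - phi1) by least squares.
dphi = np.diff(phi1) / dt
psi = (phi2 - phi1)[:-1]
A = np.column_stack([np.ones_like(psi), np.sin(psi), np.cos(psi)])
c, *_ = np.linalg.lstsq(A, dphi, rcond=None)     # c approximately [omega1, eps, 0]
```

In practice, the significance of the inferred coupling terms would then be established against surrogates with no coupling, as described above.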

Mutual information
Mutual information, also known as the redundancy, is a measure of how much information is common to two time series x_i and x_j, and can be used to detect coupling [29]. Using conditional mutual information (CMI), which characterizes the net dependence between two random variables after conditioning on a third, Paluš [45] studied unidirectionally coupled Rössler and Lorenz systems, and also unidirectionally coupled Hénon maps. It should be noted that the direction of coupling can only be inferred from experimental data when the studied systems are coupled but not yet synchronized.
All asymmetric measures, including CMI, will suffer from some bias when estimated from noisy, nonstationary experimental data, which may result in non-zero values even in the uncoupled direction of a unidirectionally coupled system. Thus, surrogates can be used to find the level of this bias in the case of no true coupling, or the "statistical zero". For this purpose FT surrogates are used, as CMI is independent of the distribution of the data [45]. Surrogates prevent the false detection of causality. However, a relatively large number of surrogates is required, and this number increases if the frequencies of the two systems are different and if the system is noisy. If large amounts of noise are present, the applicability of the test is limited.
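A minimal plug-in (histogram) estimator of mutual information illustrates the basic quantity; the bin count and test signals are illustrative, and a real application would compare the estimate against the surrogate-derived "statistical zero" rather than interpreting raw values:

```python
import numpy as np

def mutual_info(x, y, bins=16):
    """Plug-in estimate of I(X;Y) in nats from a 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of X
    py = pxy.sum(axis=0, keepdims=True)          # marginal of Y
    nz = pxy > 0                                 # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(2)
x = rng.normal(size=20000)
y_dep = x + 0.5 * rng.normal(size=20000)         # coupled to x
y_ind = rng.normal(size=20000)                   # independent of x
mi_dep, mi_ind = mutual_info(x, y_dep), mutual_info(x, y_ind)
```

The independent case still yields a small positive value, the finite-sample bias that the surrogate distribution is needed to quantify.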
Surrogates for inferring the direction of coupling were also discussed in [163,187], and more recently by Jelfs and Chan [188] who demonstrated the importance of using surrogates when calculating directionality indices, by applying them to both CMI and evolution map approach methods.

Wavelet bispectrum analysis
Bispectral analysis belongs to a group of techniques based on high-order statistics [189,190]. It can be applied to detect Gaussianity [191], but more importantly may be used to study couplings between interacting oscillatory processes or investigate phase relations [192]. Unlike synchronization methods, which are unable to provide any information about the characteristics of couplings, bispectrum analysis is especially powerful for the investigation of the nature of interactions between coupled oscillators, while making no assumptions about the data under investigation.
Early work on bispectrum analysis followed a traditional stationary approach, and took time into account only by partitioning signals into regions of interest. Van Milligen et al. [205,211] introduced the wavelet bispectrum and its normalized version, the wavelet bicoherence, and applied them to nonlinear oscillators, demonstrating the applicability of the wavelet bispectrum in turbulence analysis. Almost simultaneously, Rao and Indukmar [212] also introduced wavelet bispectral analysis, additionally presenting a discrete version for the investigation of discrete processes. Schack et al. [200] introduced a Gabor expansion for estimating the bispectrum, bicoherence and phase bicoherence, and tracked the maximal bicoherence over time in EEG signals. Jamšek et al. [213] used time-phase bispectral analysis to detect different types of coupling (including linear) of nonlinear oscillators. In later work, Jamšek et al. compared short-time-Fourier bispectral analysis with wavelet bispectral analysis [214] in the detection of couplings between oscillators with time-varying parameters. They concluded that wavelet bispectral analysis, using the Morlet wavelet, provides the best compromise for time-frequency resolution.
Jamšek et al. [7] also combined wavelet bispectral analysis with an information theoretic approach to study interactions among oscillators with time-varying basic frequencies.
The bispectrum can be derived from the Fourier spectrum as the third-order moment M̂(f₁, f₂) [189], namely

M̂(f₁, f₂) = F(f₁)F(f₂)F*(f₃),   (27)

where F corresponds to the Fourier transform of the time series, f₁, f₂ is any pair of frequencies of interest and f₃ = f₁ + f₂. (However, there are some examples in the literature of modifying f₃ according to the research hypothesis, e.g. f₃ = 2f₁ − f₂ [213].) In order not to overlook time-dependent effects, the signal can be partitioned into K consecutive time windows. The average bispectrum over these windows is defined as

M̂(f₁, f₂) = (1/K) Σ_k F_k(f₁)F_k(f₂)F_k*(f₃),

with the sum taken over k = 1, …, K. Alternatively, time-frequency representations can be applied. Typically the short-time Fourier or wavelet transform can be used but also, less commonly, Wigner-Ville transforms [215,216]. Here, we focus on wavelet-based analysis, since it provides the best compromise for time-frequency resolution by using an adaptive time window. We use the Morlet wavelet

ψ(u) = π^(−1/4) exp(iω₀u) exp(−u²/2),

where ω₀/(2π) = f₀ is the central frequency. Then the wavelet transform of the signal x(t) is given by [217]

W(s, t) = (1/s)^p ∫ x(u) ψ*((u − t)/s) du,

where a scale value s corresponds to a frequency value f = ω₀/(2πs); the normalization factor (1/s)^p depends on some parameter value p, conventionally non-negative. Typically, p = 1/2 is chosen, so that the square of the wavelet transform provides a suitable definition of wavelet power [217,218]. However, choosing p = 1 has the significant advantage that equal-amplitude oscillations at different frequencies are represented by peaks of equal height in the wavelet transform, as they are with the Fourier or short-time Fourier transform. The peaks will also have equal spread when viewed with a logarithmic-scale frequency axis. Therefore, in our examples, we take p = 1.
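The segment-averaged Fourier estimate can be sketched as follows; the frequencies, segment length and Hanning taper are illustrative choices, and the example contrasts a quadratically phase-coupled triplet with one whose high-frequency phase is independent in each segment:

```python
import numpy as np

def bispectrum(x, fs, f1, f2, nwin):
    """Average of F(f1) F(f2) F*(f1+f2) over consecutive segments of length nwin."""
    K = len(x) // nwin
    freqs = np.fft.rfftfreq(nwin, 1 / fs)
    i1, i2, i3 = (int(np.argmin(np.abs(freqs - f))) for f in (f1, f2, f1 + f2))
    acc = 0.0 + 0.0j
    for k in range(K):
        F = np.fft.rfft(x[k * nwin:(k + 1) * nwin] * np.hanning(nwin))
        acc += F[i1] * F[i2] * np.conj(F[i3])
    return acc / K

fs, nwin, K = 64, 128, 32
t = np.arange(K * nwin) / fs
rng = np.random.default_rng(3)
# Quadratic phase coupling: the 13 Hz phase is locked to the sum of 5 and 8 Hz.
x_coupled = (np.cos(2 * np.pi * 5 * t) + np.cos(2 * np.pi * 8 * t)
             + np.cos(2 * np.pi * 13 * t))
# No coupling: the 13 Hz phase is drawn independently in each segment.
x_uncoupled = x_coupled - np.cos(2 * np.pi * 13 * t)
for k in range(K):
    s = slice(k * nwin, (k + 1) * nwin)
    x_uncoupled[s] += np.cos(2 * np.pi * 13 * t[s] + rng.uniform(0, 2 * np.pi))

b_c = np.abs(bispectrum(x_coupled, fs, 5, 8, nwin))
b_u = np.abs(bispectrum(x_uncoupled, fs, 5, 8, nwin))
```

With phase coupling the segment contributions add coherently and the bispectrum magnitude is large; with independent phases they average towards zero.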
The wavelet bispectrum is then calculated as

B_W(f₁, f₂) = ∫₀ᵀ W(s₁, τ) W(s₂, τ) W*(s₃, τ) dτ,

where T is the time length of the signal, W* is the complex conjugate of the wavelet transform, the sᵢ are wavelet scales and 1/s₃ = 1/s₁ + 1/s₂ (corresponding to f₃ = f₁ + f₂). The cross bispectrum, which allows for the identification of coupling between different time series x(t) and y(t), is obtained by taking the wavelet transforms of the respective signals,

B_W^xxy(f₁, f₂) = ∫₀ᵀ W_x(s₁, τ) W_x(s₂, τ) W_y*(s₃, τ) dτ.

Jamšek et al. [214] introduced an alternative empirical correction into the wavelet function, for the sake of applying bispectral analysis to cardiovascular signals that contain oscillations spanning scales from 1 s to over 100 s. However, the most appropriate normalization for the wavelet bispectrum has not yet been studied or discussed from a mathematical point of view.
Different pairs of coupled oscillations with similar types and strengths of coupling, and occupying similar regions in frequency-frequency space, could have very different amplitudes of oscillations leading to very different values of the bispectrum. Bicoherence can be used to counter the effect of different amplitudes when trying to detect couplings. However, in the case of noisy real-life signals, this normalization can be a major culprit in causing bispectrum-based investigation of the presence of couplings to have poor specificity. One possible way to overcome this problem is to apply surrogate-based statistical testing [219] when applying bispectral analysis, instead of using amplitude normalization.
In real-world noisy signals, ascertaining the presence or absence of coupling is a great challenge. Early work on bicoherence surrogates was presented by Elgar et al. [220,221], who, for Gaussian processes, found numerically the critical surrogate thresholds for testing a null hypothesis of zero bicoherence. Later, Wang et al. [222] proposed surrogate testing for the null hypothesis that the original data were generated by a linear Gaussian process. For bispectra defined in terms of disjoint-window Fourier transforms, Kim et al. [223] proposed the procedure of randomizing the biphase of each time segment while keeping the same biamplitude, to test the significance of bicoherence values. This approach was later extended to wavelet bicoherence by Li et al. [224], who developed a general harmonic wavelet bicoherence, which uses a phase randomization method to estimate the cross-frequency interactions. They applied it to sleep analysis. More recently, this approach was used by Scully et al. [225] to investigate quadratic coupling in kidney blood flow data. Siu et al. [219] applied the iteratively refined surrogate data technique [199] directly to the bispectrum, to bypass the spurious peaks that are a consequence of the bicoherence normalization. They used it to investigate the couplings in renal haemodynamic data. Ge [206] analytically derived the critical threshold for wavelet bicoherence by determining the covariance structure of relevant wavelet-based moments.
In Section 9.3.3 we compare the performance of different surrogate types for the investigation of significant peaks in the bispectrum.

Finding nonlinearity in data using the correlation dimension
Here we present an example of one of the earliest and most important applications of surrogate data testing: confirming the presence of nonlinearity in data. Before the introduction of surrogate data testing, the search for chaos led to many premature conclusions that chaos was responsible for the dynamics of many observed systems. Subsequent reviews of many of these studies failed to find even nonlinearity in the data, usually immediately ruling out the possibility of chaos. These studies serve as a cautionary tale, highlighting the importance of testing all assumptions before drawing conclusions.
One of the most popular measures used to characterize chaos is the correlation dimension, ν (see Section 6.1). As an example of a chaotic (and thus nonlinear) system we consider the Rössler equations,

ẋ = −y − z,  ẏ = x + ay,  ż = b + z(x − c),   (33)

with a = 0.165, b = 0.2 and c = 10. The system was integrated for 1000 s using a 6th order Runge-Kutta scheme with a time step of 0.1 s. The x component was then used in the following tests for nonlinearity, with the second half (the last 500 s) being used in the analysis, to remove transients at the beginning of the simulation. Observational noise with NSR = 0.1 was added to the system. An example time series is shown in Fig. 20(A).
If we wish to test whether the system from which this time series was measured is chaotic, we can calculate the correlation dimension, ν. In this case, the estimated embedding dimension using the false nearest neighbours method is 3, and ν = 2.08. The fact that this number is not an integer, and is lower than the embedding dimension, suggests that the underlying system is neither noise nor a superposition of periodic orbits [43]. This could be evidence for chaos, but we must first test whether the system is nonlinear before we can draw any conclusions from this analysis. The surrogates discussed here for the detection of nonlinearity in data are FT, AAFT, IAAFT, WIAAFT and IDFS. Here, their performance is tested in this clear case of nonlinearity, and we should expect them all to reject the null hypothesis of a stationary linear Gaussian system, with or without measurement transformation depending on the surrogate type. 100 surrogates of each type were generated, with preprocessing performed as discussed in Section 4.3. Correlation dimensions ν were calculated for each surrogate set, and the distribution was compared with the value for ν in the simulated chaotic data. The embedding dimension which was estimated for the simulated data was also used in the calculation of ν for the surrogates. As deterministic behaviour should result in a lower value for ν, the surrogate test is one-sided, with a value of ν in the real data below the 5th percentile of the surrogate distribution considered significant. All considered surrogates were successful in rejecting the null hypothesis in this case, meaning that we can reasonably consider this system as nonlinear, as expected. Surrogate ν distributions for the chaotic system are shown in Fig. 21(A).
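The FT surrogates used in this comparison follow the standard phase-randomization construction: keep the amplitude spectrum of the data, randomize the Fourier phases, and invert. A minimal sketch, with an illustrative test signal:

```python
import numpy as np

def ft_surrogate(x, rng):
    """Phase-randomized (FT) surrogate: same amplitude spectrum, random phases."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, X.size)
    phases[0] = 0.0                       # keep the DC component real
    if x.size % 2 == 0:
        phases[-1] = 0.0                  # keep Nyquist real for even lengths
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=x.size)

rng = np.random.default_rng(4)
x = np.sin(2 * np.pi * 0.05 * np.arange(1024)) + 0.2 * rng.normal(size=1024)
s = ft_surrogate(x, rng)
```

The surrogate has exactly the same amplitude spectrum (hence the same linear autocorrelation) as the data, while any nonlinear structure is destroyed.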
Next, we demonstrate how a linear system may appear to mimic some features of nonlinear systems, and demonstrate the requirement for surrogates to accurately determine their origin. High-order ARMA models can exhibit oscillations. For direct comparison with the chaotic signal, an oscillatory signal was generated using a third order ARMA model,

sₙ = 1.625sₙ₋₁ + 0.284sₙ₋₂ + 0.355sₙ₋₃ + ηₙ − 0.960ηₙ₋₁,   (34)

where ηₙ follows a Gaussian distribution with variance 1. The parameters in this model were estimated from the previously used chaotic Rössler time series. An example of a linear time series generated by Eq. (34) is shown in Fig. 20(B).
In this case, we naturally expect any tests for nonlinearity to fail to find it. This means that the discussed surrogates should fail to reject their null hypotheses, and there should be no significant difference between the range of ν calculated in the surrogates and the ν calculated from the original data. As expected, all surrogates fail to reject the null hypothesis, as shown in Fig. 21(B).

Data recorded in real systems are rarely a complete representation of the underlying system; for example, they may be affected by the mechanism of measurement. This introduction of static nonlinearities may cause false rejections of the null hypothesis, i.e. false indications of nonlinearity. It was for this case that AAFT surrogates were introduced, which account for this nonlinearity introduced into the system during measurement by ensuring that the resulting surrogates have the same amplitude distribution as the original data. This is in contrast to FT surrogates, which usually have a Gaussian distribution of values. ν was calculated for system (34) and its surrogates, but this time with an invertible transformation xₙ = (sₙ)³ applied to the time series. The results, shown in Fig. 21(C), demonstrate the failure of FT surrogates in this case, with them incorrectly identifying nonlinearity in the data, purely as a result of the measurement transformation. This demonstrates the benefit of the amplitude adjustment step in AAFT surrogates, but it should be noted that the condition that the data transformation should be invertible is very severe. When repeating the same test with a non-invertible transformation of the data, xₙ = (sₙ)², AAFT surrogates fail, and erroneously detect nonlinearity due only to this transformation (see Fig. 21(D)).
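The amplitude adjustment step can be sketched as follows; the cubed sine input below mimics the invertible measurement transformation xₙ = (sₙ)³ used above, and is purely illustrative:

```python
import numpy as np

def aaft_surrogate(x, rng):
    """Amplitude-adjusted FT surrogate (sketch of the Theiler et al. procedure)."""
    n = x.size
    # 1) Rescale x to a Gaussian series by rank ordering.
    g = np.sort(rng.normal(size=n))[np.argsort(np.argsort(x))]
    # 2) Phase-randomize the Gaussian series.
    X = np.fft.rfft(g)
    ph = rng.uniform(0, 2 * np.pi, X.size)
    ph[0] = 0.0
    if n % 2 == 0:
        ph[-1] = 0.0
    y = np.fft.irfft(np.abs(X) * np.exp(1j * ph), n=n)
    # 3) Rescale back: impose the original amplitude distribution on y.
    return np.sort(x)[np.argsort(np.argsort(y))]

rng = np.random.default_rng(5)
s0 = np.sin(2 * np.pi * 0.02 * np.arange(1000)) + 0.3 * rng.normal(size=1000)
x = s0 ** 3                        # invertible "measurement" transformation
surr = aaft_surrogate(x, rng)
```

By construction, the surrogate values are an exact permutation of the original values, so the amplitude distribution is preserved, while the power spectrum is only approximately preserved (the distortion that IAAFT iterations were later introduced to reduce).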
From these results IAAFT, WIAAFT and IDFS surrogates appear to be more robust to these constraints. IAAFT are the fastest of these, and thus are ideal for testing data in which both the underlying dynamics and the effects of the measurement process are unknown beforehand.

Finding nonlinearity using Volterra polynomials
Volterra polynomials (see Section 6.3) can also be used to search for nonlinearity in data. The idea is that nonlinear signals will be better represented by nonlinear polynomials than linear signals. Therefore, the goodness of fit between the signal and polynomials of an increasing number of linear and then nonlinear terms is calculated. Again, to find the significance of the results, surrogates are required. The performance of the surrogates developed for the detection of nonlinearity was compared using Volterra polynomials in the same four cases as in the previous section: the ARMA model presented in Eq. (34), invertible and non-invertible transformations of these data, and the x component of the Rössler system presented in Eq. (33). Volterra polynomials were calculated for the original data, and for 100 FT, AAFT, IAAFT, IDFS and WIAAFT surrogates for each case. The null hypothesis was rejected, i.e. the system was considered to be nonlinear, if the increase in the goodness of fit of the polynomial to the data upon the addition of nonlinear terms exceeded the 95th percentile of the corresponding increases for the 100 surrogates.
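The fitting idea can be illustrated with a second-order polynomial in two lags. Here the Hénon map stands in for a nonlinear signal (it is not the Rössler data of the text); it is chosen because its exact quadratic form makes the expected outcome unambiguous:

```python
import numpy as np

# Hénon map as an illustrative nonlinear signal.
x = np.empty(3000)
x[0], x[1] = 0.1, 0.1
for n in range(1, 2999):
    x[n + 1] = 1 - 1.4 * x[n] ** 2 + 0.3 * x[n - 1]

x1, x2, target = x[1:-1], x[:-2], x[2:]
linear = np.column_stack([np.ones_like(x1), x1, x2])          # linear terms only
quad = np.column_stack([linear, x1**2, x1 * x2, x2**2])       # add quadratic terms

def fit_mse(A, y):
    """One-step prediction error of the least-squares polynomial fit."""
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ c - y) ** 2))

mse_lin, mse_nl = fit_mse(linear, target), fit_mse(quad, target)
```

The large drop in prediction error on adding the nonlinear terms is the discriminating statistic; its significance would then be judged against the same quantity computed for the surrogates.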
The results are shown in Fig. 22. As expected, FT surrogates only work well when there is no transformation applied to the ARMA time series. We can also see the expected behaviour of AAFT surrogates, incorrectly rejecting the null hypothesis when the measurement transformation is non-invertible. Surprisingly, they also rejected the null hypothesis when there was no measurement transformation applied. This could be due to the distortions that AAFT can introduce into the power spectrum. IAAFT2 surrogates gave the correct result in all cases, as did WIAAFT surrogates. As some of the incorrect results appear to be in the surrogates with variable power spectra, the test was repeated for IAAFT1 surrogates. The conclusions were the same as for IAAFT2 surrogates, however, the spread of the surrogate values was larger in the case of IAAFT1 surrogates. IDFS surrogates appear to perform well for all linear cases, including transformations, but fail to reject the null hypothesis for the chaotic Rössler system, which rules out their applicability in this case.

Finding oscillations in noisy data using MCSSA
Simulated data. Oscillations with time varying frequencies can often be overlooked when applying frequency analysis such as the Fourier transform, due to the broadening of the spectrum. Adding noise to the situation complicates the problem even further. Fig. 23 shows the results of MCSSA (see Section 6.5) on a sine wave with time-varying frequency contaminated by very strong AR(1) (ρ = 1, σ = 1, SNR = 0.017) noise. A small bump can be observed in the Fourier spectrum, yet without surrogates it is difficult to know whether this is the signature of a true oscillation. In this case, the wavelet transform does not show any obvious oscillations at 0.2 Hz. In contrast, the MCSSA method reveals a number of significant eigenvalues, related to the time-varying oscillation, around 0.2 Hz as expected. Embedding was performed using a 500 s window and a time delay of 1.
Application to EEG data. 1/f type spectra have been widely observed in neurological data, and are a result of the physical structure of neuronal networks and the limited speed of neuronal communication due to axon conduction and synaptic delays [33,226]. Here, we apply MCSSA to an EEG signal recorded from the occipital lobe (O2 on the standard 10-20 system) of a 50 year old female sitting at rest for 30 min. The data were recorded at 1000 Hz, detrended by subtracting a third order polynomial, bandpass filtered within the range 0.005-20 Hz, and resampled to 40 Hz for analysis. The first step was to estimate the parameters of the underlying noise model of the data, as described in Section 4.4. Due to the 1/f^α-like appearance of the power spectrum of the data (see Fig. 24(B)) the background spectrum was calculated based on an estimation of α from the time series. Detrended fluctuation analysis (DFA) was applied to the time series to obtain a DFA exponent d, which is related to α by α = 2d − 1 [80]. Using this approach, α was estimated as 1.02, very close to pink noise (for which α = 1). 1000 surrogates were then calculated with this value for α and used in the analysis.
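The DFA step above can be sketched in a few lines of first-order DFA, together with the α = 2d − 1 conversion; the scales and signal length are illustrative, and white noise (for which d ≈ 0.5, i.e. α ≈ 0) is used so that the expected result is known:

```python
import numpy as np

def dfa_exponent(x, scales):
    """First-order DFA: slope of log F(n) vs log n, linear detrend in each box."""
    y = np.cumsum(x - np.mean(x))             # integrated profile
    F = []
    for n in scales:
        nseg = y.size // n
        var = []
        for i in range(nseg):
            seg = y[i * n:(i + 1) * n]
            tt = np.arange(n)
            coef = np.polyfit(tt, seg, 1)     # local linear trend
            var.append(np.mean((seg - np.polyval(coef, tt)) ** 2))
        F.append(np.sqrt(np.mean(var)))
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

rng = np.random.default_rng(6)
d = dfa_exponent(rng.normal(size=8192), [16, 32, 64, 128, 256])
alpha = 2 * d - 1        # spectral exponent, as used in the text
```

For the EEG data above the same computation yields α ≈ 1.02, which then parameterizes the coloured-noise surrogates.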
Embedding was performed using a 5 s window and a time delay of 1. Significant eigenvalues were found, and their dominant frequencies investigated in Fig. 24. Significant oscillations were found at around 1 Hz and at around 9-10 Hz. The former could be attributed to either the cardiac or delta brainwave activity, whilst the latter falls in the well known alpha frequency band, which from this recording location is likely to be Berger's classical posterior alpha rhythm [227] which is still the best known EEG phenomenon [228].

Synchronization of two coupled oscillators
In this section we investigate the phase synchronization between two simple coupled oscillatory systems and two chaotic systems using phase coherence (see Section 8.1).
Usually, phase coherence between two signals is compared to the distribution of values obtained for the phase coherence between one of the signals (the reference signal) and a set of surrogates of the other one (the surrogated signal). In the case of synchronization testing, the null hypotheses for the different surrogates are that the surrogated signal is: RP - uncorrelated noise with some distribution; FT - independent stationary linear Gaussian noise; AAFT & IAAFT - invertibly rescaled independent stationary linear Gaussian noise; PPS, TS & CPP - an independent signal with no intercycle correlations; and time-shifted - an independent signal with intercycle behaviour preserved. Note that the FT null hypothesis is equivalent to the hypothesis that the observed synchronization is merely an artifact of bias in frequency, while the IAAFT null covers both frequency and distribution.
Both systems were simulated using 6th order Runge-Kutta with analytical coefficients given by the first table on p. 14 in [229]. All signals produced were preprocessed as previously discussed (Section 4.3). Where embedding is required for the surrogates, the embedding dimension is estimated using false nearest neighbours, and the same embedding dimension is used for the original data and the surrogates.

Simple oscillators
We begin with the bivariate case of two coupled oscillators. Here, we consider two coupled phase oscillators, described by the Kuramoto model (as described in detail in [230]) as follows:

φ̇₁,₂ = ω ∓ ∆ω + ϵ sin(φ₂,₁ − φ₁,₂) + ση₁,₂(t),   (35)

where φ₁,₂ are the phases of the respective oscillators, ω is their common frequency, 2∆ω is the detuning (the difference between their frequencies), ϵ is the coupling strength and σ is the noise multiplier. η₁,₂(t) are two independent realizations of white Gaussian noise such that ⟨ηᵢ(t)ηⱼ(τ)⟩ = δ(t − τ)δᵢⱼ. In general, one has some functions f₁,₂(φ₁, φ₂) instead of the sin(φ₂,₁ − φ₁,₂) in Eqs. (35), in addition to different coupling strengths ϵ₁,₂ and dynamical noise intensities σ₁,₂, meaning that the influence of each oscillator on the other may not be equivalent and may be quite complicated. For simplicity, here we consider the case of equivalent bidirectional coupling, and equivalent noise intensities for each oscillator.
To investigate phase synchronization in this system, the relative phase difference ψ = φ 2 − φ 1 was investigated.
Subtracting the first equation from the second in Eqs. (35) gives:

ψ̇ = 2∆ω − 2ϵ sin ψ + σ(η₂(t) − η₁(t)).   (36)

In the noise-free case, for ϵ ≥ ∆ω this equation has a stable fixed point ψ = arcsin(∆ω/ϵ). This means that for ϵ ≥ ∆ω the phases φ₁(t) and φ₂(t) are completely synchronized, although this will be slightly altered by the addition of noise. For 0 < ϵ < ∆ω the system is only partially synchronized, while for ϵ = 0 the phases are completely independent.
We now compare the performance of different surrogates in the case of the underlying phase dynamics described by Eqs. (35). We consider the case for both the fixed frequency ω = 2π and the variable frequency ω = 2π + 0.25 sin(0.2π t), ∆ω = 0.2 and σ = 0.01. The system was integrated for 75 s with 0.01 s time step. The parameters were chosen such that the phase coherence will be non-zero even for zero coupling, to mimic a situation where the presence of synchronization may be falsely indicated.
The system was integrated for different coupling strengths ϵ ranging from 0 (completely independent signals) to 0.25 (complete synchronization begins at ϵ = 0.2). To eliminate the transient period, we only consider the last third of the signals (25 s, 2500 data points). In addition to the dynamical noise, all signals were contaminated with observational noise with noise-to-signal ratio (NSR) = 0.1. To ensure clear comparisons between simulations, the same realizations of dynamical and observational noise were added for each ϵ.
We consider as a simple example that the signals are sine functions of the phases, f₁,₂(φ₁,₂) = sin(φ₁,₂). The phase coherence between the oscillators and its dependence on the coupling strength ϵ is shown in Fig. 25. Phase coherence was considered significant if it was above the 95th percentile of 100 surrogates of the second signal, as determined by rank ordering.
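This numerical experiment can be sketched as follows, assuming phase coherence is the magnitude of the time-averaged phase-difference phasor (our reading of the Section 8.1 index), using Euler-Maruyama integration in place of the 6th order scheme and omitting the surrogate step:

```python
import numpy as np

def simulate(eps, omega=2 * np.pi, dw=0.2, sigma=0.01, dt=0.01, T=75.0, seed=7):
    """Euler-Maruyama integration of the two coupled phase oscillators, Eqs. (35)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    p1, p2 = np.zeros(n), np.zeros(n)
    for k in range(n - 1):
        p1[k + 1] = (p1[k] + dt * (omega - dw + eps * np.sin(p2[k] - p1[k]))
                     + sigma * np.sqrt(dt) * rng.normal())
        p2[k + 1] = (p2[k] + dt * (omega + dw + eps * np.sin(p1[k] - p2[k]))
                     + sigma * np.sqrt(dt) * rng.normal())
    return p1, p2

def phase_coherence(p1, p2):
    """Magnitude of the time-averaged phase-difference phasor, in [0, 1]."""
    return np.abs(np.mean(np.exp(1j * (p1 - p2))))

last = slice(-2500, None)                 # final third of the run, as in the text
p1, p2 = simulate(0.25)                   # above the synchronization threshold
coh_sync = phase_coherence(p1[last], p2[last])
p1, p2 = simulate(0.0)                    # uncoupled
coh_zero = phase_coherence(p1[last], p2[last])
```

With ∆ω = 0.2, full synchronization sets in at ϵ = 0.2, so the coherence at ϵ = 0.25 is close to 1, while at ϵ = 0 the phase difference drifts and the coherence is small but non-zero, which is exactly why the surrogate threshold is needed.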
The results are shown in a form which we will refer to as a 'surrogram' (see Fig. 25). For each coupling strength, the phase coherence (see Section 8.1) was calculated between the numerically simulated phase oscillators described in (35) with a sine function applied. 100 surrogates of the second oscillator were generated for each of the 9 surrogate types shown in Fig. 25. Phase coherence was then calculated between the first phase oscillator and the surrogates of the second oscillator to give a distribution of surrogate phase coherence values. For each coupling ϵ and each surrogate type, the real phase coherence was compared to the distribution of 100 surrogate values and deemed to be significant if the real coherence value was above the 95th percentile of the surrogate distribution. If significance was found, i.e. the null hypothesis was rejected, it was marked on the surrogram as a grey diamond. Consecutive rejections are marked as grey lines. This provides a map of the performance of each surrogate type. Within the same plot, the dependence of the phase coherence of the simulated data on the coupling strength ϵ is plotted to show the real values of phase coherence in the system. From Fig. 25(A) it can be seen that for the non time-varying case, out of FT, AAFT and IAAFT surrogates, only AAFT indicate significance in the fully synchronous regime. This is related to the fact that when two sinusoids are completely synchronized, as they are in this case above values of ϵ = 0.2, it is almost impossible to say whether they are really interdependent or whether they have the same frequencies by chance. The AAFT significance at this perfect coupling is simply related to the fact that AAFT surrogates do not reproduce pure harmonics, contrary to FT and IAAFT, highlighting the serious disadvantage of AAFT surrogates that they may indicate significant synchronization even at zero coupling. 
In this case, IDFS surrogates only show significance for very high values of ϵ, suggesting they may not be as useful for the study of weak interdependence.
It can also be seen from the figure that CPP surrogates also indicate significance in the completely synchronized regime, and also for some weaker coupling strengths. The only other surrogates to indicate significant synchronization in this case are PPS. These results show how varied and unreliable the results can be when trying to test whether perfectly coherent sinusoids are synchronized. Out of all methods, time-shifted surrogates appear to have the worst performance. To test a more realistic case, shown in Fig. 25(B), the natural frequency ω was allowed to vary according to ω = 2π + 0.25 sin(0.2π t). In this case, significant coherence is indicated by all surrogates after ϵ = 0.2, and less overall before this threshold.
When using surrogates in bivariate systems, the question arises of which signal to calculate surrogates from. Consider two signals to be tested using surrogates. We name the first signal the reference signal, and the second signal the surrogated signal. It should be noted that if the reference signal is linear but the surrogated signal is highly nonlinear, for example a sawtooth wave, the FT, AAFT and IAAFT methods would reject the null hypothesis even if the signals were completely independent, purely because of the nonlinearity present in the signal. Therefore we cannot conclude anything about synchronization in this case. Because of this, it is recommended that, given two signals, the significance levels of phase coherence based on FT, AAFT and IAAFT surrogates should be concluded by calculating surrogates from the more linear of the two signals. This can be decided by extracting the phases of both signals, and checking which extracted phases most closely resemble their respective original signals. If both signals are highly nonlinear, these methods will not be suitable, and surrogates which include possible nonlinearity in their null hypothesis should be used. CPP, time-shifted, PPS and twin surrogates should not be affected by these issues.

Chaotic oscillators
We now consider a system of chaotic oscillators. A positive Lyapunov exponent (see Section 6.6) is a condition for chaos and represents sensitivity to initial conditions. Each chaotic oscillator has at least one zero Lyapunov exponent, some number of negative ones, and at least one positive one. For simplicity, consider two three dimensional chaotic oscillators (such as the Lorenz or Rössler systems), both having one positive, one negative, and one zero Lyapunov exponent. The uncoupled system will then have two positive, two negative, and two zero exponents. Two zero Lyapunov exponents correspond to limit cycles in the oscillators and represent their oscillatory behaviour. When we couple these systems, the Lyapunov exponents of the joint system change depending on coupling strength. When increasing the coupling strength, one of the zero Lyapunov exponents of the joint system may become negative (ϵ = 0.04 for the Rössler system below, and ϵ = 0.35 for Lorenz).
When this happens, it means that the two limit cycles corresponding to each chaotic oscillator are reduced to one, since there is now only one zero Lyapunov exponent. The oscillatory behaviour of the two systems becomes entrained, and they oscillate in phase, thus having a joint limit cycle, deviations from which exponentially decrease. This describes the transition to phase synchronization in chaotic systems [178].
Increasing the coupling strength further, the remaining zero Lyapunov exponent may also become negative, while one of the positive Lyapunov exponents becomes zero (ϵ = 0.12 for the Rössler system below, and ϵ = 0.75 for Lorenz), indicating a transition to so-called lag synchronization [231]. In this regime, one system mimics the other, but with a time lag. Increasing the coupling strength even further reduces this lag and moves the system towards complete synchronization, which may be interrupted by regions of periodic behaviour in which the system is not chaotic. In this section we investigate the performance of surrogates, again combined with the phase coherence, in the well known Rössler and Lorenz systems. Phase synchronization of chaotic oscillators was discussed for the example of coupled Rössler systems in [178]. We consider the system of two bidirectionally coupled Rössler oscillators, given by the set of equations [178,231]:

ẋ1,2 = −ω1,2 y1,2 − z1,2 + ϵ(x2,1 − x1,2),
ẏ1,2 = ω1,2 x1,2 + a y1,2,        (37)
ż1,2 = b + z1,2 (x1,2 − c),

with a = 0.165, b = 0.2, c = 10 and ω1,2 = ω ± ∆ω, where ω = 0.97 and ∆ω = 0.02, as used in [231] to establish mainly chaotic behaviour of the joint system. The system was integrated for 5000 s with time step 0.1 s for coupling strengths ϵ ranging from 0 to 0.25 with a step of 0.005. An example of the Rössler oscillator system (37) and its x components for zero coupling and ϵ = 0.05 is shown in Fig. 26. To compare surrogate performance, we consider the evolution of the system during the last 200 s, to discard transient periods and establish non-vanishing values of phase coherence at zero coupling. The signals were contaminated with observational noise of 0.1 NSR. Phase synchronization is investigated between the x components of the system, with phases extracted using the Hilbert transform. As in the previous case, phase coherence was calculated for a range of couplings in the simulated data, and in all surrogate types.
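This numerical experiment can be reproduced with a simple fixed-step integrator. A minimal Python sketch of the coupled Rössler pair using the parameter values quoted in the text; the fourth-order Runge-Kutta scheme, initial conditions, coupling strength and integration length are our own illustrative choices:

```python
import numpy as np

def coupled_rossler(state, a=0.165, b=0.2, c=10.0, eps=0.05,
                    w1=0.99, w2=0.95):
    """Right-hand side of the bidirectionally coupled Rossler pair,
    with detuned frequencies w1,2 = 0.97 +/- 0.02."""
    x1, y1, z1, x2, y2, z2 = state
    return np.array([
        -w1 * y1 - z1 + eps * (x2 - x1),
        w1 * x1 + a * y1,
        b + z1 * (x1 - c),
        -w2 * y2 - z2 + eps * (x1 - x2),
        w2 * x2 + a * y2,
        b + z2 * (x2 - c),
    ])

def integrate(f, state, dt, n):
    """Fixed-step fourth-order Runge-Kutta integration."""
    out = np.empty((n, len(state)))
    for i in range(n):
        k1 = f(state)
        k2 = f(state + 0.5 * dt * k1)
        k3 = f(state + 0.5 * dt * k2)
        k4 = f(state + dt * k3)
        state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i] = state
    return out

traj = integrate(coupled_rossler,
                 np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0]),
                 dt=0.1, n=5000)
x1, x2 = traj[:, 0], traj[:, 3]   # the two x components to be analysed
```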
For clearer comparison, initial conditions and observational noise remained the same for all cases. Fig. 27(A) compares the performance of different surrogate types for this case.
This test was repeated on the coupled Lorenz system, given by:

ẋ1,2 = σ(y1,2 − x1,2) + ϵ(x2,1 − x1,2),
ẏ1,2 = ρ x1,2 − y1,2 − x1,2 z1,2,        (38)
ż1,2 = x1,2 y1,2 − β1,2 z1,2,

where σ = 10, ρ = 28 and β1,2 = β ± ∆β, with β = 8/3 and ∆β = 0.01. The system was integrated for 5000 s with time step 0.01 s, for coupling strengths ϵ from 0 to 1.5 with a step of 0.05. To evaluate the performance of different surrogates, we consider the last 75 s of the z-components of system (38), contaminated by 0.1 NSR noise. Phases were extracted using the Hilbert transform. Fig. 27(B) compares the performance of different surrogate types for this case. In both the Rössler and the Lorenz cases, all surrogates performed quite well. For the Lorenz system, in contrast to the Rössler, the transition to phase synchronization is not accompanied by such rapid growth of phase coherence. Nevertheless, the overall picture remains much the same, which shows that the surrogates do not only indicate significance for large phase coherence values, but are also sensitive to transitions in the system dynamics. As in the simple case, it should be noted that for FT-based surrogates the nonlinearity in the signals may cause rejection of the null hypothesis even with no coupling.
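The phase coherence statistic used throughout this comparison can be computed directly from Hilbert-transform phases. A minimal Python sketch with illustrative test signals of our own (a constant phase lag versus slowly drifting phases):

```python
import numpy as np
from scipy.signal import hilbert

def phase_coherence(s1, s2):
    """Mean phase coherence |<exp(i*dphi)>| from Hilbert-transform phases."""
    phi1 = np.unwrap(np.angle(hilbert(s1)))
    phi2 = np.unwrap(np.angle(hilbert(s2)))
    return np.abs(np.mean(np.exp(1j * (phi1 - phi2))))

t = np.linspace(0, 200, 20000)
# constant phase lag: the phase difference never changes
r_locked = phase_coherence(np.cos(2*np.pi*0.1*t),
                           np.cos(2*np.pi*0.1*t + 0.7))
# detuned frequencies: the phase difference drifts continuously
r_drift = phase_coherence(np.cos(2*np.pi*0.1*t),
                          np.cos(2*np.pi*0.13*t))
```

In a surrogate test, the value obtained from the data would be compared against the distribution of this statistic over the surrogate set.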
Whilst all the surrogates shown in this comparison appear to be applicable to synchronization, at least when using the measure of phase coherence, it should be noted that PPS, CPP, twin and time-shifted surrogates actually test the null hypothesis of independence rather than of phase synchronization. Some of them do indeed indicate interdependence below the synchronization threshold, but in theory they should indicate significance for all finite couplings. This demonstrates that they may not have very high power when searching for weak coupling.

Phase coherence in real biological data
Here, we will compare the performance of different surrogate types for the calculation of wavelet phase coherence. When using a complex wavelet such as the Morlet wavelet in the continuous wavelet transform, the resulting coefficients will also be complex, and provide both amplitude and phase information at each time and scale. The instantaneous phase information from the wavelet transforms of two signals can be extracted and used to monitor their phase difference over time, and may reveal phase relationships between them.
Wavelet phase coherence is calculated by first extracting the instantaneous phases φ1k,n and φ2k,n at each time tn and frequency fk for both signals and finding their relative phase difference ∆φk,n = φ2k,n − φ1k,n. The sine and cosine components of the phase differences are then calculated and averaged in time, and the wavelet phase coherence is defined as [46]:

Cφ(fk) = √(⟨cos ∆φk,n⟩² + ⟨sin ∆φk,n⟩²),

where ⟨·⟩ denotes the average over time. The wavelet phase coherence Cφ(fk) will have a value between 0 and 1, with a value of 0 corresponding to a phase difference continuously changing in time, and a value of 1 indicating a phase relationship which remains constant in time. The phase coherence described here is the same quantity described in Section 9.2, but with a dependence on frequency. As for the phase coherence discussed previously, in reality wavelet phase coherence can have a range of values for which it is difficult to conclude interdependence without the use of surrogates. Calculation of wavelet phase coherence also highlights another issue: the coherence between even two completely independent signals will increase at lower frequencies and tend towards 1. Thus, surrogates are also required to account for this bias.
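This calculation can be sketched with a hand-rolled Morlet wavelet transform; the central frequency f0 = 1, the test signals and the chosen frequency grid below are our own illustrative assumptions, not the toolbox implementation:

```python
import numpy as np

def morlet_wt(sig, fs, freqs, f0=1.0):
    """Continuous wavelet transform with a complex Morlet wavelet,
    one row of coefficients per frequency in `freqs`."""
    n = len(sig)
    coeffs = np.empty((len(freqs), n), dtype=complex)
    t = (np.arange(n) - n // 2) / fs
    for k, f in enumerate(freqs):
        s = f0 / f                                   # scale for this frequency
        wav = np.exp(2j*np.pi*f0*t/s) * np.exp(-t**2/(2*s**2)) / np.sqrt(s)
        # cross-correlate the signal with the wavelet
        coeffs[k] = np.convolve(sig, np.conj(wav[::-1]), mode='same')
    return coeffs

def wavelet_phase_coherence(s1, s2, fs, freqs):
    """C_phi(f_k) = sqrt(<cos dphi>^2 + <sin dphi>^2), time-averaged."""
    dphi = (np.angle(morlet_wt(s1, fs, freqs))
            - np.angle(morlet_wt(s2, fs, freqs)))
    return np.sqrt(np.mean(np.cos(dphi), axis=1)**2
                   + np.mean(np.sin(dphi), axis=1)**2)

rng = np.random.default_rng(0)
fs = 10.0
t = np.arange(0, 300, 1/fs)
# shared 0.3 Hz oscillation with a constant lag, plus independent noise
s1 = np.sin(2*np.pi*0.3*t) + 0.1*rng.standard_normal(len(t))
s2 = np.sin(2*np.pi*0.3*t + 1.0) + 0.1*rng.standard_normal(len(t))
freqs = np.array([0.1, 0.3, 1.0])
coh = wavelet_phase_coherence(s1, s2, fs, freqs)   # peaks at 0.3 Hz
```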
Cardiorespiratory coherence. Respiratory sinus arrhythmia (RSA) is a well known phenomenon in which the heart rate is modulated by the breathing. Using wavelet phase coherence analysis, significant coherence was demonstrated between the instantaneous heart rate (IHR) and breathing in humans [27], and this coherence was found to decrease with age. This study [27] used intersubject surrogates, but as discussed, there is not always enough data available for this to be a viable option. Here, we compare some of the discussed surrogate types in an example cardiorespiratory system. Electrocardiogram (ECG) and breathing signals were recorded for 40 min in a 26-year-old male at a sampling frequency of 1200 Hz. The ECG measures the electrical activity of the heart as it is beating, while the respiration signal can be recorded from the mechanical movement of the chest. The ECG was recorded using a 3-lead setup, and respiration was recorded using a piezoelectric belt fastened around the chest of the subject. The ECG was then resampled to 40 Hz and the instantaneous heart rate (IHR) extracted from the wavelet transform using ridge extraction (see Appendix A.3). The IHR and respiration signals were then both resampled to 10 Hz for analysis. Their wavelet phase coherence was calculated for a frequency range between 0.005 and 2 Hz. The wavelet phase coherence between the IHR and 100 surrogates of the respiration signal was then calculated for each surrogate type, as shown in Fig. 28. Significance levels were calculated as the 95th percentile of the surrogates. Intersubject surrogates were calculated from 100 mismatched IHR-respiration pairs from the same data set. Fig. 28 does not show large differences between the significance levels for the 10 compared surrogate types, except for WIAAFT surrogates, which indicate a higher significance level than the other surrogates at the respiration frequency, and a lower significance level at frequencies below 0.01 Hz.
This may be due to the similarity of the WIAAFT surrogates to the original data not completely removing the correlations between the two signals. A number of other surrogates also show a slightly lower threshold at lower frequencies (see Fig. 28).

Inferring the nature of coupling
When trying to extract information from bivariate data arising from two systems, care must be taken to formulate the correct null hypothesis. In the previous section, wavelet phase coherence was evaluated between time series, using a significance level based on the null hypothesis of no coherence between the data. However, it should be emphasized that coherence found above this level cannot provide any information about phase coupling or interactions that may or may not be present in the system, because coherence can also be observed between completely independent systems if their phase difference does not change over time. Therefore, to infer this extra information from time series, the null hypothesis must be altered accordingly.
Surrogate data can be applied to the problem of inferring not only that two oscillators are in sync, but also how their phases are coupled by extracting information about their coupling functions [177] over time using dynamical inference (see Section 8.4). A detailed explanation and a MatLab toolbox for the implementation of a DBI method can be found in [186]. This method can utilize the phase information from two or more systems to infer the coupling strength, direction of coupling, and whether the system is synchronized.
In particular, the inferred coupling strength provides a measure of the interaction between systems. Like the measures discussed previously, it is impossible to conclude whether an obtained coupling strength is significant without the use of surrogates.

Coupling in simple phase oscillators
Many biomedical signals contain more than one oscillatory mode of interest. One example is EEG signals, which can be separated into five different frequency bands corresponding to different processes in the brain [33]. The characteristics of these oscillations can provide a lot of information about brain function [232]. It has also been shown that oscillations observed in the same signal may be phase coupled, i.e. the phase of one oscillator may influence the phase of the other [233]. This type of coupling can be investigated using dynamical Bayesian inference (DBI). From the time evolution of the phases of two oscillators, DBI can infer the strength and direction of the coupling, and how they vary in time. Like many other measures that we have discussed, the absolute value of the coupling strength obtained in this way may be difficult to interpret without the use of surrogates to determine the baseline value that would be obtained in the absence of coupling.
In reality, what will be measured is not the phases of the oscillations, but some function of them. Thus, more realistic signals were created by applying the cosine function to the phases, xi(t) = cos φi(t). As discussed, coupled oscillators regularly manifest within the same signal, so the final simulated time series consists of the sum of these modes, s(t) = x1(t) + x2(t). Considering our time series s(t), the first step is to identify any oscillatory modes and their frequency ranges. As we already know their locations, we can bandpass filter within the range [ωi − ωi/2, ωi + ωi/2] for each oscillator. If the frequencies were not known beforehand, time-frequency analysis or nonlinear mode decomposition could be used to locate and extract them (see Appendix A.3). Once the two separate oscillations have been extracted from the time series using Butterworth filtering, the next step is to calculate their phases in preparation for the dynamical Bayesian inference. The phases of each mode were extracted using the Hilbert transform (see Appendix A.3.1).
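The filtering-plus-phase-extraction pipeline described above can be sketched as follows; the helper `extract_phase`, the mode frequencies and the filter order are our own illustrative choices:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def extract_phase(sig, fs, f_lo, f_hi, order=4):
    """Bandpass filter (zero-phase Butterworth, second-order sections),
    then take the unwrapped Hilbert phase of the filtered mode."""
    sos = butter(order, [f_lo, f_hi], btype='bandpass', fs=fs, output='sos')
    return np.unwrap(np.angle(hilbert(sosfiltfilt(sos, sig))))

fs = 10.0
t = np.arange(0, 120, 1/fs)
# two modes in one time series, with illustrative frequencies 0.27 and 1.1 Hz
s = np.cos(2*np.pi*0.27*t) + np.cos(2*np.pi*1.1*t)

# bands [w - w/2, w + w/2] around each mode frequency
phi_slow = extract_phase(s, fs, 0.135, 0.405)
phi_fast = extract_phase(s, fs, 0.55, 1.65)

# mean frequency from the interior phase slope, as a sanity check
f_slow = (phi_slow[-100] - phi_slow[100]) / (2*np.pi*(t[-100] - t[100]))
f_fast = (phi_fast[-100] - phi_fast[100]) / (2*np.pi*(t[-100] - t[100]))
```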
The two extracted phase time series were then used as input to DBI, in order to evaluate its performance in inferring the time-varying coupling strength a2. Fig. 30(A) shows the successfully inferred coupling strength a2 compared with the numerically set coupling and the coupling inferred from the raw phases. The method not only infers a correct value of the coupling strength, but can also follow its time evolution.
When the coupling strength is unknown prior to the inference, surrogates must be used to generate a level above which the coupling strength can be considered as significant. To test the null hypothesis of no coupling between oscillators, we compared the performance of three surrogate types: cyclic phase permutation (CPP), Fourier transform (FT) and wavelet iterative amplitude adjusted Fourier transform (WIAAFT) surrogates.
The most natural choice is CPP surrogates, which work directly with the phases and remove couplings between successive cycles in the time series. FT surrogates randomize the phase information in the original data, so they should also remove any phase coupling. The performance of WIAAFT surrogates was also evaluated, to test whether their extra constraint of matching the temporal characteristics of the data has any effect on their ability to test for phase coupling.
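A minimal sketch of how a CPP surrogate might be constructed from an unwrapped phase series: the phase is cut into full 2π cycles, the order of the cycles is shuffled, and the phase increments are restacked so that the surrogate phase remains continuous and monotonic. This is our own illustrative implementation, not the toolbox code:

```python
import numpy as np

def cpp_surrogate(phi, rng):
    """Cyclic phase permutation of an unwrapped, monotonic phase series."""
    two_pi = 2 * np.pi
    first = int(np.ceil(phi[0] / two_pi))
    last = int(np.floor(phi[-1] / two_pi))
    # sample indices at which each full 2*pi cycle starts
    cuts = np.searchsorted(phi, np.arange(first, last + 1) * two_pi)
    head = np.diff(phi[:cuts[0] + 1])          # partial leading cycle
    tail = np.diff(phi[cuts[-1]:])             # partial trailing cycle
    cycles = [np.diff(phi[a:b + 1]) for a, b in zip(cuts[:-1], cuts[1:])]
    rng.shuffle(cycles)                        # permute the full cycles
    increments = np.concatenate([head] + cycles + [tail])
    return phi[0] + np.concatenate([[0.0], np.cumsum(increments)])

rng = np.random.default_rng(42)
t = np.arange(0, 100, 0.01)
# a monotonic, frequency-modulated phase as test input
phi = 2*np.pi*0.5*t + 0.3*np.sin(2*np.pi*0.05*t)
surr = cpp_surrogate(phi, rng)
```

The surrogate preserves the within-cycle phase dynamics and the endpoints, while destroying any cycle-to-cycle ordering.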
Surrogates were calculated from the original time series s(t) following preprocessing as discussed in Section 4.3, which reduces the available data points, and results in fewer windows for which the coupling is inferred. This explains why FT and WIAAFT (blue and pink shades) are truncated before CPP (green) in Fig. 30.
FT and WIAAFT surrogates were calculated directly from s(t), whilst CPP surrogates were calculated from the phases obtained by bandpass filtering and Hilbert transforming s(t) in the frequency ranges of interest. The phases of the FT and WIAAFT surrogates were extracted in the same way as for the original time series. Fig. 29(A) shows the time series resulting from the numerically generated phases as thin dashed lines, and the output of the reconstruction and filtering procedure as solid lines. It can be noted that the two almost overlap; this will not be the case for more complex waveforms in real data. The surrogates generated from the original time series are shown in Fig. 29(B) for FT and Fig. 29(C) for WIAAFT. Both methods aim to preserve the spectral content of the signal, and it can be seen in the bottom panels how the frequency modulation of x2 is, in both cases, destroyed and converted into a more complicated waveform. The resulting modulated amplitude of the two surrogates is not relevant for the aim of this computation, as the input to the dynamical Bayesian inference is the phase.
Once the two sets of phases had been obtained for each surrogate type, they were used as input to DBI and the coupling strength was inferred. In the case of strong coupling (a2 = 6), the inferred coupling strength was much lower in all surrogates than in the real coupled system, demonstrating that all surrogate types were successful in rejecting the null hypothesis of no coupling between the oscillators in this case (see Fig. 30(A)). The significance level was taken as the 95th percentile of the range of coupling strengths calculated for each surrogate type. CPP provided the lowest threshold, while FT and WIAAFT generated comparable average surrogate levels.
In order to investigate this difference further, we repeated the procedure on the same system with weaker direct coupling, i.e. a 2 = 0.6. Fig. 30(B) shows that CPP surrogates still provide the lowest threshold, and validates the inferred coupling strength at all times. On the contrary, both FT and WIAAFT generated a threshold comparable to the numerically set value of the coupling strength, and even higher for several time-windows. This result suggests that CPP surrogates are the most appropriate method to be used in this case.
To validate the result in the absence of coupling, the same procedure was repeated on the phases from the system with a 1 = a 2 = a 3 = a 4 = 0. From this simulation, we would expect the inferred coupling strength from the original data not to differ significantly from the coupling strength inferred in any of the surrogate types. Fig. 30(C) shows that this is indeed the case.

Cardiorespiratory coupling
The combination of dynamical Bayesian inference and surrogates for the investigation of coupling between oscillators was also applied to real experimental data. We again investigate the coupling phenomenon between the heart rate and the breathing known as respiratory sinus arrhythmia (RSA). In Section 9.2.3 we demonstrated that there is significant coherence between the instantaneous heart rate and the respiration signals in the respiration frequency range. However, as also discussed, the presence of coherence does not necessarily mean that there is coupling present in the system. Thus, here we use dynamical Bayesian inference to study this well known phenomenon.
We consider an electrocardiogram (ECG) signal as x2 and a respiration signal as x1. Filtering was applied as in the previous section, within the frequency ranges [0.6 2] Hz for the ECG and [0.15 0.6] Hz for the respiration.
Due to the complex waveforms of the ECG and respiration signals, the filtered signals differ from the original ones (see Fig. 31(A)&(B)), but the phases are nevertheless recovered consistently. The coupling strength between the filtered ECG and respiration signals was inferred using DBI for two young healthy subjects.
To find whether the obtained coupling strength is significant, surrogates were employed to test the null hypothesis of no coupling between x 1 and x 2 . The inferred coupling strengths and their time evolutions are shown in Fig. 32.
CPP surrogates were computed from the extracted phases from each signal.
The performance of WIAAFT and FT surrogates was tested on the original data, preprocessed as discussed in Section 4.3 ( Fig. 31(C) to (F)).
Similarly to the numerical simulation, CPP surrogates (in green) generated the lowest threshold. FT (in blue) and WIAAFT surrogates (in pink) generated similar thresholds for both subjects, and invalidated the coupling strength for most of the time-windows. In the light of the result shown for weak numerical coupling in Fig. 30(B), this result can be interpreted as FT and WIAAFT failing to fully destroy the coupling in the data, rather than CPP generating a too low threshold.
In conclusion, we showed that the use of WIAAFT and FT surrogates is not guaranteed to be consistent with a null hypothesis of uncoupled systems, especially for weak coupling, and it is therefore not recommended for this specific application. For validating phase coupling between oscillators, it is crucial to generate surrogates which satisfy the null hypothesis of being uncoupled. Among the surrogates tested in this section, CPP surrogates gave the most consistent results: the time evolution is randomized, while the within-cycle structure of the phases is preserved. The approach of FT surrogates seemed promising too, but by preserving the spectra there is the possibility of retaining traces of coupling in the surrogates. WIAAFT surrogates, which by design mimic the time evolution of the signals, were not appropriate for this specific application, as predicted.

Wavelet bispectrum
Here, we will investigate nonlinear coupling between two time series using wavelet bispectrum analysis. As mentioned above, surrogates should preserve all properties of the signal apart from the one being tested for. We consider bispectra based on the continuous wavelet transform; we therefore wish to destroy coupling between wavelet coefficients at different scales, while maintaining a similar time-averaged wavelet power spectrum. We investigate Fourier transform (FT), wavelet iterative amplitude adjusted Fourier transform (WIAAFT [76]), wavelet time-shifted (Wtshift, as described by Breakspear [108]) and time-shifted (tshift) surrogates. A peak will be considered significant if its magnitude is higher than the 95th percentile of the surrogate values, obtained using 20 surrogates. Initially, 150 surrogates were calculated, but it was found that using more than 20 did not significantly alter the results. The optimal surrogate type would indicate the significant peaks in the precise locations corresponding to the coupling, while not indicating the presence of any spurious peaks. FT and WIAAFT surrogates destroy the nonlinearities within the signals, while Wtshift surrogates only disrupt correlations between wavelet coefficient levels. tshift surrogates preserve all nonlinearities.
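Of the surrogate types compared here, time-shifted surrogates are the simplest to generate: the series is rotated circularly by a random offset, which preserves the signal's own dynamics, including any nonlinearity, while destroying its temporal alignment with other signals. A minimal sketch (the minimum-shift heuristic is our own choice):

```python
import numpy as np

def time_shift_surrogate(sig, rng, min_shift=None):
    """Time-shifted surrogate: circular rotation by a random offset."""
    n = len(sig)
    if min_shift is None:
        min_shift = n // 10   # our heuristic: keep shifts away from 0 and n
    shift = int(rng.integers(min_shift, n - min_shift))
    return np.roll(sig, shift)

rng = np.random.default_rng(7)
sig = np.sin(2 * np.pi * 0.86 * np.arange(0, 20, 0.02))
surr = time_shift_surrogate(sig, rng)
# the value distribution is preserved exactly; only the alignment changes
```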
As an example, we study a coupled pair of Poincaré oscillators with nonlinear (quadratic) coupling of strength γ from system 2 to system 1, where ωi = 2π fi are the basic oscillator frequencies, α is the relaxation parameter, and a is the radius of the limit cycle of the internal dynamics of each system. We investigate the signals x1 and x2 with the addition of 1/f noise,

X1,2(t) = x1,2(t) + σ χ1,2(t),

where X1, X2 are the studied signals, and χ1, χ2 are independent realizations of 1/f noise with standard deviation 1. The chosen parameter values were f1 = 0.86 Hz, f2 = 2.5 Hz, α = 1, γ = 2 and a = 1. The noise strength σ was varied between 0 and 1.2. The system was integrated with a Heun scheme, using a timestep of 0.1 ms. The total simulation time was T = 28 s (with the initial 8 s subsequently removed), chosen to cover a minimum of 10 cycles of the lower-frequency oscillation (∼11 s), but also not to overlook possible lower harmonics.
The signals analysed were X1(t) and X2(t), downsampled to fs = 50 Hz to allow observation of one tenth of a cycle of the faster oscillation (i.e. frequencies up to 25 Hz), while also not overlooking higher harmonics. Particular attention should be paid to preprocessing, as discussed in Section 4.3. The wavelet cross-bispectrum B_{X2 X1 X1} was computed using the Morlet wavelet. We are primarily interested in the spectral content that appears due to the presence of nonlinear coupling. Here, there are two overlapping effects: system 2 modulates the frequency of system 1 with strength γ, and at the same time also quadratically forces system 1. This results in the appearance of the new spectral peaks 2f1, 2f2, f2 − f1 and f1 + f2 [213].
Results are shown in Fig. 33, with frequencies for X 1 on the horizontal axis and frequencies for X 2 on the vertical axis. In each figure, the blue circle corresponds to the coupling between f 1 and f 2 ; the red circle, between f 2 − f 1 and f 2 ; and the black circle, between f 1 and f 1 .
The detection of nonlinear coupling using bispectra is affected by measurement noise. Fig. 33(A) shows the bispectra for different values of the measurement noise intensity σ. The main peak becomes harder to detect as the level of noise is increased, and more harmonic peaks appear, forming a characteristic 'cross' at stronger coupling. Fig. 33(B-E) shows the values remaining positive after subtraction of the 95th percentile of the different surrogate types. FT surrogates (Fig. 33(B)) were not capable of detecting the main peak (f1, f2, marked in blue). Similar results were obtained with IAAFT surrogates (data not shown). WIAAFT (Fig. 33(C)) and Wtshift (Fig. 33(D)) surrogates were successful in detecting the main peak when σ < 1. When the noise level was higher, the main peak (blue) was not detected, and other significant peaks belonging to the harmonics appeared, shown with red and black circles.
For all tested surrogate types, increasing the number of surrogates beyond 20 does not improve the accuracy of detection of nonlinear coupling. Although the most reliable results were obtained by WIAAFT and Wtshift surrogates, it is unlikely that any type of surrogate discussed would be reliable when the level of measurement noise becomes high (σ > 1). Visual inspection of the original bispectrum (Fig. 33(A), σ = 1.2) in the case of a high level of measurement noise is less misleading than applying any type of surrogate.

Discussion
Surrogates, combined with appropriate discriminating statistics, provide a 'statistical zero': a threshold calculated from a range of data sets that definitely do not possess the property under investigation. Each surrogate type has its own rules and issues, but in general we recommend that: data should be preprocessed to match the ends and first derivatives (except, of course, in noise surrogates); significance levels should be determined using the rank method, so that no assumption of normality is made about the surrogate distribution; and the least nonlinear signal should be the one from which the surrogates are calculated, especially in cases where the extraction of phase is necessary.
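The rank method recommended here can be sketched as follows; with N surrogates, a one-sided test at significance level α requires (N + 1)α ≥ 1, so 19 surrogates suffice for α = 0.05. The surrogate values below are synthetic illustrations of our own:

```python
import numpy as np

def rank_significant(stat_data, stat_surr, alpha=0.05):
    """One-sided rank test: the data statistic is significant at level
    alpha if its rank among the surrogate values is high enough, with
    no assumption about the shape of the surrogate distribution."""
    stat_surr = np.asarray(stat_surr)
    n = len(stat_surr)
    if (n + 1) * alpha < 1:
        raise ValueError("too few surrogates for this alpha")
    rank = np.sum(stat_data > stat_surr)      # surrogates below the data
    return (n - rank + 1) / (n + 1) <= alpha

rng = np.random.default_rng(1)
surr_vals = rng.normal(0.2, 0.05, size=19)    # 19 surrogate statistics

is_sig_high = rank_significant(0.9, surr_vals)  # far above all surrogates
is_sig_null = rank_significant(0.2, surr_vals)  # inside the distribution
```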
In the first half of this work, we have reviewed many of the surrogate methods currently available for three general scenarios (1) finding dynamics in noisy data, (2) finding nonlinearity in data, and (3) investigating interactions between systems. The second part of the paper consists of original investigations into the performance of different surrogates types for these three important areas, specifically testing for nonlinearity, finding synchronization in simple and chaotic systems, and inferring coupling parameters from interacting systems.
Tests for nonlinearity revealed that not all surrogate types are appropriate for use with every discriminating statistic. The results can be affected by nonlinear measurement transformations, and also by how well the surrogates meet the constraints of the data. The investigation into oscillations in noisy data using MCSSA demonstrated that significant oscillations can be successfully located with the method, by applying MCSSA to a numerically simulated oscillatory signal buried deep in correlated noise. The method was also applied to an EEG signal recorded from the occipital lobe of a human brain, and significant oscillations were observed.
The performance of many different surrogate types for the determination of significance levels for synchronization, as measured using phase coherence, was investigated in both simple phase oscillators and coupled chaotic systems. The results clearly demonstrated the importance of time-variability in the phases of oscillators when trying to observe synchronization between them, with many of the surrogates failing to reject the null hypothesis of independence even when the system was fully synchronized. This is because the oscillators were almost identical over time, and thus their respective surrogates had a high level of coherence with the original data. In the time-varying case, by contrast, all the surrogates successfully identified synchronization in the synchronized system of simple sinusoidal oscillators.
Moving on to chaotic oscillators, all tested surrogates were relatively successful in indicating synchronization at the expected coupling strengths for both the coupled Rössler and Lorenz systems. It should be emphasized that none of the surrogates actually test for complete phase synchronization, but rather for the independence of the signals, which is why they may indicate significance before the system is fully synchronized. Despite the apparent success of the surrogates in the demonstrated cases, for the purposes of testing for synchronization the application of FT, AAFT or IAAFT surrogates may not be ideal, as they can indicate rejection of the null hypothesis even with no synchronization if the signal is highly nonlinear. Therefore, surrogates which allow for nonlinearity in the signals are more applicable and reliable in this case. Overall, the surrogates which meet these criteria, and have some of the lowest computational costs, are CPP and time-shifted surrogates. However, time-shifted surrogates may retain some correlations between signals if the shift is not sufficiently large, whilst CPP surrogates can only be constructed if the phase of the oscillation can be extracted.
Wavelet phase coherence was also used to test the performance of different surrogate types, with respiratory sinus arrhythmia as an example. This case demonstrates that surrogates can also be used to identify bias in the analysis methods themselves. For the evaluation of surrogate performance in testing the coupling strength between oscillators, FT, WIAAFT and CPP surrogates were compared in a system with known time-varying coupling strengths. The coupling strengths were extracted using dynamical Bayesian inference, with the same preprocessing and inference also applied to each surrogate type. As the only surrogate type to correctly identify the significant coupling at all coupling strengths, CPP surrogates again emerged as an ideal choice. At higher coupling strengths, all surrogates were successful in rejecting the null hypothesis of independence between the systems.
Application of surrogates to wavelet bispectrum analysis demonstrated that although improved performance can be obtained by using wavelet surrogates rather than FT surrogates alone, dynamics can still be very difficult to characterize when there is a high level of noise in the system, or very complex interactions. This highlights that care should be taken in the application of surrogate data testing, especially in noisy or time-dependent systems.
There are many more surrogate types available than those discussed here, and the field continues to expand. As their use grows in popularity, surrogates to test increasingly specific null hypotheses are emerging, based on existing ideas or on completely new concepts. For example, surrogates for epileptic seizure prediction were introduced by Andrzejak et al. and Kreuz et al. [234,235] and developed further in [236]. The most general approach to the creation of surrogates which preserve the data distribution was introduced by Schreiber [237]. This method makes it possible to generate surrogates with arbitrary constraints, by choosing a cost function whose minimum is equivalent to satisfying the given constraints. The minimization of the cost function is performed over the set of all possible permutations of the data, using a technique known as simulated annealing. However, this powerful technique does have some drawbacks. In addition to some arbitrariness in the choice of the cost function, the method is very computationally expensive, with a computation time of a few days per surrogate in some cases. A package implementing this and many other surrogate methods and discriminating statistics, written in C and Fortran and known as the TISEAN package, is introduced and discussed in [4]. Another type of surrogate, known as attractor surrogates (AS), was introduced in [238]; these were designed to preserve both short-time correlations and the shape of the attractor in 2D phase space, but the authors concluded that, for their purpose of investigating unstable periodic orbits in noisy, nonstationary data, AS did not provide any advantage over random permutation surrogates.
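The constrained-randomization idea can be illustrated with a toy version, using the mismatch of the first few autocorrelations as the cost function. All parameter choices here are our own, and a serious implementation would update the cost incrementally after each swap rather than recomputing it from scratch:

```python
import numpy as np

rng = np.random.default_rng(0)

def autocorr(x, max_lag):
    """Normalized autocorrelation at lags 1..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x)-k], x[k:])
                     for k in range(1, max_lag + 1)]) / denom

def annealed_surrogate(sig, max_lag=3, n_steps=3000, T0=0.1, cooling=0.999):
    """Toy constrained randomization: shuffle the data, then perform
    pairwise swaps accepted by the Metropolis rule, so that the cost
    (autocorrelation mismatch) is gradually minimized. The amplitude
    distribution is preserved exactly, since only the ordering changes."""
    target = autocorr(sig, max_lag)
    surr = rng.permutation(sig)
    cost = np.abs(autocorr(surr, max_lag) - target).max()
    T = T0
    for _ in range(n_steps):
        i, j = rng.integers(0, len(surr), size=2)
        surr[i], surr[j] = surr[j], surr[i]
        new_cost = np.abs(autocorr(surr, max_lag) - target).max()
        if new_cost < cost or rng.random() < np.exp((cost - new_cost) / T):
            cost = new_cost                        # accept the swap
        else:
            surr[i], surr[j] = surr[j], surr[i]    # reject: undo the swap
        T *= cooling
    return surr, cost

# an AR(1) series with strong lag-1 correlation as test data
x = np.zeros(300)
for k in range(1, 300):
    x[k] = 0.9 * x[k-1] + rng.standard_normal()
surr, final_cost = annealed_surrogate(x)
```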
Surrogates have also been used recently in spatial applications. Postnov et al. [239] used 2D surrogates to find oscillatory patterns in laser speckle data, and Sheppard et al. [240] applied multivariate spatial FT surrogates of wavelet coherence to find relationships between populations in ecology. The latter also presented an algorithm that improves the efficiency of the calculation of wavelet coherence surrogates. Wiedermann et al. [241] recently introduced spatial network surrogates which preserve either the global or local distributions of link lengths between nodes, and used them to classify network types. Spatio-temporal propagation patterns in neuroscience and climatology were recently investigated using a new type of surrogate named spike order surrogates [242]. Spike order surrogates maintain the coincidence structure of spike trains, but the spike order is destroyed by swapping pairs of spikes.
With applications spanning temporal and spatial domains, the powerful tool of hypothesis testing with surrogate data should be considered not only in physics, mathematics and biology, but in any field dealing with dynamics and interactions between systems. This approach should be an integral part of any conclusions based on statistical measures, especially when the results may be affected by the characteristics of the data or the analysis method itself.
We have provided a MatLab toolbox for the calculation of many of the discussed surrogates. As is stressed in [4], we must also emphasize that our software package cannot be treated as a 'black box', and any methods used should be well understood before application. Computational speed may also be a relevant consideration if multiple surrogate methods are suitable and one of them is much faster. To demonstrate the speed of computation, surrogates were calculated using each of the methods shown in Fig. 34. The surrogates were calculated from a time-varying noisy sinusoid described by sin(2π (t + 0.5 sin(2π t/10))/3) with additive Gaussian white noise with NSR = 0.1, with sampling frequency 100 Hz. Fig. 34(A) shows the dependence of computation time on the length of the signal in points, L, and (B) shows the dependence on the number of surrogates. Calculations were performed using MatLab 2015a on Windows 10 (64-bit, 3.4 GHz Intel Core i7). Generally, increasing the signal length and the number of surrogates will increase the computation time. Twin surrogates are an exception to this rule, as their long computation time is due to the initial finding of twins, which only needs to be performed once, after which the surrogates can be calculated quickly, meaning that the computational speed is almost independent of the number of surrogates (see Fig. 34(B)). However, as can be seen in Fig. 34(A), for large data sets the calculation of twin surrogates is not possible, due to the large amount of memory required.
The computation time of WIAAFT surrogates may be improved by finding a more efficient matching algorithm for each scale, or applying the updated PWIAAFT algorithm discussed in [76].

A.3. Phase extraction
The phase of a signal containing only one oscillation can be extracted using the Hilbert transform (see below). For more complex signals, time-frequency analysis (see Appendix A.5) can be used to separate the oscillatory modes and to extract their instantaneous phases using either ridge extraction [243–245] or nonlinear mode decomposition [21]. For phase construction to be meaningful, it is important that the time series is oscillatory [78].

A.3.1. The Hilbert transform
The Hilbert transform, devised by the mathematician David Hilbert, converts a real signal x(t) into a complex signal known as its analytic representation. The transform takes the form [29]:

x_H(t) = (1/π) p.v. ∫_{−∞}^{∞} x(τ)/(t − τ) dτ,

which is equivalent to the convolution of the time series x(t) with 1/(πt). The analytic signal recovered using the Hilbert transform can be expressed as:

x_a(t) = x(t) + j x_H(t) = x_m(t) e^{jφ(t)}, (45)

where the instantaneous amplitude x_m(t) and phase φ(t) can be extracted as the modulus and the argument (polar angle) of x_a, respectively.
This representation specifies the amplitude and phase of the signal as a function of time [246]. Information obtained via the Hilbert transform retains the same sampling frequency as the original data. The Hilbert transform does not perform well in cases in which there are many oscillations present in the data, or when the data are highly nonstationary. Thus, many applications employ bandpass filtering before applying the Hilbert transform. An example of the separation of the amplitude and phase of a signal using the Hilbert transform for a simple oscillatory signal is shown in Fig. 35.
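As an illustration (in Python rather than MatLab, and not part of the provided toolbox), the separation of instantaneous amplitude and phase via the analytic signal can be sketched for a simple amplitude-modulated sinusoid; the signal parameters here are our own choices:

```python
import numpy as np
from scipy.signal import hilbert

fs = 100.0                       # sampling frequency (Hz)
t = np.arange(0, 10, 1 / fs)
# 5 Hz carrier with slow (0.2 Hz) amplitude modulation
x = (1 + 0.3 * np.sin(2 * np.pi * 0.2 * t)) * np.sin(2 * np.pi * 5 * t)

xa = hilbert(x)                  # analytic signal x_a(t) = x(t) + j*H[x](t)
amplitude = np.abs(xa)           # instantaneous amplitude x_m(t)
phase = np.unwrap(np.angle(xa))  # instantaneous phase phi(t), unwrapped
inst_freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency (Hz)
```

Because the modulation is much slower than the carrier, the recovered envelope closely follows 1 + 0.3 sin(2π 0.2 t) and the instantaneous frequency stays near 5 Hz, except close to the edges of the record.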

A.3.2. Phase vs. protophase
Throughout the text, when referring to the phase of oscillators, we assume that it grows uniformly and gains 2π with each cycle, as it would during simple harmonic motion. This means that the system rotates on a limit cycle at a constant speed. However, this assumption is not always valid, especially in the case of nonlinear oscillators, for which the phase generally does not increase uniformly. In these cases, extracted phases may differ depending on the method used, and may in fact provide the protophase rather than the phase. The protophase still gains 2π with each cycle, but does not necessarily grow uniformly, preserving more complex intra-cycle phase evolutions. In order to obtain an invariant phase of nonlinear systems, a protophase-to-phase conversion should be applied [247,248]. An exception to this rule is when estimating conditional mutual information as introduced by Paluš (see Section 6.7), which is invariant with respect to monotonic data transformations [private communication, 2017].
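A minimal Python sketch of the protophase-to-phase conversion, in the spirit of Kralemann et al. [247,248], estimates the Fourier coefficients of the protophase density and integrates the truncated series; the function name, truncation order and the test signal are our own choices:

```python
import numpy as np

def protophase_to_phase(theta, n_max=10):
    """Convert a protophase theta (values in [0, 2*pi)) to a phase that
    grows uniformly on average, via a truncated Fourier expansion of the
    protophase density (after Kralemann et al.)."""
    theta = np.mod(np.asarray(theta, dtype=float), 2 * np.pi)
    phi = theta.copy()
    for n in range(1, n_max + 1):
        Sn = np.mean(np.exp(-1j * n * theta))  # density Fourier coefficient
        phi += 2 * np.real(Sn * (np.exp(1j * n * theta) - 1) / (1j * n))
    return phi

# Protophase: a monotonic but non-uniform warping of a uniform phase
psi = np.linspace(0, 2 * np.pi, 2000, endpoint=False)  # "true" phase
theta = psi + 0.5 * np.sin(psi)                        # distorted protophase
phi = protophase_to_phase(theta)                       # recovers ~psi
```

For this monotonic warping the converted phase closely recovers the uniformly growing phase, while the raw protophase speeds up and slows down within each cycle.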

A.3.3. Nonlinear mode decomposition
Nonlinear mode decomposition (NMD) is a recently developed technique for the extraction of nonlinear modes from time series [21]. It combines many different techniques in order to accurately reconstruct nonlinear waveforms. It uses time-frequency analysis methods to extract the fundamental oscillation and its harmonics, and surrogate data to identify whether these harmonics are real. The oscillation can then be reconstructed and subtracted from the original time series, and the process repeated until all oscillatory modes have been located.

A.4. Filtering
As discussed above, the Hilbert transform only has clear meaning for narrow band signals [249], and this is used as justification for filtering in the frequency interval of interest before trying to obtain instantaneous amplitude and phase from a signal. Hilbert phase only completely represents the actual phase for a pure sinusoid. In other cases, additional components may distort the extracted phase in some way, but generally the variation in period will be preserved. Le Van Quyen et al. concluded that the Hilbert transform is equivalent to the wavelet transform for the extraction of instantaneous phase from time series [249], but bandpass filtering was applied before the application of both methods.
Filtering increases the risk of information loss, especially in systems with time-varying dynamics or harmonics, so should be avoided whenever possible. Instantaneous phase, frequency and amplitude information can be extracted directly from clear modes in the wavelet transform using ridge extraction [243–245] or nonlinear mode decomposition [21].
Naturally, if filtering is to be used for hypothesis testing using surrogates, the exact same procedures must be applied both to the original data and to every realization of surrogate data [250], and surrogates should always be calculated before filtering to ensure that any differences are not merely as a result of the filtering.
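This ordering can be sketched as follows (a hypothetical Python example, not part of the provided toolbox, using simple phase-randomized FT surrogates and a Butterworth bandpass filter from SciPy): the surrogates are generated from the unfiltered data, and then the identical filter is applied to the data and to every surrogate.

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(42)

def ft_surrogate(x, rng):
    """Phase-randomized (FT) surrogate: keep the amplitude spectrum,
    randomize the Fourier phases."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, len(X))
    phases[0] = 0.0    # keep the DC bin real
    phases[-1] = 0.0   # and the Nyquist bin (even-length signals)
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

fs = 100.0
t = np.arange(0, 20, 1 / fs)
x = np.sin(2 * np.pi * 5 * t) + rng.standard_normal(len(t))

# 1. Generate the surrogates from the *unfiltered* data ...
surrogates = [ft_surrogate(x, rng) for _ in range(19)]

# 2. ... then apply the identical filter to data and every surrogate.
b, a = butter(4, [4 / (fs / 2), 6 / (fs / 2)], btype="bandpass")
x_filt = filtfilt(b, a, x)
surr_filt = [filtfilt(b, a, s) for s in surrogates]
```

Any discriminating statistic is then compared between `x_filt` and `surr_filt`, so that differences cannot be attributed to the filtering itself.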

A.5.1. Windowed Fourier transform
When assessing the frequency content of a signal, the first method to be applied is usually the Fourier transform. However, for signals arising from many real systems, the assumption of stationarity of the data does not hold. If the data contain oscillations with frequencies which vary in time, the Fourier transform will provide no information about when in the recording these variations occurred, and will instead provide one 'blurred' peak. In order to address this problem, the windowed Fourier transform (WFT) was developed. The WFT separates the signal into windows, and calculates the Fourier transform in each [218]. The multiple frequency spectra are then lined up in sequence, and a 3D picture is built of the frequency variations over time. However, this approach is not without its drawbacks, mostly related to the trade-off between time and frequency resolution set by the choice of the window size. For example, it is hard to choose an optimal window size, as in many real situations the properties of the oscillations, and the typical frequency differences between neighbouring ones, depend on their frequencies, while the windowed Fourier transform has the same resolution at all frequencies.
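The windowing scheme can be sketched in a few lines (a Python illustration using SciPy's short-time Fourier transform; the test signal, whose frequency jumps from 5 Hz to 15 Hz halfway through, is our own choice):

```python
import numpy as np
from scipy.signal import stft

fs = 100.0
t = np.arange(0, 20, 1 / fs)
# Oscillation frequency switches from 5 Hz to 15 Hz at t = 10 s
x = np.where(t < 10, np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 15 * t))

# Window length (nperseg) sets the time-frequency resolution trade-off
f, t_win, Z = stft(x, fs=fs, nperseg=256)
power = np.abs(Z) ** 2

# Dominant frequency in each window tracks the jump in time
dominant = f[np.argmax(power, axis=0)]
```

A plain Fourier transform of `x` would show two blurred peaks with no indication of when each frequency was present, whereas `dominant` localizes the change near t = 10 s.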

A.5.2. The continuous wavelet transform
Whilst it is impossible to have excellent time and frequency resolution simultaneously, the continuous wavelet transform (CWT) provides a more universal and applicable choice of time-frequency resolution structure than the WFT. It is especially useful in the analysis of data which contain low frequency oscillations, due to its logarithmic frequency scale [251,252]. This means that multiple low frequency oscillatory modes can be more easily distinguished in data. In the CWT, a mother wavelet is stretched and compressed while being translated along a signal in order to find information about the local frequency content of the signal.
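The stretching and compression of the mother wavelet can be illustrated with a minimal Morlet CWT on a logarithmic frequency grid (a Python sketch with our own normalization and central frequency ω0 = 6; library implementations differ in conventions):

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Continuous wavelet transform with a complex Morlet wavelet.
    Each row of the output holds the coefficients at one frequency."""
    coeffs = np.empty((len(freqs), len(x)), dtype=complex)
    for i, f in enumerate(freqs):
        s = w0 * fs / (2 * np.pi * f)          # scale in samples
        k = np.arange(-int(4 * s), int(4 * s) + 1)
        psi = (np.pi ** -0.25) * np.exp(1j * w0 * k / s) \
              * np.exp(-(k / s) ** 2 / 2) / np.sqrt(s)
        coeffs[i] = np.convolve(x, psi, mode="same")
    return coeffs

fs = 100.0
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 5 * t)
freqs = np.logspace(np.log10(1), np.log10(20), 40)  # logarithmic scale
W = morlet_cwt(x, fs, freqs)
peak_freq = freqs[np.argmax(np.mean(np.abs(W), axis=1))]
```

The time-averaged wavelet power peaks at the grid frequency closest to the 5 Hz oscillation, while the logarithmic grid devotes proportionally more bins to the low frequencies.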

A.5.3. The discrete wavelet transform
The discrete wavelet transform decomposes a time series into different scales. The time series is repeatedly convolved with a wavelet and a scaling function, which change in scale as the finer detail is separated out from the signal. Usually, the highest frequency detail (up to the Nyquist frequency) is extracted first, and then the time series is downsampled by a factor of two to extract the next level of coefficients. This is repeated until the length of the time series becomes a single data point [29].
This loss of data points at each level of wavelet coefficients is problematic for the creation of wavelet surrogates, where we wish to retain the time information for each scale. Therefore, the maximum overlap discrete wavelet transform (MODWT) is used instead, as it gives coefficient sets which are the same length as the time series [107].
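The pyramid scheme described above can be sketched with the simplest (Haar) wavelet (a Python illustration; the function name is our own), making the repeated downsampling by two explicit:

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the Haar DWT: split into approximation (low-pass)
    and detail (high-pass) coefficients, each half the input length."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # smooth part, downsampled by two
    detail = (even - odd) / np.sqrt(2)   # finest-scale detail
    return approx, detail

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)

coeffs = []
a = x
while len(a) > 1:            # repeat until a single data point remains
    a, d = haar_dwt_step(a)
    coeffs.append(d)
coeffs.append(a)             # final approximation coefficient
```

Each level halves the number of coefficients (512, 256, ..., 1), while the transform preserves the total energy of the signal; it is exactly this shrinking of the coefficient sets that the MODWT avoids.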

A.6. Embedding
Embedding is the process of transforming a time series into phase space and was developed by Takens [253] and Mañé [136]. A d-dimensional phase space representation of the time series is created by generating multiple delayed versions of the original time series [29]:

s(t_i) = [x(t_i), x(t_i + τ), x(t_i + 2τ), . . . , x(t_i + (d − 1)τ)],

where τ is the delay time and x(t_i) is the original signal. An example phase space reconstruction of the x component of the Lorenz system, with σ = 10, ρ = 28, β = 8/3 and time step 0.01, is shown in Fig. 36.
When embedding time series with phase space representations which are less distinctive than that of the Lorenz system, it can be difficult to choose the optimal values for the parameters τ and d. It is also possible that the system will be more than three-dimensional, in which case it can no longer be visualized by plotting. Many attempts have been made to solve this problem. It should also be noted that time-delay embedding removes all time-dependent information. Further discussion can be found in [29]. Here we discuss only how the parameters were obtained for the examples presented in this paper.
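The delay-embedding construction can be sketched in a few lines (a Python illustration; the function name is our own, and a sine wave with a quarter-period delay is used instead of the Lorenz system so that the expected trajectory, a circle, is known exactly):

```python
import numpy as np

def embed(x, d, tau):
    """Time-delay embedding: map x(t_i) to the d-dimensional vectors
    [x(t_i), x(t_i + tau), ..., x(t_i + (d-1)*tau)], tau in samples."""
    n = len(x) - (d - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(d)])

t = np.arange(0, 20, 0.01)
x = np.sin(2 * np.pi * t)       # period = 100 samples
X = embed(x, d=2, tau=25)       # quarter-period delay -> (sin, cos) pairs
```

With τ equal to a quarter of the period, the second coordinate is the cosine of the first's argument, so the embedded trajectory traces the unit circle.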

A.6.1. Estimation of time delay τ
The time delay τ should maximize the spread of the data in phase space [29]. In order to find time delays for which the correlation between time points is minimal, thus ensuring a greater spread of points, the autocorrelation was calculated as [29]:

A(l) = Σ_{n=1}^{N−l} (x(t_n) − ⟨x⟩)(x(t_{n+l}) − ⟨x⟩) / Σ_{n=1}^{N} (x(t_n) − ⟨x⟩)²,

where ⟨x⟩ = (1/N) Σ_{n=1}^{N} x(t_n) is the mean [254], and τ = l∆t. The first lag l at which the autocorrelation reaches zero (or, if it does not reach zero within the considered range of lags, where it reaches its minimum value) determines τ.
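This criterion can be sketched as follows (a Python illustration; the function name is our own), again using a sine wave so that the expected answer, a quarter of the 100-sample period, is known:

```python
import numpy as np

def delay_from_autocorrelation(x):
    """Return the first lag (in samples) at which the autocorrelation
    of x reaches zero, or the lag of its minimum if it never does."""
    x = x - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1 :]
    acf /= acf[0]                          # normalize so acf[0] = 1
    below = np.flatnonzero(acf <= 0)
    return int(below[0]) if below.size else int(np.argmin(acf))

t = np.arange(0, 20, 0.01)
x = np.sin(2 * np.pi * t)                  # period = 100 samples
lag = delay_from_autocorrelation(x)        # ~25 samples; tau = lag * dt
```

The selected delay corresponds to a quarter period, which for a sinusoid gives the maximally spread (circular) phase portrait shown in the embedding example.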

A.6.2. Estimation of embedding dimension d
The most widely used technique for the estimation of the embedding dimension is the false nearest neighbours method [118], which is the method used in this paper, both in the generation of pseudo-periodic and twin surrogates, and in the calculation of the correlation dimension.
The false nearest neighbours technique examines whether points which are nearest neighbours in phase space remain close as the embedding dimension is increased. When the dimension is low, and not sufficient to represent all of the system dynamics, close points in phase space are more likely to be 'false' neighbours, and their number reduces as the embedding dimension is increased. The first dimension for which the fraction of false nearest neighbours falls below a threshold (usually < 1% [29]) provides the estimated embedding dimension.
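A simplified, brute-force version of this test can be sketched as follows (a Python illustration loosely following the Kennel-style ratio criterion; the function name, tolerance and test signal are our own choices, and the signal is kept to less than one full period so that no two embedded points coincide):

```python
import numpy as np

def fnn_fraction(x, d, tau, r_tol=15.0):
    """Fraction of false nearest neighbours when going from dimension d
    to d+1 (simplified Kennel-style criterion, brute-force search)."""
    n = len(x) - d * tau                   # so the (d+1)-th coordinate exists
    X = np.column_stack([x[i * tau : i * tau + n] for i in range(d)])
    false = 0
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                  # exclude the point itself
        j = int(np.argmin(dists))          # nearest neighbour in d dimensions
        extra = abs(x[i + d * tau] - x[j + d * tau])
        if extra > r_tol * dists[j]:       # neighbour separates in d+1 dims
            false += 1
    return false / n

t = np.arange(0, 0.9, 0.01)
x = np.sin(2 * np.pi * t)
f1 = fnn_fraction(x, d=1, tau=25)          # many false neighbours in 1D
f2 = fnn_fraction(x, d=2, tau=25)          # a sine unfolds in two dimensions
```

In one dimension, points on the rising and falling branches of the sine share the same value but separate once the delayed coordinate is added, so the false-neighbour fraction is large; in two dimensions the trajectory is fully unfolded and the fraction drops below the usual 1% threshold, giving d = 2.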