A direct method to detect deterministic and stochastic properties of data

A fundamental question of data analysis is how to distinguish noise-corrupted deterministic chaotic dynamics from time-(un)correlated stochastic fluctuations when only short data records are available. Despite its importance, direct tests of chaos versus stochasticity in finite time series still lack a definitive quantification. Here we present a novel approach based on recurrence analysis, a nonlinear framework for data analysis. The main idea is to identify how recurrence microstates and permutation patterns are affected by time reversal of the data, and how this behavior can be used to distinguish stochastic from deterministic data. We demonstrate the efficiency of the method for a set of paradigmatic systems under strong noise influence, as well as for real-world data covering an electronic circuit, sound vocalization and human speech, neuronal activity, heartbeat data, and geomagnetic indices. Our results support the conclusion that the method distinguishes deterministic from stochastic fluctuations in simulated and empirical data even under strong noise corruption, finding applications in many areas of science and technology. In particular, for deterministic signals, the quantification of chaotic behavior may be of fundamental importance because chaotic properties are believed to play functional roles in some systems, opening doors to a better understanding and/or control of the physical mechanisms behind the generation of the signals.


Introduction
A profusion of natural systems produce data classified as stochastic, partially (noisy) deterministic, or deterministic [1][2][3]. Stochastic data can be classified as time-correlated or non-time-correlated: non-correlated data produce a flat Fourier spectrum, with equal power within any equal interval of frequencies; examples are Cauchy-distributed noise, fractional Gaussian noise (fGn), and uniform white noise. Time-correlated stochastic data have more general characteristics, displaying a decreasing broadband power spectrum that concentrates the signal power mainly within a finite interval of frequencies. Important classes of correlated stochastic data are those depicting a power-law 1/f^α spectral decay [4], sometimes referred to as colored noise [5], such as fractional Brownian motion (fBm) and random walks [6]. Finally, there exists correlated noise generated by nonlinear models for which the power spectrum is not a simple function of frequency, including the possibility of non-monotonic characteristics [7]. Among deterministic data, an important class comprises those depicting chaotic dynamics. Such systems are deterministic, since the same initial conditions always evolve over time in the same way, but even infinitesimal differences between initial values grow exponentially, amplifying any small distinction between them. In practice, this amplification of small differences in the initial conditions, combined with the fact that the dynamics must stay inside a finite region (the trajectories cannot escape to infinity), leads the system to display very complex trajectories: divergence in one or more directions must be compensated by contraction (folding) in others, building a chaotic tangle of trajectories resembling stochastic data sequences.
So in practice, for finite-size datasets, chaotic systems behave similarly to stochastic ones. The characterization of the deterministic/stochastic properties of data is thus a fundamental question and an essential tool underlying many (perhaps all) complex data analyses, spanning biological [8], economic [9], chemical [10], and physical [11] problems.
The literature shows that data analysis methods through which stochastic and chaotic phenomena may be distinguished are still under development [12]. Such a scenario makes the characterization of deterministic and stochastic datasets of universal importance, finding applications (to cite just a few) from stochastic gene expression [13] to financial stock market indices [14], and from chemical reaction processes [15,16] to atomic behavior [15,17,18].
As a first approach, deterministic chaos may be characterized by tools such as the Lyapunov exponent, data-shuffling methods, and probability distribution function analysis [19,20]. However, the first technique is often not suitable for experimental data due to measurement noise, and the second and third present ambiguities when noise-corrupted deterministic and/or time-correlated stochastic data are taken into account [7,21,22]. Simple examples of deterministic systems forced by stochastic drives, or even of time-correlated stochastic signals, show that reliable quantification analyses have yet to be provided [7,19,21]. Moreover, chaos is believed to play functional roles in many natural systems, leaving a signature of the mechanisms behind the data [23], which makes detecting its presence a challenging task. Many researchers conjecture that the flexibility of chaotic systems to adapt their behavior using only small amounts of energy may allow efficient transitions between different activity patterns in natural systems, and may be exploited as an optimal strategy by many of them [24][25][26][27][28]. Based on these hypotheses, a functional method to distinguish determinism from time-correlated stochastic datasets, reliable even for short and noisy data, seems to be a key component of time series analysis.
Here an innovative approach based on recurrence analysis, namely the concepts of recurrence microstates (RM) [29,30] and the maximum RM entropy [31], as well as ordinal permutation patterns (PP) [32], is used to overcome limitations in the analysis of deterministic and stochastic data. The main point of this novel technique is the sensitivity of both RM and PP to time reversal. RM generalize the recurrence between two single elements of a time series to the recurrence between segments of data. Given a dataset, we compare the values of two sequences of N consecutive time-points: each element of the first sequence is compared to each element of the second, the two being regarded as similar if their difference lies within a suitable recurrence vicinity parameter ε (the maximum difference between two values such that they can be considered similar). The result of these comparisons defines the cross-recurrence between the two sequences and yields the concept of RM [29,30]. The collective effect of the set of probabilities of all possible microstates brings more information than the simple linear recurrence between single time-points. Similarly, for PP, comparisons of the magnitudes of consecutive values in a sequence of M elements, n_x, n_{x+1}, ..., n_{x+M−1}, lead to the concept of ordinal patterns, and again the set of probabilities of each ordinal pattern generates a collective effect that can be used in the quantification of the data.
For example, in the case of N = 2 the RM are 16 four-digit binary numbers spanning from {0, 0, 0, 0} (no element of the first sequence considered similar to any element of the second) to {1, 1, 1, 1} (all considered similar). In a similar way, using M = 4 the ordinal PP [32] are four-element patterns ordered by relative amplitude, spanning from n_x < n_{x+1} < n_{x+2} < n_{x+3} to n_x > n_{x+1} > n_{x+2} > n_{x+3}, resulting in 24 distinct possibilities. To give visual representations of the RM and the PP applied to a time series, we show in figure 1 a prototypical trajectory, n_1, n_2, ..., n_x, ..., n_K. Short, randomly chosen forward and backward sequences of time-points are magnified. Based on these sequences, we compile RM represented by a (2 × 2) binary matrix expressing the cross-recurrences between two sequences of time-points n_x, n_{x+1} and n_y, n_{y+1}, setting the matrix element to 1 if two time-points are recurrent and 0 otherwise. Based on the same trajectory, an example of a PP is represented by a specific relative-amplitude ordering of four time-points.
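These counts follow directly from combinatorics: an N × N binary cross-recurrence matrix has 2^(N·N) configurations, and M distinct values admit M! orderings. A quick sanity check in Python (our own illustration, not part of the original method):

```python
from math import factorial

N, M = 2, 4
print(2 ** (N * N))   # 16 recurrence microstates for N = 2
print(factorial(M))   # 24 ordinal permutation patterns for M = 4
```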
The main point of this novel approach is that both RM and PP are sensitive to time reversal. This crucial property is used to show that, in general, for deterministic data the rule that leads the system from one point A to another point B differs from the one that leads from B to A. This concept finds its root in the observation that the tangent to the trajectory generated by a deterministic system is a function of the position in phase space and changes its sign when computed backward in time [33]. This observation leads to distinct probabilities of occurrence of (at least some) RM and PP. In contrast, for stochastic data it is expected that such a reversal of time results in no differences in the probabilities of RM and PP. We show that this approach enables us to build a unique tool which distinctly characterizes deterministic and stochastic properties of the signal. In the case of stochastic signals, the time-correlation scale of the signal may also be evaluated by computing an appropriate entropy [31,32]. The key point of the approach is how the probabilities of occurrence of each RM or permutation pattern are grouped, not really how they are built. Since groups of RM and PP have their occurrence probabilities swapped when the data are analyzed backward, we can build a parameter to quantify how distinct these probabilities are. For stochastic datasets this distinction parameter must vanish, while for deterministic ones it must be finite.
We find that this method can detect signatures of determinism even when a large amount of noise is added to the signal. Noise levels of up to 80% of the maximum amplitude of the deterministic signal have been tested, and even in these extreme cases deterministic traces of the signal are detected. Such noise levels are far larger than those tolerated by any other method in the literature, as can be verified in [8] and references therein. Finally, we emphasize that the same procedure used here may be combined with any other data-segment analysis susceptible to time reversal (other than RM and PP), which may then also be able to detect properties of deterministic and stochastic signals as discussed here.
In section 2, we analyze the concepts of RM and permutation patterns in more depth. Details of our approach, as well as the definition of the main quantifier to distinguish stochasticity from chaos, are also described. The main results are presented in section 3, while discussions and conclusions are given in section 4.

The recurrence microstates and ordinal permutation patterns
Let us consider a dataset of size K, given by a sequence of time-points

n_1, n_2, ..., n_i, ..., n_K,        (1)

where we have boxed individual time-points as well as strings of two consecutive time-points for better visualization of how RM are built. The recurrence between two individual time-points n_x and n_y of the dataset is defined as the one-digit binary value [34]

R(x, y) = Θ(ε − |n_x − n_y|),        (2)

where Θ is the Heaviside step function and ε is the recurrence vicinity parameter, which measures how tolerant the comparison of the two values is, being a free parameter of recurrence analysis [34]. As we show later in this text, ε can be automatically set by our method, making the analysis parameter-free.
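In code, the pointwise recurrence of equation (2) reduces to a Heaviside-style comparison. The sketch below is our own illustration (the function name and the value of ε are ours, not the paper's):

```python
def recurrence(n_x, n_y, eps):
    """One-digit binary recurrence: 1 if the two time-points differ by
    less than the vicinity parameter eps, 0 otherwise."""
    return int(abs(n_x - n_y) < eps)

print(recurrence(0.50, 0.55, eps=0.1))  # 1: values are eps-similar
print(recurrence(0.50, 0.90, eps=0.1))  # 0: difference exceeds eps
```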
If we now compare sequences of consecutive time-points, we get the concept of RM [29,30], defined as the cross-recurrence between two randomly selected sequences of N consecutive time-points embedded in the dataset. For the case of N = 2, the RM are 16 four-digit binary numbers spanning from {0, 0, 0, 0} to {1, 1, 1, 1}. To get a visual representation of the RM, in figure 2 we give them all as (2 × 2) binary matrices expressing the cross-recurrences between the two sequences of time-points n_x, n_{x+1} and n_y, n_{y+1}.
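A microstate can be computed by evaluating the four pairwise comparisons and reading the resulting 2 × 2 matrix as a four-digit binary number. This is a minimal sketch assuming NumPy; the row-by-row encoding order is our choice:

```python
import numpy as np

def microstate_index(a, b, eps):
    """Map the cross-recurrence of two N = 2 segments onto one of the
    16 four-digit binary microstates, returned as an integer 0..15."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    bits = (np.abs(a[:, None] - b[None, :]) < eps).astype(int).ravel()
    return int("".join(map(str, bits)), 2)  # read the 2x2 matrix row by row

print(microstate_index([0.10, 0.80], [0.15, 0.90], eps=0.2))  # 9, i.e. {1,0,0,1}
```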
The same dataset given in equation (1) can be analyzed using ordinal permutation patterns of length M [32]. For M = 4 we have strings of four consecutive time-points,

n_x, n_{x+1}, n_{x+2}, n_{x+3},        (3)

whose orderings span from n_x < n_{x+1} < n_{x+2} < n_{x+3} to n_x > n_{x+1} > n_{x+2} > n_{x+3}, leading to 24 distinct patterns. A visual representation of all possible patterns is given in figure 2.
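Ordinal patterns can be extracted with an argsort over each window of M consecutive points. The helper below is our own sketch, assuming distinct values within a window (the generic case for continuous-valued data):

```python
from itertools import permutations
import numpy as np

def ordinal_pattern(segment):
    """Ordinal pattern of a window: the index order that sorts its values
    (assumes distinct values, as for continuous-valued data)."""
    return tuple(np.argsort(segment))

all_patterns = set(permutations(range(4)))    # the 4! = 24 possible M = 4 patterns
print(len(all_patterns))                      # 24
print(ordinal_pattern([0.2, 0.9, 0.4, 0.1]))  # (3, 0, 2, 1)
```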
To extract information about the stochastic or chaotic properties of a signal, figure 2 shows that RM may be organized into six groups, while PP form just one group: the first RM group is the one with no recurrences, labeled (a) in figure 2. The next group is composed of the RM with one recurrence, labeled (b)-(e), followed by the two-vertical/horizontal-recurrence group (f)-(i), the two-diagonal-recurrence group (j)-(k), the three-recurrence group (l)-(o) and, finally, the group with four recurrences (p). This classification of the RM is important to build the distinctiveness parameter defined in subsection 2.4. The groups with zero and four recurrences bring no information to the distinctiveness parameter, but some RM, when analyzed backward, turn into other existing RM, e.g. RM (b) turns into RM (e). This occurs for all green RM in figure 2, and for all PP, e.g. PP (y) turns into PP (a). On the other hand, for stochastic signals the probability of obtaining any RM or PP is the same whether the analysis is done backward or forward in time. So the probability of obtaining RM (b) (for example) in the forward analysis must be exactly the same as that of RM (e) in the backward analysis. The same holds for pairs of PP. This statement is correct for a stochastic signal in which no relation exists between consecutive values, but fails when deterministic signals are analyzed. As detailed in subsection 2.4, this is the main result of this article, making the distinction between stochastic and deterministic signals possible.
Once all RM and/or PP have been defined, we select a large number of samples (where a sample consists of two sequences of two consecutive elements for RM, or one sequence of four consecutive elements for PP) embedded in the dataset and compute the probability of occurrence P_α^RM of each RM or P_α^PP of each permutation pattern.
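The sampling step can be sketched as follows, assuming NumPy; the sample count, ε and the random seed are illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def rm_probabilities(data, eps, n_samples=10000, N=2):
    """Estimate the probability of each of the 16 RM (N = 2) by drawing
    random pairs of N-point segments from the dataset."""
    data = np.asarray(data, dtype=float)
    counts = np.zeros(2 ** (N * N))          # 16 microstates for N = 2
    starts = rng.integers(0, len(data) - N + 1, size=(n_samples, 2))
    for i, j in starts:
        a, b = data[i:i + N], data[j:j + N]
        bits = (np.abs(a[:, None] - b[None, :]) < eps).astype(int).ravel()
        counts[int("".join(map(str, bits)), 2)] += 1
    return counts / n_samples

p = rm_probabilities(rng.random(2000), eps=0.2)
print(p.sum())  # the estimated probabilities sum to 1
```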

Recurrence microstates and permutation entropies
A suitable entropy (S) is defined as an information entropy built from the diversity of the RM or PP when a large enough number of samples is randomly selected from the dataset [29],

S = −Σ_α P_α^(RM,PP) ln P_α^(RM,PP),        (4)

where P_α^(RM,PP) measures the probability of occurrence of each RM (recurrence microstate) or PP (permutation pattern).
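Given the estimated probabilities, S takes the standard Shannon form over the observed RM or PP. A minimal sketch (zero-probability terms are dropped by convention):

```python
import numpy as np

def entropy(probs):
    """Information entropy S = -sum(P * ln P) over microstate or
    ordinal-pattern probabilities; zero entries contribute nothing."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # ln 4 ~ 1.386: maximal diversity
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0: a single pattern, no diversity
```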

Eliminating the vicinity parameter ε
For the case of recurrence analysis, ε is usually a free parameter, as equation (2) suggests. The adjustable ε allows the evaluation of the local derivative at specific time-points when we consider equally spaced time-points, which is not possible using permutations. On the one hand it provides more information embedded in the probabilities of each microstate, but on the other hand it also introduces a free-parameter dependence. Fortunately, this dependence is eliminated by observing that S → 0 when computed for sufficiently large or small ε, due to the absence of diversity of RM in both limits. So we impose the natural condition of a maximum of S [35,36], turning max(S) and the RM into parameter-free quantities [30]. This choice of the appropriate vicinity parameter follows the maximum entropy principle established in [35], a traditional way to obtain appropriate distributions of parameter-dependent values [37].
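The maximum-entropy selection of ε can be sketched as a simple grid scan: S vanishes at both extremes and peaks in between, and the peak location is taken as the working vicinity. Everything below (grid range, sample count, seed) is an illustrative choice of ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def rm_entropy(data, eps, n_samples=5000):
    """Microstate entropy S for a given vicinity eps (N = 2 sketch)."""
    data = np.asarray(data, dtype=float)
    counts = np.zeros(16)
    for i, j in rng.integers(0, len(data) - 1, size=(n_samples, 2)):
        a, b = data[i:i + 2], data[j:j + 2]
        bits = (np.abs(a[:, None] - b[None, :]) < eps).astype(int).ravel()
        counts[int("".join(map(str, bits)), 2)] += 1
    p = counts[counts > 0] / n_samples
    return float(-np.sum(p * np.log(p)))

data = rng.random(2000)
eps_grid = np.linspace(0.01, 1.0, 25)
entropies = [rm_entropy(data, e) for e in eps_grid]
best_eps = float(eps_grid[int(np.argmax(entropies))])
print(best_eps)  # parameter-free choice: the eps that maximizes S
```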

Quantifying datasets-the distinctiveness parameter γ
Now we show how the probabilities of occurrence of RM or PP and the value of S may be used to formulate a single index to quantify the dataset. Figure 2 depicts a schematic dataset of size K and all possible RM we can obtain from the data using sequences of two values. Based on the discussion of section 2.1 we collect the RM into four groups, defining a distinctiveness parameter γ_i for each group and a total distinctiveness parameter γ_RM, equation (9) (for microstate size N = 2), where P_α^RM for α = b to α = o is the probability of finding each RM α in the dataset, as depicted in figure 2.
In the computation of the distinctiveness parameter we avoid the non-recurrent RM (a) and the completely recurrent RM (p), which bring no useful information to the value of γ_RM. Similarly, a distinctiveness parameter γ_PP, equation (10), may be defined based on the probabilities of occurrence of each ordinal pattern PP_α, for α = a to α = y in figure 2, and its images when the time-points are sampled backward.

First, let us consider stochastic datasets. For infinite stochastic datasets, γ_RM,PP must be zero, since each data point in a stochastic signal must be independent of all others. As discussed in section 2.1, the probabilities of some RM and PP must then coincide due to the independence of the values in a stochastic signal. In practical situations the signals are finite, and γ_RM,PP fluctuates around a small mean value that we call γ_N. Here we define this mean value of the distinctiveness parameter as the one computed using an equal-size (compared to the analyzed dataset) uniformly distributed stochastic signal, or computed over the shuffled signal (surrogate data), choosing the larger value (to be more rigorous). Following this definition, we use the criterion: if γ_RM,PP < γ_N + 3σ, where σ is the standard deviation obtained from 100 calculations of γ_RM,PP and γ_N is its mean value, we say the dataset represents a stochastic process. The so-called three-sigma rule of thumb expresses that ≈99.73% of all values in a normal distribution lie within three standard deviations of the mean [38]. Here, with the help of the surrogate data, we apply the same rule regardless of the distribution or correlation (colored noise) of the signal. Changing the threshold to γ_N + 2σ still gives a statistical confidence of 95% that γ_RM,PP < γ_N + 2σ confirms the signal as stochastic, while γ_RM,PP > γ_N + 2σ characterizes a deterministic signal.
We have used 3σ since, in general, deterministic signals lead to values of γ_RM,PP much larger than σ, making the method extremely confident. For deterministic datasets there is no reason to expect γ_RM,PP to be null. In fact, the rules behind deterministic datasets force a finite value of γ, because they privilege some RM or PP over others. At this point we use the fact that, in general, for deterministic datasets the rule to go from A to B is not the same as the one to go from B to A. Traditional recurrences between just two single values are not sensitive to time reversal, but RM and PP are. To make this point clear, figure 2 shows a backward data series as well as all its possible RM and permutation patterns. Comparing them, we observe that some are not sensitive to time reversal; nevertheless, many others have their probabilities swapped among them. So it is expected that, when a time reversal of a deterministic dataset is considered, some probabilities of occurrence of RM or permutations simply swap. This swap of probabilities is the signature of a deterministic dataset.
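As an illustration of the probability swap, the sketch below compares the N = 2 microstate distribution of a series with that of its time reversal and sums the absolute differences over the informative microstates (indices 1-14). This is our simplified stand-in for the grouped parameters of equations (9) and (10), not the paper's exact definition; ε, the sample count and the seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

def rm_probs(data, eps, n_samples=20000):
    """Estimate the 16 microstate probabilities (N = 2) by random sampling."""
    data = np.asarray(data, dtype=float)
    counts = np.zeros(16)
    for i, j in rng.integers(0, len(data) - 1, size=(n_samples, 2)):
        a, b = data[i:i + 2], data[j:j + 2]
        bits = (np.abs(a[:, None] - b[None, :]) < eps).astype(int).ravel()
        counts[int("".join(map(str, bits)), 2)] += 1
    return counts / n_samples

def gamma(data, eps):
    """Illustrative distinctiveness: total probability shift of the
    informative microstates under time reversal of the series."""
    p_fwd = rm_probs(data, eps)
    p_bwd = rm_probs(np.asarray(data, dtype=float)[::-1], eps)
    return float(np.abs(p_fwd[1:15] - p_bwd[1:15]).sum())

# Logistic map at r = 4 (deterministic) vs uniform white noise.
x = np.empty(2000)
x[0] = 0.4
for t in range(1999):
    x[t + 1] = 4.0 * x[t] * (1.0 - x[t])
print(gamma(x, 0.2), gamma(rng.random(2000), 0.2))
```

For the white noise the forward and backward distributions agree up to sampling fluctuations, so γ stays small; for the deterministic series some microstate probabilities swap rather than match, which the paper's grouped parameters capture more sharply.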

Results
Since the numerical values of γ_RM and γ_PP carry the same physical meaning, we simplify the notation by writing γ_RM,PP ≡ γ; whether γ is based on RM or PP is stated explicitly in the text.
A flowchart of our method to distinguish quantitatively between stochastic and chaotic datasets is depicted in figure 3. The main sequence consists of just three steps: (1) based on the raw data, we sample it correctly, computing the recurrence entropy as a function of the time step at which the data are captured; the correct sampling is obtained when the first maximum of the recurrence entropy is reached, a procedure that leads to results similar to other methods such as mutual information [39]; (2) once the data are correctly sampled, we compute the probability of occurrence of each RM, P_RM [29,30], or each ordinal pattern, P_PP [32]; (3) from the sets of P_RM or P_PP, we determine the distinctiveness parameter as defined by equations (9) and (10).
Following the works of Toker et al [8] and Hirata [40], we perform analyses over a wide range of experimental and model-simulation datasets. All models and data used here are detailed in section 5. In all cases where the deterministic or stochastic properties of the system are already characterized in the literature, our method corroborates those results. For the others, which are still under study and for which a final answer has not yet been provided in the literature, our approach brings new evidence about still unknown details of the datasets (details of the characteristics of each analyzed dataset are given in the methods). The datasets are composed of rather short time series of various types of systems and experimental situations, covering diverse areas of science. An extensive noise-level analysis is also performed to determine up to what noise level the method is able to operate. We emphasize that the proposed approach shows high accuracy in all the applications and opens doors to a precise classification of them. Such a high level of precision may also bring promising perspectives for better model forecasting, since the properties of generated data may be compared to the original ones.
The success rates of the method for the detection of determinism and noise-corrupted determinism are reported in table 1; the success rates in the detection of determinism for mixed deterministic signals are reported in table 2.

Figure 3. Tree algorithm for the quantification of stochastic and deterministic datasets. In step 1 we compute the correct sampling rate for the dataset, evaluating the recurrence entropy as a function of the sampling rate. The correct sampling occurs when the first maximum of the recurrence entropy function is reached. In step 2 we compute the RM probabilities P_RM or ordinal pattern probabilities P_PP. In step 3 we evaluate the distinctiveness parameter γ given by equation (9) or (10). γ vanishes for stochastic data but has a finite value for deterministic data when long enough datasets are considered. However, all datasets are finite and therefore we must provide an upper limit of γ to characterize stochastic datasets. To find a suitable limit, we have used the mean value of γ plus three times the standard deviation over 100 equal-size datasets of uniformly distributed, non-time-correlated stochastic data (γ_N). A second suitable approach is to compute γ_N for 100 different shufflings of the same dataset (surrogate data). Finally, in case the dataset is characterized as stochastic (deterministic), we may compute an appropriate entropy S given by equation (4).

To group all our dynamical systems in just one table, we have measured the added noise as a fraction of the maximum amplitude of the signals, ranging from 5% to 80% in table 1. Since all our signals are normalized, the maximum amplitude of the added uniformly distributed noise may be easily computed. However, another common way to measure added noise is the signal-to-noise ratio (SNR) [41]. If we suppose that all meaningful outputs of the analyzed dynamical systems depicted in table 1 are 'power' quantities, the SNR measured in decibels (dB) may be evaluated as SNR_dB = 10 log10(P_signal/P_noise).

Table 1. Success report in detecting determinism by the method, based on 10 simulated deterministic systems (maps and flows) using 100 distinct datasets of 1000 time-points each. High success rates are reached for maximum noise amplitudes up to 25% of the maximum signal amplitude. Traces of determinism are still captured for noise amplitudes as high as 80% of the signal amplitude in many systems. These simulated systems include trajectories embedded in chaotic and hyper-chaotic attractors and one chaotic trajectory confined to a constant-energy surface (for the conservative standard map). Results for the quantification of deterministic signals use RM (rows 1-10) and ordinal PP (rows 11-20); additive noise corruption is given as a percentage of the maximum amplitude of the deterministic signal.

The SNR values for the systems depicted in table 1 may be put in context as follows: in a wireless network, it is a common requirement that the SNR be at least 20 dB for the network to operate adequately. Considering this criterion, we would conclude that only the case of 5% noise contains useful information. As may be observed in table 1, the method goes much further, distinguishing deterministic properties at much higher levels of noise corruption. All datasets are analyzed using only a small number of data points. We have set the dataset size to 1000 for simulated signals, with all initial conditions randomized, while for the experimental data we have used the whole available dataset or, when it was larger than 5000 time-points, segmented it into pieces of ≈5000 points. For all cases (discrete or continuous dynamical systems and experimental data) an adequate sampling is used to correctly capture the dynamical features. To evaluate the adequate sampling, we have used the condition of the first maximum of the recurrence entropy computed as a function of the time delay (see methods) [35,37,39].
We take advantage of recently published results [8], comparing the efficiency of our method in successfully quantifying determinism or stochasticity only with the results already published there. This procedure is based on the fact that Toker et al [8] have already compared, for each analyzed system, the performance of the different available tools (surrogate-based tests for stochasticity, de-noising methods, downsampling methods, and chaos-detection methods), and have chosen to report only the tools with the highest classification performance.
From table 1 we observe that, in general, our results fully agree with those already reported [8], but the method brings additional advantages when we pay attention to the noise tolerance and dataset sizes reported for previous methods. The approach proposed here is far more tolerant to noise contamination: our results are almost 100% confident when noise of maximum amplitude 5% is added to the deterministic time series. For all normalized time series we have used, a 5% noise amplitude corresponds to approximately 50% of the standard deviation of the noise-free signal, i.e. the method is far more noise-tolerant and almost 100% reliable compared to previous works [8]. In general, high accuracy is obtained for noise amplitudes up to 25% of the maximum signal amplitude. Traces of determinism in some systems are still captured for maximum noise amplitudes as high as 80% of the maximum signal amplitude. Such a high tolerance to added noise is a crucial property of our method. We say a dataset is deterministic when its backward analysis does not preserve the probability of occurrence of some RM or ordinal patterns. For all deterministic signals we have used, this asymmetry in the probability values leads to high values of the distinctiveness parameter γ given by equations (9) and (10). For the logistic map (for example) this value is at least six times larger than the threshold used to conclude that a signal is stochastic (other systems behave in a similar way). So even when large noise levels are considered, the γ values are still larger than γ_N plus three times the standard deviation of a uniformly distributed stochastic signal or, via the second approach, of shuffled data (surrogate data). In this way, traces of determinism are still detected even in such extreme cases.
To explore this point further, and to give an insight into the dispersion of the distinctiveness parameter γ, figure 4 displays an example of the behavior of γ for a set of 100 different initial conditions (i) of the logistic map with r = 4.0 and six distinct maximum noise amplitudes (χ). The γ values for which the dataset is considered stochastic fall in the pink shaded area and satisfy γ < γ_N + 3σ, where σ is the standard deviation over 100 equal-size datasets and γ_N is computed in two ways: from stochastic uniform white noise, and from surrogate logistic data. The use of 3σ for the threshold [38] gives a statistical confidence that 99.73% of the γ values fall in the pink shaded area when computed for stochastic datasets, while they fall below the dot-dashed line when computed for surrogate data. As may be observed, the two maximum values are indistinguishable. We observe that the larger the noise amplitude, the smaller the value of γ, but, as a rule, it stays above the threshold used to characterize stochastic signals for all considered noise-corruption levels. As expected, for the case of 100% added noise, the values of γ reveal the stochastic nature of the signal and fall within the light pink area.
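The surrogate threshold just described can be reproduced in a few lines. Here we use ordinal patterns (M = 4) and a logistic-map series as the stand-in dataset; the series length of 1000 points and the 100 shuffles follow the text, while the specific form of γ (total probability shift under time reversal) is our simplified sketch, not the paper's equation (10):

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(3)

def pp_gamma(data, M=4):
    """Distinctiveness from ordinal patterns: total probability shift
    between the forward series and its time reversal (our sketch)."""
    def probs(x):
        n = len(x) - M + 1
        c = Counter(tuple(np.argsort(x[i:i + M])) for i in range(n))
        return {k: v / n for k, v in c.items()}
    pf, pb = probs(np.asarray(data)), probs(np.asarray(data)[::-1])
    return sum(abs(pf.get(k, 0.0) - pb.get(k, 0.0)) for k in set(pf) | set(pb))

# Logistic-map series standing in for an empirical dataset.
x = np.empty(1000)
x[0] = 0.3
for t in range(999):
    x[t + 1] = 4.0 * x[t] * (1.0 - x[t])

g = pp_gamma(x)
# Surrogate threshold: mean + 3 sigma of gamma over 100 shuffles.
shuffled = [pp_gamma(rng.permutation(x)) for _ in range(100)]
threshold = np.mean(shuffled) + 3 * np.std(shuffled)
print(g, threshold)  # gamma above the threshold flags determinism
```

In runs like this the γ of the raw logistic series lies well above the mean-plus-3σ of the shuffled surrogates, flagging determinism; shuffling destroys the temporal rule and pushes γ down to the sampling-fluctuation level.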
Such a high tolerance to noise makes the approach very useful for analyzing empirical data. The fact that we recognize deterministic traits embedded in noisy data may lead to new models and new possibilities for forecasting. The high sensitivity of the approach may also help in the analysis of nonstationary datasets, in which the level of noise contamination is not constant.
The two rightmost columns of table 1 show the results for noise contamination of 66% and 80% of the maximum amplitude of the deterministic signal. In these extreme cases we see that the results deteriorate when using rather short time series of just 1000 time-points. Similarly to what has been done using 2000 time-points for 25% and 33% of added noise, we point out that the 66% and 80% noise-contamination cases can be substantially improved using larger datasets.
From table 2 we see that mixtures of two deterministic datasets are also classified as deterministic in all analyzed cases. The most difficult cases are those where the amplitudes of the two added datasets are similar; in these cases the resulting dataset may be quantified as stochastic. Since, in general, the sum of two deterministic signals is not a solution of any other deterministic system, such a classification may not be wrong, but instead reflects a real possibility for the sum of deterministic signals. In cases where the amplitudes are different, the addition of deterministic signals resembles the case of noise-contaminated deterministic datasets.

Table 3 deals with the detection of stochastic datasets. As observed, our method based on RM or ordinal patterns succeeds in virtually 100% of cases, even when dealing with only 1000 data points. For time-correlated stochastic signals, the values of the recurrence entropy can be used to classify the correlation time, since there is a unique relation between the recurrence entropy magnitude and the exponent α of the spectral decay law [31], P ∝ 1/f^α.
Finally, table 4 presents our results for experimental situations. We have characterized the Chua circuit data as deterministic, as expected, since the Chua circuit is a paradigmatic way to experimentally generate chaotic data with low levels of noise. Our results also show that the sound vocalization of the vowel 'a' is characterized as deterministic. The neuronal activity of a single squid neuron is also clearly deterministic. On the other hand, the heartbeat RR intervals and the signal collected from the variable white dwarf star are characterized as stochastic. In these cases, the stochastic nature of the signal may be the result of the way the signals are captured: in both situations the sensors that receive the signals may in fact be capturing signals generated by many individual sources, and the summed signal may display stochastic characteristics. Our approach suggests that, whenever possible, careful isolation of the signal source may be a useful procedure when collecting this kind of data. An important result comes from the long data records of the geomagnetic Dst index and the speech data. The deterministic or stochastic properties of the Dst index have already been studied in the literature, and temporal changes in the signal characteristics are pointed out as one of its main features, leading the dynamics to be considered complex. Corroborating this idea, we show that the index depicts a mixture of deterministic and stochastic properties, agreeing with other results in the literature [42]. Finally, the long recorded human speech shows temporal changes in its characteristics similar to those of the Dst index. We have characterized human speech as a non-stationary signal, agreeing with long-term results [43].
In naturally noisy data acquisition, due to limited equipment resolution or to the methodology itself, a high level of noise may be added to the natural data, and our very restrictive threshold of γ_N + 3σ may characterize deterministic signals as stochastic. In these cases a more flexible threshold of γ_N + 2σ, which still yields a high statistical confidence of 95% [38], may give more flexibility to quantify traces of determinism, at least in some segments of long signals, as is the case for the speech data. Relaxing our threshold to 95% increases from 11 to 21 the number of segments of the human speech data characterized as deterministic when RM are used. These results point to a strongly nonstationary behavior of speech signals. The permutation of ordinal patterns has not captured this behavior and has indicated all speech data as stochastic.
A second example of implicit noise in the acquisition is the Dst index, which is reported as an integer and frequently repeats the same value hour after hour. Depending on the magnitude of the index, such integer quantization imposes a large artificial noise addition on the data. Despite this discrete nature, the optimum time delay computed for the Dst data following our protocol shows that the larger the delay, the more traces of determinism are captured, as observed for delays of 13 and 23 hours. In fact, using a 23 hour delay the Dst index is predominantly characterized as a deterministic signal even at the 99.7% confidence level. A similar analysis for the Dst at the 95% confidence level increases the number of segments considered deterministic from 05 to 08 when a 13 hour time delay is assumed; using 23 hours, all 08 segments are considered deterministic.
Such long time correlation of the index has already been conjectured in the literature [44] and associated with a time-correlated stochastic signal. When analyzed hour by hour, our results agree with the long time-correlated behavior of the signal, but they bring new information about a possible deterministic behavior when an appropriate time delay is applied to the analysis, similarly to what happens with the nonlinear stochastic signal already reported in the literature [7].
Finally, because the Dst index is composed of integer numbers that frequently repeat the same value hour after hour, we have judged that the permutation of ordinal patterns may not be suitable for the analysis; for this reason table 4 does not display permutation results for this index.

Discussions and conclusions
Here we have introduced a new method to distinguish deterministic from stochastic time series. Our results have shown that traces of determinism may be inferred even when deterministic signals are strongly corrupted by noise. Noise/signal maximum amplitude ratios up to 0.8 have been tested and even in these extreme cases deterministic characteristics of the signal are detected.
The method introduced here uses the fact that forward and backward deterministic time series generate distinct probabilities of data sequence patterns, due to the time asymmetry of deterministic datasets [45,46]. The key point of the method is the observation that, in general, a deterministic system has one rule taking a point A to a point B but a different one taking B back to A. To extract properties from these small data strings, we have used RM [29] and ordinal patterns [32]. An important characteristic of the method is its potential to produce reliable results from short datasets. As a standard dataset size, we have used 1000 time points for simulated results and no more than 5000 for experimental data, but useful results may be obtained with even shorter time series, mainly when only a small amount of noise is present. To explore shorter datasets further, figure 5 depicts results for the same deterministic time series of table 1 but now using only 300 time points (for the βx map, we have used only β = 2.99). Reliable results are still observed for such an extremely small number of time points, even at a statistical confidence of 99.7%. For datasets smaller than 300 points our results are still useful, but downgrading the statistical confidence to 95% (γ < γ_N + 2σ associated with stochastic signals) or even 68% (γ < γ_N + σ) may be more suitable. In the supplementary material, results for 150 time points are presented. As a rule, noise degrades the success of the method. The use of RM preserves good results even for such extremely short time series, mainly for low noise levels and selected systems. The results show that the use of permutations is more sensitive to the combination of short time series and noise addition, and should be avoided in these cases.
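The time-asymmetry idea can be illustrated with ordinal patterns. The sketch below is our own minimal Python illustration, not the authors' implementation: it compares the ordinal-pattern distributions of a series and its time reversal (here via an L1 distance, an illustrative choice) for a chaotic logistic-map orbit and for uniform white noise.

```python
import numpy as np
from itertools import permutations

def ordinal_distribution(x, m=3):
    """Relative frequencies of the m! ordinal patterns of length m."""
    counts = {p: 0 for p in permutations(range(m))}
    for i in range(len(x) - m + 1):
        counts[tuple(np.argsort(x[i:i + m]))] += 1
    freq = np.array(list(counts.values()), dtype=float)
    return freq / freq.sum()

def asymmetry(x, m=3):
    """L1 distance between forward and time-reversed pattern distributions."""
    return np.abs(ordinal_distribution(x, m) - ordinal_distribution(x[::-1], m)).sum()

rng = np.random.default_rng(1)
# Deterministic example: logistic map at r = 4
x = np.empty(1000)
x[0] = 0.3
for n in range(999):
    x[n + 1] = 4.0 * x[n] * (1.0 - x[n])
# Stochastic example: uniform white noise
noise = rng.random(1000)
```

For the deterministic orbit the forward and backward distributions differ markedly (descending triples are forbidden forward but not backward), while for white noise the two distributions are statistically indistinguishable.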
Another important point is observed in table 1, where the results for ordinal patterns seem to be slightly more tolerant to noise. An explanation may be that, in all our RM analyses, we have used RM composed of all possible recurrence patterns between two sequences of two consecutive time points, while for the ordinal patterns we have used four consecutive time points. The use of longer sequences of time points leads to results more robust to noise, but may be less effective for systems with a large maximum Lyapunov exponent, as can be observed for the β = 2.99 map: the high value of the Lyapunov exponent, λ = ln β, makes successive iterations of the map resemble stochastic sequences with no dependence on (or correlation with) the initial condition, degrading the efficiency of the method. On the other hand, for moderate levels of chaoticity the use of RM of size N = 3 may bring better results but, in general, N = 2 is enough to achieve good results for a considerably high amount of noise.
We have used two approaches to characterize chaotic and stochastic signals, both based on recurrences of segments of a time series: RM and permutations. RM and PP are similar in many aspects, but the results they provide are not exactly the same, as pointed out above. Depending on the characteristics of the data, one may be more suitable than the other. The use of RM associated with a given vicinity parameter (a parameter absent in permutations) is sensitive to the local divergence or convergence rates. The simplest example is a sequence of N numbers on a straight line with slope a: contrary to RM, PP is not sensitive to the value of a. When the divergence is strong, as is the case of the x_{n+1} = βx_n map for high values of β, the strong divergence combined with the added noise makes the fourth (or even the third) value of a four-value sequence of the βx map look like a random number when permutations are used. Since RM adjusts the vicinity parameter according to the slope and (in this case) we use sequences of just two values, it is less sensitive to the value of β. Another important point is that RM analyzes two sequences of two or more data points which are not necessarily close in the dataset; these far-apart sequences are representative of distinct segments of the trajectory. This last characteristic may be important when deterministic signals are analyzed, due to the non-homogeneity of attractors or even of constant-energy surfaces.
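For concreteness, the following is a minimal sketch (our own illustration, not the authors' code) of how N = 2 recurrence microstates can be sampled from random pairs of windows and their Shannon entropy estimated; the number of samples and the choice of vicinity parameter eps are assumptions.

```python
import numpy as np

def microstate_probs(x, eps, n_samples=5000, N=2, rng=None):
    """Estimate the probabilities of the 2**(N*N) binary recurrence
    microstates by sampling random pairs of length-N windows of x."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(2 ** (N * N))
    starts = rng.integers(0, len(x) - N + 1, size=(n_samples, 2))
    for i, j in starts:
        wi, wj = x[i:i + N], x[j:j + N]
        # N x N binary recurrence pattern between the two windows
        R = (np.abs(wi[:, None] - wj[None, :]) < eps).astype(int)
        counts[int("".join(map(str, R.ravel())), 2)] += 1
    return counts / counts.sum()

def microstate_entropy(p):
    """Shannon entropy of the microstate distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(2)
noise = rng.random(1000)
# vicinity parameter chosen as a fraction of the signal spread (our choice)
probs = microstate_probs(noise, eps=0.2 * noise.std())
S = microstate_entropy(probs)
```

For N = 2 there are 16 possible microstates, so the entropy is bounded by ln 16; comparing this entropy (or the pattern probabilities) between forward and reversed series is the basis of the distinctiveness analysis.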
Finally, we would also like to mention the high sensitivity of the distinctiveness parameter γ. For all our deterministic systems, γ saturates at values much larger than the threshold considered for stochastic datasets. This brings a very high sensitivity to our method, allowing noise corruption as high as the amplitude of the signal itself. This ability to detect even traces of determinism in real signals may constitute a useful tool in the analysis of signals whose collected data are severely contaminated by noise.

Data
We simulated all sets of differential equations using the 5th-order adaptive time-stepping Runge-Kutta method with the coefficient choices defined by Tsitouras [47], as implemented in the Julia programming language [48,49] package DifferentialEquations.jl [50]. A sufficiently large number of integration steps is discarded (a transient trajectory) before trajectories of 1000 time points are recorded for analysis. Such a procedure is necessary to make sure the analyzed trajectories have already settled down on the attractor. Map trajectories are also computed after a sufficiently long transient is discarded. In practical situations it is not clear whether a single variable of a system is accessible or just a combination of them, so for multidimensional dynamical systems the analyses are done using a combination of all dynamical variables; we have computed the Euclidean norm, but any other combination of variables may also be appropriate when a suitable sampling is used [8]. We have tested sums of two or more variables of the system and all our results are consistently the same.
Chaotic βx map
The βx map, x_{n+1} = βx_n (mod 1), may be viewed as a map on a circle: the modulo 1 makes x an angle-like variable (x increasing from 0 to 1 corresponds to one circuit around the circle). The βx term uniformly stretches the variable x (its circumference increases to β times its original length), while the modulo 1 folds the map back onto its original length. Together, the two operations produce exponential divergence of nearby points in the domain, generating a chaotic trajectory.
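As a minimal illustration (our own sketch; the initial condition is arbitrary), the map can be iterated as:

```python
import numpy as np

def beta_x_map(beta, x0, n):
    """Iterate x_{n+1} = beta * x_n (mod 1)."""
    x = np.empty(n)
    x[0] = x0
    for i in range(n - 1):
        x[i + 1] = (beta * x[i]) % 1.0
    return x

traj = beta_x_map(2.99, 0.123456789, 1000)  # beta = 2.99, as used in the text
```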

Chaotic logistic map
The logistic map is a nonlinear map long used to show how complex, chaotic behaviour can arise from very simple nonlinear dynamical equations. Since its first appearance as a discrete-time model of population growth [51], it has been used as a paradigmatic model for chaos studies. Mathematically, the logistic map is written as [20]

x_{n+1} = r x_n (1 − x_n), (13)

and, for r = 4 and 0 < x < 1, it depicts chaotic trajectories.
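A minimal sketch of equation (13) in Python (the initial condition is an arbitrary illustrative choice):

```python
def logistic_map(r, x0, n):
    """Iterate the logistic map x_{n+1} = r x_n (1 - x_n)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

orbit = logistic_map(4.0, 0.3, 1000)  # r = 4: chaotic regime
```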

Towel map
The folded-towel map is a classical example of an invertible hyperchaotic map (it has two Lyapunov exponents greater than zero), described by the following set of equations [52]:

x_{n+1} = 3.8 x_n (1 − x_n) − 0.05 (y_n + 0.35)(1 − 2 z_n),
y_{n+1} = 0.1 [(y_n + 0.35)(1 − 2 z_n) − 1](1 − 1.9 x_n),
z_{n+1} = 3.78 z_n (1 − z_n) + 0.2 y_n.

Hénon map
The Hénon map is a broadly studied two-dimensional discrete dynamical system written as [20,53]

x_{n+1} = 1 − a x_n^2 + y_n,
y_{n+1} = b x_n.

The map dynamics depends on two parameters, which for chaotic behavior are set to a = 1.4 and b = 0.3. The map has a large class of dynamical behaviors, is a minimal normal form for modeling flows near saddle-node bifurcations, and is a prototype of the stretching and folding mechanism that leads to chaotic dynamics.
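Hedged sketches of one iteration of each map (the folded-towel equations follow the standard form found in the literature; initial conditions are illustrative):

```python
def towel_step(x, y, z):
    """One iteration of the folded-towel hyperchaotic map."""
    xn = 3.8 * x * (1.0 - x) - 0.05 * (y + 0.35) * (1.0 - 2.0 * z)
    yn = 0.1 * ((y + 0.35) * (1.0 - 2.0 * z) - 1.0) * (1.0 - 1.9 * x)
    zn = 3.78 * z * (1.0 - z) + 0.2 * y
    return xn, yn, zn

def henon_step(x, y, a=1.4, b=0.3):
    """One iteration of the Henon map."""
    return 1.0 - a * x * x + y, b * x

# Iterate the Henon map onto its attractor
hx, hy = 0.0, 0.0
for _ in range(1000):
    hx, hy = henon_step(hx, hy)
```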

Standard map
The standard or Chirikov-Taylor map is a two-dimensional area-preserving chaotic map defined on a torus [54]. It is built on two canonical dynamical variables, the momentum (y, −π < y ≤ π) and the coordinate (x, −π < x ≤ π), defined by

y_{n+1} = y_n + k sin(x_n) (mod 2π),
x_{n+1} = x_n + y_{n+1} (mod 2π).

We have used k = 0.972, for which the standard map displays chaos (for an initial condition (x, y) = (0.001 245, 0.008 75)).
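A minimal sketch of the map (our own illustration), wrapping both variables back into [−π, π):

```python
import numpy as np

def standard_map(x0, y0, k, n):
    """Iterate the Chirikov standard map on the torus [-pi, pi)."""
    xs, ys = np.empty(n), np.empty(n)
    xs[0], ys[0] = x0, y0
    for i in range(n - 1):
        y = ys[i] + k * np.sin(xs[i])
        x = xs[i] + y
        # wrap both canonical variables back into [-pi, pi)
        xs[i + 1] = (x + np.pi) % (2.0 * np.pi) - np.pi
        ys[i + 1] = (y + np.pi) % (2.0 * np.pi) - np.pi
    return xs, ys

xs, ys = standard_map(0.001245, 0.00875, 0.972, 1000)
```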

Ikeda map
The Ikeda chaotic map is a simplified model of light propagating in a ring cavity containing a nonlinear dielectric medium [55,56], described by

x_{n+1} = 1 + u (x_n cos t_n − y_n sin t_n),
y_{n+1} = u (x_n sin t_n + y_n cos t_n),

where u = 0.9 is a parameter and t_n = 0.4 − 6/(1 + x_n^2 + y_n^2). For u ≥ 0.6 the Ikeda map has a chaotic attractor [52].
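One iteration of the map can be sketched as follows (our own illustration; the initial condition is arbitrary):

```python
import math

def ikeda_step(x, y, u=0.9):
    """One iteration of the Ikeda map with t_n = 0.4 - 6/(1 + x^2 + y^2)."""
    t = 0.4 - 6.0 / (1.0 + x * x + y * y)
    return (1.0 + u * (x * math.cos(t) - y * math.sin(t)),
            u * (x * math.sin(t) + y * math.cos(t)))

ix, iy = 0.1, 0.1
for _ in range(1000):
    ix, iy = ikeda_step(ix, iy)
```

Since each step contracts by the factor u = 0.9, the orbit remains bounded (|z| < 1/(1 − u) = 10).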

Spiking neuron
This set of equations describes a simple model of cortical neurons [57] that displays both spiking and bursting behaviors. The model reproduces some characteristics of Hodgkin-Huxley-type dynamics while being as computationally efficient as an integrate-and-fire system. The main variables of the model are the neuron's membrane potential v, a membrane recovery variable u, and an input current I; a, b, c, and d are parameters:

dv/dt = 0.04 v^2 + 5v + 140 − u + I,
du/dt = a(bv − u).

The model requires a reset of the neuron variables to the quiescent state following: if v ≥ +30 mV, then v ← c and u ← (u + d).
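A minimal Euler-integration sketch of the model with the +30 mV reset (our own illustration; the parameter values, input current, and step size are assumed regular-spiking choices, not the paper's settings):

```python
import numpy as np

def izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0, I=10.0, dt=0.25, steps=4000):
    """Euler integration of the Izhikevich neuron with the +30 mV reset."""
    v, u = c, b * c                 # start near the quiescent state
    vs = np.empty(steps)
    for i in range(steps):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        vs[i] = min(v, 30.0)        # clip the numerical spike overshoot
        if v >= 30.0:               # spike: reset membrane and recovery
            v, u = c, u + d
    return vs

trace = izhikevich()
```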

Lorenz system
The chaotic Lorenz system is a reduced atmospheric convection model, described by the following system of equations [20]:

dx/dt = σ(y − x),
dy/dt = x(ρ − z) − y,
dz/dt = xy − bz,

where ρ and σ are the Rayleigh and Prandtl numbers, and b is a constant determined by the geometry of the problem. Here we fix ρ = 28, σ = 10 and b = 8/3, a standard set of parameters for chaotic behavior of the Lorenz system [20].
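A self-contained sketch of the integration protocol (our own illustration: a fixed-step RK4 integrator stands in for the adaptive Tsitouras scheme used in the paper; the initial condition, step size, and transient length are assumptions). It also shows the Euclidean-norm combination of variables described in the Data section.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4(f, s0, dt, steps):
    """Fixed-step 4th-order Runge-Kutta integrator."""
    s = np.asarray(s0, dtype=float)
    traj = np.empty((steps, s.size))
    for i in range(steps):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        traj[i] = s
    return traj

traj = rk4(lorenz_rhs, [1.0, 1.0, 1.0], 0.01, 3000)
signal = np.linalg.norm(traj[1000:], axis=1)  # discard transient, combine variables
```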

Rössler system
The Rössler system is another paradigmatic oscillator that displays chaos. The differential equations describing the model are [20,58]

dx/dt = −y − z,
dy/dt = x + ay,
dz/dt = b + z(x − c),

and we set a = 0.1, b = 0.1, c = 18.0, for which the Rössler system exhibits chaos [58].
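The same fixed-step RK4 approach sketched for the Lorenz system applies here (again an illustrative substitute for the adaptive solver used in the paper; initial condition and step size are our choices):

```python
import numpy as np

def rossler_rhs(s, a=0.1, b=0.1, c=18.0):
    x, y, z = s
    return np.array([-y - z, x + a * y, b + z * (x - c)])

def rk4_step(f, s, dt):
    """One fixed-step 4th-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

s = np.array([1.0, 1.0, 1.0])
traj = np.empty((10000, 3))
for i in range(10000):
    s = rk4_step(rossler_rhs, s, 0.01)
    traj[i] = s
```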

Chua system
The dynamical behavior depicted by Chua's electronic oscillator [59] was one of the first experimental signals in which chaos was observed. Using Kirchhoff's laws, the nonlinear dynamics of the circuit can also be numerically simulated by the set of equations

dx_c/dt = a(y_c − φ(x_c)),
dy_c/dt = x_c − y_c + z_c,
dz_c/dt = −b y_c,

where a = 15.60, b = 25.58, φ(x_c) = x_c + g(x_c), and g(x_c) is Chua's diode, a nonlinear active resistor described by the piecewise-linear equation [59]

g(x_c) = m_1 x_c + (1/2)(m_0 − m_1)(|x_c + 1| − |x_c − 1|),

where m_0 = −8/7 and m_1 = −5/7 [52].
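A sketch of the dimensionless Chua equations with the standard piecewise-linear diode (our own illustration; the initial condition and fixed RK4 step are assumptions):

```python
import numpy as np

def chua_rhs(s, a=15.60, b=25.58, m0=-8.0 / 7.0, m1=-5.0 / 7.0):
    x, y, z = s
    # Chua diode: piecewise-linear negative resistance g(x)
    g = m1 * x + 0.5 * (m0 - m1) * (abs(x + 1.0) - abs(x - 1.0))
    phi = x + g
    return np.array([a * (y - phi), x - y + z, -b * y])

def rk4_step(f, s, dt):
    """One fixed-step 4th-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

s = np.array([0.7, 0.0, 0.0])
for _ in range(20000):
    s = rk4_step(chua_rhs, s, 0.005)
```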

Colored noise
In nature, many stochastic signals have a power-law spectral distribution, P ∝ f^{−α}. The color of the noise refers to the value of α: violet (f^2), blue (f^1), white (f^0), pink (f^{−1}), and red (f^{−2}) are examples of colors of a stochastic signal. In general, the larger the α, the longer the time correlation of the signal. Colored noise was generated using the open Python library colorednoise.py [4,60].
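The paper uses the colorednoise.py library; a minimal spectral-synthesis sketch of the same idea (our own illustration) shapes the Fourier amplitudes of white Gaussian noise by f^{−α/2} so that the power spectrum decays as f^{−α}:

```python
import numpy as np

def powerlaw_noise(alpha, n, rng=None):
    """Generate noise with power spectrum P(f) ~ 1/f**alpha."""
    rng = rng or np.random.default_rng(0)
    f = np.fft.rfftfreq(n).copy()
    f[0] = f[1]                              # avoid division by zero at f = 0
    spec = rng.normal(size=f.size) + 1j * rng.normal(size=f.size)
    spec *= f ** (-alpha / 2.0)              # amplitude ~ f^(-alpha/2)
    x = np.fft.irfft(spec, n)
    return (x - x.mean()) / x.std()          # zero mean, unit variance

pink = powerlaw_noise(1.0, 1000)  # alpha = 1: pink noise
```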

Random walk
A random walk is modeled by a noisy auto-regressive process,

x_{n+1} = x_n + δ_n,

where δ_n follows a normal distribution with mean 0 and standard deviation 1. A trended random walk, on the other hand, is obtained using

x_{n+1} = x_n + b + δ_n,

where b is the value of the trend, randomly selected for each simulation following a Gaussian distribution with mean 0 and standard deviation 0.01 [8].
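Both processes can be generated in a few lines (our own sketch; the seeds are arbitrary):

```python
import numpy as np

def random_walk(n, trend=0.0, rng=None):
    """Cumulative sum of Gaussian increments: x_{n+1} = x_n + trend + delta_n."""
    rng = rng or np.random.default_rng(0)
    return np.cumsum(trend + rng.normal(0.0, 1.0, n))

plain = random_walk(1000)
# trend drawn from a Gaussian with standard deviation 0.01, as in the text
b = np.random.default_rng(1).normal(0.0, 0.01)
trended = random_walk(1000, trend=b)
```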

Fractional Brownian motion and fractional Gaussian noise
fBm and fGn are stochastic processes. Both time series depend on the Hurst index H [6]. For fBm, H = 0.5 corresponds to classical Brownian motion; if H > 0.5 (H < 0.5) the increments of the time series are positively (negatively) correlated. For fGn, H = 0.5 characterizes white noise [6]; if H > 0.5 (H < 0.5) the fGn time series exhibits long memory (short memory). The time series are generated with the Python library fbm.py [61].

Gaussian noise
Gaussian noise distribution generated using the Julia programming language package Distributions.jl [49].

Uniform noise
Uniform noise distribution generated using the Julia programming language package Distributions.jl [49].

Nonlinear colored noise
A nonlinear correlated noise can be obtained by a nonlinear moving-average filter with a random driver, as presented by Freitas et al [7]:

x_n = a y_n + b y_{n−1}(1 − y_n),

where a = 3, b = 4 and y_n is a uniform pseudo-random quantity between 0 and 1.
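The filter above can be sketched directly (our own vectorized illustration of the Freitas et al construction; the seed is arbitrary):

```python
import numpy as np

def nonlinear_colored_noise(n, a=3.0, b=4.0, rng=None):
    """Nonlinear moving-average filter x_n = a*y_n + b*y_{n-1}*(1 - y_n),
    driven by y uniformly distributed on [0, 1]."""
    rng = rng or np.random.default_rng(0)
    y = rng.random(n + 1)
    return a * y[1:] + b * y[:-1] * (1.0 - y[1:])

x = nonlinear_colored_noise(1000)
```

With a = 3 and b = 4 and y in [0, 1], each term is bounded, so x lies in [0, 7].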

Dataset I-white dwarf pulsating star
White dwarfs are the end points of the lives of the majority of stars in the universe, in which high gravity and temperature make them laboratories of physics under extreme conditions. Light-emission pulsations are seen in white dwarf emissions and are believed to be an evolutionary effect in otherwise normal white dwarfs [62,63]. These pulsations may be used to probe stellar interiors (asteroseismology). We have analyzed 17 time series (segments of size 500) of the light intensity of the variable white dwarf star PG 1159-035 [64]. The fluctuations in the intensity of the star have been observed to result in a noisy signal [64]. The data is openly and freely available at https://comp-engine.org/#!browse/category/real/astrophysics/light-curve.

Dataset II-RR-interval from heart beats
PhysioNet provides heart beat (RR-interval) time series in health and disease to support and improve conclusions on the statistical properties of physiologic time series [65,66]. The dataset is composed of 15 RR-interval time series of humans, grouped into five healthy subjects, five with congestive heart failure, and five with atrial fibrillation. Each time series is about 24 hours long (roughly 100 000 intervals). We have segmented the data into 1-hour intervals, leading to 120 datasets of 3000-5000 RR intervals for the healthy, fibrillation and congestive heart failure individuals. All of the time series were derived from continuous ambulatory (Holter) electrocardiograms [67]. The data is openly and freely available at https://physionet.org/content/chaos-heart-rate/1.0.0/.

Dataset III-sound vocalization of vowel 'a'
We have selected 10 samples of the PhysioNet data of digitized vocalizations, 5 from pathological and 5 from healthy voices [65]. Each record consists of a recording of a vocalization of the vowel 'a' five seconds in length. All samples were recorded in a quiet room (<30 dB of background noise) and sampled at 8000 Hz with 32-bit resolution [68]. The data is openly and freely available at https://physionet.org/content/voiced/1.0.0/.

Dataset IV-individual squid neuron recording
We have selected ten samples from the PhysioNet SGAMP database containing single-unit neuronal recordings of North Atlantic squid (Loligo pealei) giant axons in response to stochastic stimulus currents. Each record lasts several seconds and is sampled at 125 kHz [65,69]. We have used segments of size 1000 and appropriate time delays (spanning from 290 to 450 time points). The first 200 000 points of the datasets were discarded. The data is openly and freely available at https://physionet.org/content/sgamp/1.0.0/.

Dataset V-Chua electric circuit
We have implemented an inductorless Chua circuit [59,70] and collected data using a Tektronix TBS 1052B oscilloscope at a sampling rate of 25 kHz. We have used a quad operational amplifier integrated circuit (LM324) to emulate the inductor of the circuit [70]. We have used segments of size 1000 and appropriate time delays (spanning from 290 to 450 time points).
The complete specification of the experimental setup is included in the supplementary material. The data is freely available upon request.

Dataset VI-nonstationary geomagnetic index Dst
The Dst geomagnetic index monitors variations of the globally symmetric ring current of the Earth (the van Allen radiation belt of the magnetosphere); it is a measure of the average disturbance of the geomagnetic field in the Earth's low-latitude region [71] and is used to monitor magnetic storms. We analyze the hourly Dst indices from 01 January 2000 to 31 December 2020. The data is segmented into sets of 1000 time points which, for time delays of 1, 13 and 23 hours, span 1000, 13 000 and 23 000 hours respectively. The database is maintained by the World Data Center for Geomagnetism of Kyoto University, Japan, and is available (under its usage rules) at [72] http://wdc.kugi.kyoto-u.ac.jp/dstdir/.

Dataset VII-nonstationary long speech
We have digitized a long speech (≈12 minutes), captured from the BBC® Brasil YouTube® channel, sampled at 8000 Hz and available at https://youtube.com/watch?v=8gM6WJ-ED9A. The data is segmented into 7.3-second samples which, considering the appropriate time delay, lead to datasets of size 4500.