Emergence of transient chaos and intermittency in machine learning

An emerging paradigm for predicting the state evolution of chaotic systems is machine learning with reservoir computing, the core of which is a dynamical network of artificial neurons. Through training with measured time series, a reservoir machine can be harnessed to replicate the evolution of the target chaotic system for some amount of time, typically about half a dozen Lyapunov times. Recently, we developed a reservoir computing framework with an additional parameter channel for predicting system collapse and chaotic transients associated with crisis. It was found that the crisis point, after which transient chaos emerges, can be accurately predicted. The idea of adding a parameter channel to reservoir computing has also been used by others to predict bifurcation points and distinct asymptotic behaviors. In this paper, we address three issues associated with machine-generated transient chaos. First, we report the results of a detailed study of the statistical behaviors of transient chaos generated by our parameter-aware reservoir computing machine. When multiple time series from a small number of distinct values of the bifurcation parameter, all in the regime of attracting chaos, are deployed to train the reservoir machine, it can generate the correct dynamical behavior in the regime of transient chaos of the target system, in the sense that the basic statistical features of the machine-generated transient chaos agree with those of the real system. Second, we demonstrate that our machine learning framework can reproduce the intermittency of the target system. Third, we consider a system for which the known methods of sparse optimization fail to predict crisis, and demonstrate that our reservoir computing scheme can solve this problem. These findings have potential applications in anticipating system collapse as induced by, e.g., a parameter drift that places the system in a transient regime.


Introduction
Recent years have witnessed a growing interest in exploiting machine learning for model-free prediction of the state evolution of chaotic dynamical systems [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18], with a focus on reservoir computing (a type of recurrent neural network (RNN)) [19][20][21][22]. The core of a reservoir computing machine is a nonlinear dynamical network of artificial neurons, typically of complex topology. With proper training based on time series data from the target chaotic system of interest, the network becomes a self-evolving dynamical system that supposedly represents a replica of the target system to some reasonable accuracy. From the same initial condition, temporal synchronization can be achieved between the reservoir machine and the target system [16], enabling prediction for a finite duration of time. In most existing studies, the attention has been on the parameter regime of the target system where there is a chaotic attractor, so training is done using time series data from the attractor with the goal of predicting the state evolution as determined by this underlying attractor. Because of these features, the corresponding reservoir computing machine itself is trained into a dynamical system that generates a chaotic attractor.
In this paper, we build on our recent work [23] to address the issue of machine-generated transient chaos. In principle, if the machine is trained with an ensemble of transient chaotic time series, it should be able to generate transient chaos insofar as the amount of training data is sufficient. However, in nonlinear dynamical systems, the occurrence of transient chaos implies the inevitable collapse of the system to an undesired state. We thus assume that transient chaotic time series are not available and the available training data are collected while the system is in a parameter regime of chaotic attractors. Our recent work [23] has demonstrated that it is possible to train the neural machine with attracting chaotic time series to predict the critical transition from sustained to transient chaos. There is also initial evidence that a properly trained reservoir computing machine is able to generate transient chaos. The present paper has three main points that go beyond our recent work [23]. First, we carry out a more systematic analysis of the statistical behaviors of the machine-generated transient chaos, using different model chaotic systems than those in reference [23], thereby widening the scope of the finding that a reservoir computing machine can be trained to faithfully generate transient chaos that agrees statistically with the ground truth. Second, we demonstrate that a properly trained machine can generate intermittency. (To the best of our knowledge, there were no previous works on predicting intermittency with reservoir computing.) Third, we focus on a paradigmatic chaotic system for which the existing sparse optimization methods fail to predict crisis and transient chaos, and demonstrate that our parameter-cognizant machine learning approach can solve this problem. This result was not reported in reference [23].
It should be emphasized that, for the neural machine to generate statistically meaningful transient chaos, training should be done with attracting chaotic time series, because normal functioning of the system is often associated with a chaotic attractor, while transient chaos leads to system collapse. In such a situation, it is feasible to collect the time series only when the system still functions normally, for if the system is in a transient chaotic regime, it will collapse after a finite duration of time, rendering it infeasible to obtain a sufficient amount of training data.
The main idea behind predicting crisis [23] and training a reservoir computing machine to generate the correct transient chaotic dynamics is the following. Let p be the bifurcation parameter of the target chaotic system, with p_c being the critical point, where the system exhibits a chaotic attractor for p < p_c and transient chaos for p > p_c. We train the reservoir machine with time series collected from a small number of parameter values in the attractor regime. For each parameter value, we stipulate that the machine is well trained in the sense that it is capable of predicting correctly and accurately the chaotic evolution at the same parameter value for a reasonable amount of time. As training is done at multiple parameter values, it is imperative that the machine be made cognizant of the parameter value, which can be accomplished by designating a special input channel for the values of the bifurcation parameter [23]. It has been demonstrated [23] that, insofar as the machine is well trained for a small number of parameter values in the attractor regime, it can predict the crisis transition point. Here, we shall demonstrate that a parameter change that pushes the value of p beyond p_c will make the machine generate the 'correct' transient chaos in the sense that, statistically, the transient chaotic behaviors generated by the neural machine agree with those of the target system.
These results have practical applications in providing early warnings for a possible system collapse. For example, if the target chaotic system is in the attractor regime close to the critical point and is regarded as functioning normally, a direct examination of the time series would give absolutely no indication that the system could collapse upon a small parameter drift. A well trained reservoir computing machine, a faithful replica of the original system, can predict a possible drift of the system into the transient chaos regime and the subsequent collapse [23].
It is important to place the main idea behind our recent work [23] and the present work in a proper perspective with respect to previous works. The idea of a parameter-aware RNN was proposed in an early work [24], where the authors trained an RNN using time series from different systems for different parameter values, and demonstrated that with a "fixed weight neural network," changing the input alone can make the RNN produce various dynamical behaviors of the target systems with one-time-step predictions. More recently, reservoir computing with a parameter channel has been studied independently in references [25][26][27]. In reference [25], the approach was used to predict the occurrence of periodic windows and other regime transitions in nonstationary chaotic systems with or without dynamical noise. In reference [26], the approach was applied to the Lorenz system to predict Hopf, saddle-node, and pitchfork bifurcations, where training was carried out based on the normal forms of the bifurcations. In reference [27], a relevant yet different approach of dynamical learning with reservoir computing was articulated, where an error feedback loop and a context feedback loop were added to the standard reservoir structure. It was shown that, after training, the modified reservoir system has the ability to learn dynamics different from those of the training set with a small amount of data. The process was named 'dynamical learning', which may be understood as an automatic adaptation of the fixed weight neural network through the error and context feedback loops. In the supplemental material of reference [27], it was demonstrated that the framework is capable of predicting the Hopf bifurcation in the Lorenz system. Regarding transient chaos, in reference [26], it was demonstrated that the reservoir's predicted trajectory of the Lorenz attractor behaves chaotically for a while, and then begins to fall into a fixed point. Recent works [23, 26, 27] have thus demonstrated the ability of reservoir computing machines to learn and predict transient chaos. Our present work focuses on the following three aspects: (1) statistical properties of machine-generated transient chaos, (2) intermittency, and (3) transient chaos in nonlinear dynamical systems for which the previous sparse-optimization based prediction methods failed.

Figure 1. The neural network consists of three layers: the input layer, the hidden layer and the output layer. The three vectors u(t), r(t), and v(t) denote the input signals, the dynamical states of the hidden layer, and the output signals, respectively. The value of the bifurcation parameter of the target system is fed into the hidden-layer network through the parameter channel. During training, the reservoir system is open, as it takes in external time series at a small number of parameter values in the regime in which the target system has a chaotic attractor. After training, the output vector v(t) is connected to the input, closing the loop and making the reservoir a self-evolving dynamical system that can generate transient chaos for input parameter values beyond the attractor regime.

Parameter cognizant reservoir computing machine for generating transient chaos
Here we briefly describe our recent parameter-cognizant reservoir computing framework for predicting crisis and transient chaos [23]. The codes of this work are available from GitHub [28].
A reservoir computing machine is an RNN, where the elements of the input-layer matrix and of the connection matrix of the dynamical network in the hidden layer are randomly chosen and then held fixed. The only entity to be determined through training is the set of elements of the output-layer matrix. Figure 1 shows that the input time-series data constitute a D_in-dimensional input vector u(t), and the input matrix W_in of dimension D_r × D_in projects u(t) into a high-dimensional state vector in the hidden layer. The bifurcation parameter p of the target system is fed into the hidden layer through the parameter channel defined by the matrix W_p of dimension D_r × D_p. The matrix W_r of dimension D_r × D_r is the connection matrix of the hidden layer, which typically has a random topology. The dynamical evolution of the nodal states in the hidden layer is governed by a nonlinear activation function, such as the hyperbolic tangent function. The nodal states of the network in the hidden layer at time step t are represented by the D_r-dimensional vector r(t). The output-layer matrix W_out of dimension D_out × D_r is a readout matrix from the hidden state vector r(t) to the output vector v(t) of dimension D_out = D_in.
Prior to training, the weights (matrix elements) in W_in, W_p and W_r are generated randomly; the matrices are then held fixed. Specifically, the weights of W_in are generated from a uniform distribution in the interval [−k_in, k_in] and the weights of W_p are drawn from another uniform distribution in the interval [−k_p, k_p]. Both W_in and W_p are dense matrices, so that each node in the input layer is connected to all the nodes in the hidden layer. The matrix W_r defines a random network of size D_r and average degree d, which is undirected and weighted, with the weights drawn from a standard normal distribution and rescaled such that the spectral radius of the network is λ (a hyperparameter). Here, the average degree d of a network is the average number of links that a node has, and the spectral radius λ is the largest absolute value of the eigenvalues of its adjacency (connection) matrix.
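As a concrete illustration, the random system matrices described above can be generated as in the following sketch; the dimensions and hyperparameter values here are illustrative choices, not those used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

D_in, D_p, D_r = 3, 1, 200   # input, parameter-channel, and reservoir sizes (illustrative)
k_in, k_p, lam, d = 1.0, 0.5, 0.9, 6

# Dense input and parameter-channel matrices, uniform in [-k, k].
W_in = rng.uniform(-k_in, k_in, size=(D_r, D_in))
W_p = rng.uniform(-k_p, k_p, size=(D_r, D_p))

# Sparse random recurrent matrix with average degree d, weights drawn from a
# standard normal distribution, symmetrized (undirected network), then
# rescaled so that its spectral radius equals lam.
mask = rng.random((D_r, D_r)) < d / D_r
W_r = np.where(mask, rng.standard_normal((D_r, D_r)), 0.0)
W_r = np.triu(W_r) + np.triu(W_r, 1).T   # make the network undirected
radius = np.max(np.abs(np.linalg.eigvals(W_r)))
W_r *= lam / radius
```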
During training, the time series and the associated value of the bifurcation parameter are fed into the machine in a step-by-step manner. The dynamical evolution of the reservoir hidden state r(t) is governed by the following rule:

r(t + Δt) = (1 − α) r(t) + α tanh[W_r r(t) + W_in u(t) + W_p (p + p_0)],  (1)

where Δt is the time step, tanh(q) ≡ [tanh(q_1), tanh(q_2), . . .]^T for q = [q_1, q_2, . . .]^T, α is the leakage factor, and p_0 is the bias of p. A reservoir computing machine is thus a discrete-time dynamical system. To ensure that it can accurately represent a target dynamical system, we use time steps that are two orders of magnitude smaller than the typical time scale of the target system. The initial condition for the hidden state can be conveniently set to r(t = 0) = 0. The process is repeated for each value of the training bifurcation parameter. (The effect of the relative locations of the training points will be discussed in section 4.1.) The state vectors r(t) at all the time steps are recorded, which allows the output matrix W_out to be calculated through a standard regression between the true data vector u(t) and the hidden state vector r(t). Because of the need to train at a number of distinct values of the bifurcation parameter, there are multiple pairs of u(t) and r(t). We stack these pairs together in the temporal dimension to form a pair of vectors r_all(t) and u_all(t) that extend over a longer temporal domain. To remove the undesired transient behaviors of the reservoir dynamical network, the first 10 time steps of each pair of u(t) and r(t) are disregarded before they are stacked. The regression method in reference [4] is used to deal with the issue of symmetries in the reservoir system, where r_all(t) is replaced by r̃_all(t), with [r̃_all(t)]_i = [r_all(t)]_i^2 for even rows (corresponding to nodes in the hidden layer with even indices) and the elements in odd rows the same as in r_all(t).
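A minimal sketch of the open-loop training drive described above, i.e., feeding a time series through the leaky-tanh update rule while recording the hidden states (function names and default values are illustrative, not from the paper):

```python
import numpy as np

def reservoir_step(r, u, p, W_r, W_in, W_p, alpha, p0):
    """One open-loop update of the hidden state:
    r(t+dt) = (1 - alpha) r(t) + alpha tanh(W_r r + W_in u + W_p (p + p0))."""
    return (1.0 - alpha) * r + alpha * np.tanh(
        W_r @ r + W_in @ np.atleast_1d(u) + W_p @ np.atleast_1d(p + p0))

def drive(series, p, W_r, W_in, W_p, alpha=0.3, p0=0.0, washout=10):
    """Feed a training time series through the reservoir and record the hidden
    states, discarding the first `washout` steps as described in the text."""
    r = np.zeros(W_r.shape[0])          # r(0) = 0, as stated in the text
    states = []
    for u in series:
        r = reservoir_step(r, u, p, W_r, W_in, W_p, alpha, p0)
        states.append(r.copy())
    return np.array(states[washout:])
```

Because each update is a convex combination of the previous state and a tanh output, the hidden states remain bounded in [−1, 1].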
A standard linear regression between u_all(t) and r̃_all(t) can then be carried out by minimizing the loss function

L = Σ_t ||W_out r̃_all(t) − u_all(t)||^2 + β ||W_out||^2,  (2)

where β > 0 is the l_2-regularization coefficient. The regularized regression can be achieved through

W_out = U R^T (R R^T + β I)^{−1},  (3)

where I is an identity matrix of dimension D_r, and U and R are the matrix forms of u_all(t) and the transformed r̃_all(t), respectively, with different columns representing different time steps and different rows corresponding to different dimensions. Validation is achieved by letting the trained reservoir machine make short-time predictions of the target system for each training value of the bifurcation parameter. In particular, the prediction is obtained through

v(t) = W_out r̃(t).  (4)

Comparing the predicted and real time series yields the validation error. Since the real time series are not available during prediction, the input vector u(t) in equation (1) is replaced by the output vector v(t) from the last time step. In the validation and prediction phases, equation (1) becomes

r(t + Δt) = (1 − α) r(t) + α tanh[W_r r(t) + W_in v(t) + W_p (p + p_0)],  (5)

which effectively defines the reservoir-computing machine as a self-evolving dynamical system under the external parameter input p. As the time series are validated immediately after the training phase, the initial condition of the reservoir hidden state can be set as the state from the last time step of training. The typical validation length is about 4-6 Lyapunov times of the target system. (For a chaotic system, the Lyapunov time is a characteristic time scale, defined as the inverse of the largest Lyapunov exponent.) The prediction error is the average root-mean-square error. Taken together, the so-trained and validated reservoir machine is a stand-alone, self-evolving dynamical system. When the input parameter channel takes on any value in the vicinity of the training parameter values, the system generates sustained chaotic behaviors. The system can generate transient chaos if the input parameter value is in the regime of transient chaos of the original target system.
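The regularized readout computation described above, including the even-row squaring, can be sketched as follows (rows index reservoir nodes, columns index time steps; the function name is an assumption for illustration):

```python
import numpy as np

def fit_readout(R, U, beta):
    """Ridge-regression readout: W_out = U Rt^T (Rt Rt^T + beta I)^(-1),
    where Rt is R with its even-indexed rows squared (symmetry breaking)."""
    Rt = R.copy()
    Rt[::2] = Rt[::2] ** 2          # square the rows with even indices
    D_r = Rt.shape[0]
    return U @ Rt.T @ np.linalg.inv(Rt @ Rt.T + beta * np.eye(D_r))
```

With a sufficiently small β this reduces to the least-squares solution; a larger β suppresses overfitting to the recorded hidden states.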
We emphasize that the reservoir machine has never been trained in this regime of transient chaos, i.e., the parameter values in this regime are completely 'new' to the reservoir machine. As the reservoir machine has a high-dimensional hidden state, it is necessary to set not only the initial input vector but also the initial hidden state appropriately for prediction. We use a short segment of the real time series from the target system (e.g., several oscillation cycles), taken from the training parameter regime, to 'warm up' the neural network.
While the trained reservoir system can generate transient chaos, do the characteristics of the chaotic transients agree with those of the target system in the same parameter regime? To ensure a reasonable agreement, it is necessary to optimize the reservoir system in its ability to acquire the 'dynamical climate' of the target system. We take the following steps. To choose the values of the seven hyperparameters (k_in, k_p, p_0, d, λ, α and β), we repeat the training 800 times with different hyperparameter values, using the function 'surrogateopt' in Matlab to select the hyperparameter values from the 800 iterations. However, even when the values of the hyperparameters have been optimized, the randomness in the matrices W_in, W_p and W_r will cause fluctuations and errors in the prediction results, as the optimized hyperparameter values determine only a few statistical properties of these matrices and there is still a great degree of freedom in choosing the matrix elements. Inevitably, some realizations of these matrices can render the reservoir unable to learn the dynamics of the target system. A more detailed discussion and simulation results on the effects of different realizations of the random system matrices can be found in reference [23]. In the present work, we use a simple method to reduce these errors: we conduct the training using five different realizations of these matrices and pick the one with the smallest validation error. As a result, in the validation phase, with the optimized hyperparameter values and one out of five random realizations chosen, the reservoir computing machine can replicate the true dynamical evolution with small errors (relative errors less than 5%) for at least four or five Lyapunov times.

Machine generated transient chaos in the logistic map
We first use the classic logistic map, x_{n+1} = ax_n(1 − x_n), to demonstrate the emergence of transient chaos in machine learning, where a is the bifurcation parameter. A bifurcation diagram is shown in figure 2, where a critical transition occurs at a_c = 4.0, as denoted by the vertical black dashed line. For 3.0 < a < a_c, the unit interval (0, 1) is invariant, which contains an attractor together with coexisting non-attracting invariant sets. At a = a_c = 4, a boundary crisis [29] occurs, which converts the chaotic attractor into a non-attracting chaotic invariant set. For a > a_c, there is transient chaos within the interval (0, 1) that eventually leads to escape to infinity. Figures 3(a1) and (a2) show the typical dynamical behaviors of the logistic map for a < a_c and a > a_c, respectively.
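The transient dynamics described above are straightforward to reproduce directly from the map; the following sketch measures the escape time of a single orbit (the iteration cap is an arbitrary choice for illustration):

```python
# Transient chaos in the logistic map beyond the crisis at a_c = 4: typical
# orbits behave chaotically on (0, 1) for a while and then escape.
def transient_lifetime(x, a, max_steps=100000):
    """Number of iterations before the orbit leaves the unit interval."""
    for n in range(max_steps):
        x = a * x * (1.0 - x)
        if x < 0.0 or x > 1.0:
            return n
    return max_steps

# e.g. transient_lifetime(0.3, 4.2) is finite, while for a = 3.9 < a_c the
# orbit stays in (0, 1) indefinitely (up to max_steps).
```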
We train the reservoir machine at four values of the bifurcation parameter: a = 3.8, 3.85, 3.9 and 3.95, all in the attractor regime where there is a chaotic or a periodic attractor, as indicated in figure 2 by the vertical blue dashed lines. During the validation for each trained value of a, the reservoir machine is able to accurately generate the state evolution of the target system for more than 4-5 Lyapunov times. More importantly, the machine-generated trajectories for an arbitrarily long stretch of time land on the chaotic attractor. After training, we impose systematic changes in a and test whether the reservoir machine generates transient chaos. An exemplary pair of machine-generated time series for a = 3.8 < a_c and a = 4.2 > a_c are shown in figures 3(b1) and (b2), respectively, where the reservoir machine correctly generates transient chaos in the latter case. To assess whether the machine-generated transient behavior is 'correct' in the sense that it matches the ground truth, we calculate the return map from the machine trajectories and compare it with the true map. The results are shown in figures 3(c1) and (c2) for a = 3.8 < a_c and a = 4.2 > a_c, respectively, where the red and black dots represent the machine-generated and the true maps. The agreement is remarkable. In particular, for a = 3.99 < a_c, there is a green dashed square defining an interval in which a chaotic attractor lies. For a = 4.01 > a_c, the invariant set in the unit interval becomes nonattracting and is a fractal, where there is an escaping region outside the green square, leading to transient chaos on the unit interval.
A well trained reservoir machine is capable of generating transient chaos with statistical characteristics matching those of the real system. We examine a fundamental characteristic of transient chaos: the lifetime distribution. In the transient chaotic regime, the true distribution is exponential. As shown in figure 4(a) for a = a_c + 0.01, the distribution of the lengths of the machine-generated transiently chaotic trajectories is indeed exponential, where 1500 stochastic realizations of the reservoir system and 1500 random initial conditions for each realization are used. The average transient lifetime from the fitted slope of the data points in figure 4(a) is τ ≈ 28, while that of the real system is about 33. The scaling of the average transient lifetime τ with (a − a_c) produced by the machine is shown in figure 4(b), which is algebraic: τ ∼ (a − a_c)^{−γ} for a > a_c, where γ ≈ 0.62. This agrees with the real scaling law, which has γ ≈ 0.58. The distribution and expected value of the lifetime of the machine-generated chaotic transients, as well as the scaling of the average transient lifetime with parameter variation beyond the critical transition point, all agree sufficiently well with the corresponding behaviors of the real system, attesting to the trained capability of the reservoir machine to generate authentic transient chaos, as expected from the real system.
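The ground-truth lifetime statistics quoted above can be estimated directly from the map: for an exponential distribution P(T) ∼ exp(−T/τ), the average lifetime τ is simply the ensemble mean of the escape times (equivalently, the inverse slope of log P(T) versus T). A sketch, with an arbitrary ensemble size and seed:

```python
import numpy as np

def logistic_lifetimes(a, n_orbits=2000, max_steps=100000, seed=1):
    """Escape times of an ensemble of random initial conditions in (0, 1)."""
    rng = np.random.default_rng(seed)
    times = []
    for x in rng.uniform(0.05, 0.95, n_orbits):
        for n in range(max_steps):
            x = a * x * (1.0 - x)
            if not 0.0 < x < 1.0:
                times.append(n)
                break
    return np.array(times)

tau = logistic_lifetimes(4.01).mean()   # compares with tau ≈ 33 quoted above
```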

Machine generated transient chaos in the classic Lorenz system
We next demonstrate that our parameter-cognizant reservoir computing machine can faithfully generate transient chaos from the classic Lorenz system [30]:

dx/dt = σ(y − x),
dy/dt = x(ρ − z) − y,  (6)
dz/dt = xy − ηz,

where σ = 10, η = 8/3, and ρ is the bifurcation parameter. Figure 5 shows a representative bifurcation diagram, which indicates that a crisis occurs at ρ_c = 24.06 (denoted by the vertical black dashed line). An exemplary pair of time series x(t) for ρ > ρ_c and ρ < ρ_c are shown in figures 6(a1) and (a2), respectively, where there is sustained chaos for ρ > ρ_c and transient chaos for ρ < ρ_c. We use a fourth-order Runge-Kutta method with time step Δt = 0.003 for the numerical integration of the Lorenz system. It can be seen that, not only is the machine able to correctly generate the characteristically distinct behaviors in the pre-critical and post-critical regimes, but the statistical characteristics of the time series are also indistinguishable from the real ones. Figures 6(c1), (c2), (d1) and (d2) show the real return maps and those extracted from the machine-generated time series, respectively, in the pre-critical and post-critical regimes, with reasonable agreement. In particular, in the pre-critical regime, the return map is fully contained in an invariant region of the phase space, whereas in the post-critical regime a small escaping cusp emerges in the return map. The remarkable result is that the reservoir machine is able to generate these features faithfully.
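The numerical integration described above can be sketched as a standard fourth-order Runge-Kutta step for the Lorenz equations (with the time step quoted in the text; function names are illustrative):

```python
import numpy as np

def lorenz(state, rho, sigma=10.0, eta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations with the parameters of this section."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - eta * z])

def rk4_step(state, rho, dt=0.003):
    """One fourth-order Runge-Kutta step of size dt."""
    k1 = lorenz(state, rho)
    k2 = lorenz(state + 0.5 * dt * k1, rho)
    k3 = lorenz(state + 0.5 * dt * k2, rho)
    k4 = lorenz(state + dt * k3, rho)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

For ρ well above ρ_c (e.g., ρ = 28) a trajectory stays chaotic indefinitely, whereas for ρ slightly below ρ_c a typical trajectory is transiently chaotic before settling to a fixed point.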
The reservoir computing machine is also able to generate the correct exponential distribution of the transient lifetime, as shown in figure 7. The average transient lifetime is determined to be τ ≈ 3.0 × 10^2. Compared with the true average lifetime τ ≈ 3.2 × 10^2, the agreement is rather remarkable. The small error is the result of the small size of the escaping region in the return map, which is sensitive to random factors such as the accuracy of reservoir training.

Machine generated intermittency in the classic Lorenz system
Intermittency [31] can be regarded as a special type of transient behavior, where the system switches between two distinct states, spending a finite amount of time in each. We now demonstrate that our parameter-cognizant reservoir machine can generate intermittency in the chaotic Lorenz system in a parameter region different from the one studied in section 3.2. For σ = 10 and η = 8/3, an intermittency regime arises for larger values of ρ, about ρ = 166. As shown in figures 8 and 9, as ρ increases through a critical point (about 166), there is a transition from a periodic attractor to a chaotic attractor, where intermittency arises after the transition. Here we use a fourth-order Runge-Kutta method with time step Δt = 0.002 for the numerical integration of the Lorenz system.
We use time series from four values of the bifurcation parameter, ρ = 167.5, 170, 172.5 and 175, all in a chaotic attractor regime that is relatively far from the intermittency regime, to train the reservoir machine, as indicated by the four vertical blue dashed lines in figure 8. The time step of the reservoir system is Δt = 0.01. For each ρ value, the length of the training time series is t = 400 and the length of the time interval for validation is t = 2. (Note that the validation length is shorter than that used in section 3.2 because the maximum Lyapunov exponent of the target system is larger in the present parameter region.) The 'warming up' of the reservoir network is done as described above. Figure 10 shows that the reservoir computing machine is able to generate the intermittent behavior.

Machine generated transient chaos in the Ikeda map model
To demonstrate that our machine learning approach represents an advance over the traditional sparse optimization methods in certain scenarios, we present an example of the Ikeda map model for which the sparse optimization methods fail. In particular, the basic requirement of any sparse optimization technique for finding the system equations is sparsity: when the system equations are expanded into a power series or a Fourier series, it must be that only a few terms are present so that the coefficient vectors to be determined from data are sparse [32,33].
The Ikeda map describes the dynamics of a laser pulse propagating in a nonlinear cavity, and is given by [34][35][36]:

z_{n+1} = μ + γ z_n exp[i(κ − ν/(1 + |z_n|^2))],  (7)

where z is a complex dynamical variable, the bifurcation parameter μ is the dimensionless laser input amplitude, γ is the reflectivity coefficient of the partially reflecting mirrors of the cavity, κ is the empty-cavity detuning, and ν measures the detuning due to the presence of a nonlinear medium in the cavity. It is very difficult, if not infeasible, to find a sparse representation of equation (7) from purely observational time series of the system, so the sparse optimization methods cannot deal with this system. As we demonstrate below, our parameter-cognizant reservoir computing scheme can solve this problem. We choose the parameter values in equation (7) to be γ = 0.9, κ = 0.4, and ν = 6.0, with μ as the bifurcation parameter. The system exhibits a boundary crisis [37] at μ_c = 1.0027, as shown by the black vertical dashed line in figure 11. The dynamical behaviors for μ < μ_c and μ > μ_c are shown in figures 12(a1) and (a2), respectively. There is a chaotic attractor for μ < μ_c, and transient chaos leading to an escape of the system out of the previous operating region for μ > μ_c. For each selected value of μ, the training and validation lengths are t_train = 800 steps and t_validating = 15 steps, respectively. During validation, the reservoir system is able to predict the system evolution for more than 5 Lyapunov times with small relative error.
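Iterating equation (7) with the parameter values above is straightforward; a minimal sketch (escape-time bookkeeping, analogous to the logistic-map case, would additionally require identifying the post-crisis attractor and is omitted here):

```python
import numpy as np

def ikeda_orbit(z0, mu, n_steps, gamma=0.9, kappa=0.4, nu=6.0):
    """Iterate the Ikeda map (7) n_steps times and return the complex orbit."""
    orbit = np.empty(n_steps + 1, dtype=complex)
    orbit[0] = z = z0
    for n in range(n_steps):
        z = mu + gamma * z * np.exp(1j * (kappa - nu / (1.0 + abs(z) ** 2)))
        orbit[n + 1] = z
    return orbit

# Since |z_{n+1}| <= mu + gamma |z_n|, every orbit is ultimately confined to
# |z| <= mu / (1 - gamma): the escape after the crisis is out of the previous
# operating region, not to infinity.
```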
We train the reservoir machine at μ = 0.91, 0.94, 0.97. These values are shown in figure 11 by the three vertical blue dashed lines, and they are all in the chaotic attractor regime. After training, we apply a parameter change Δμ and test whether the reservoir computing machine generates transient chaos. As shown by an exemplary pair of machine-generated time series in figures 12(b1) and (b2), our reservoir approach successfully generates transient chaos beyond the crisis point. To demonstrate the statistical characteristics of the generated transient dynamics, we compare the transient lifetime distributions of the reservoir-generated time series and of the time series calculated from equation (7). We set the control parameter to μ = μ*_c + 0.02 for the reservoirs, where μ*_c is the critical point calculated from each realization of the reservoir machine. In total, 50 stochastic realizations of the reservoir system and 400 random initial conditions for each realization are used, and we record the transient lifetimes of these 20 000 trials. The result is shown in figure 12(c), where the distribution from reservoir computing (marked in red) is very close to the distribution of the real system with μ = μ_c + 0.02 (marked in black), demonstrating the power of our reservoir approach for generating transient chaos.

Limited performance analysis of reservoir computing with dynamic climate control
A comprehensive performance analysis of our reservoir computing scheme in terms of its ability to generate transient chaos is infeasible, as there are a large number of 'free' parameters in the system. Here we carry out a limited analysis based on the logistic map, focusing on two issues: the selection of the training parameter points and the effect of noise.

Dependency on training points
We study the dependence of the reservoir machine's performance on the values of the training bifurcation parameters. For convenience, we use the parameter difference L between the training parameter values and the critical point to measure how 'far' the former are from the latter. For the logistic map, we use max{a_train} to denote the largest value of the bifurcation parameter a used for training (so L = a_c − max{a_train}) and fix the relative positions of the training points. This way, when there is a small change in the value of L, the training values of the bifurcation parameter are all shifted by the same amount. During this process, all values of the hyperparameters of the reservoir machine are kept fixed. The results are summarized in figure 13(g). It can be seen that the error in the machine-generated scaling law grows with L, with the estimated average transient lifetime slightly longer than the actual value. As the training set moves further leftward from the critical point, the error in the machine-estimated critical point a_c increases, leading to an apparent deviation of the scaling law from being algebraic. However, the reservoirs are still able to generate transient chaos even when the training points are far from the transient region, and the machine-generated average transient lifetime differs from the actual value by less than an order of magnitude.

Effect of noise
We investigate the impact of observational noise on the reservoir machine's ability to replicate transient chaos. The noise is Gaussian with amplitude σ and is added directly to the training data set. Figures 14(a)-(d) show the scaling of the average lifetime τ with (a − a c ) for four values of σ: 0, 10^−3.5, 10^−2.5, and 10^−1.5, respectively. As the noise becomes stronger, the machine generated scaling law begins to deviate from the actual one, as expected. The remarkable feature is that the scaling exponent (the slope of the linear fit on the log-log plot) stays close to the real one even for relatively large noise amplitudes, supporting the robustness of the reservoir machine in generating the 'correct' transient chaotic behaviors.
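The robustness check on the scaling exponent can be sketched as follows. The synthetic lifetimes below obey τ ~ (a − a_c)^−γ with γ = 1/2 (the known exponent for the logistic map), and the multiplicative noise level is an illustrative assumption standing in for machine-generation error, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma_true = 0.5                      # known exponent for the logistic map

# Synthetic average lifetimes obeying tau ~ (a - a_c)^(-gamma), perturbed
# by multiplicative noise (an illustrative stand-in for estimation error).
da = np.logspace(-3, -1, 12)          # parameter distances a - a_c
tau = da ** (-gamma_true) * np.exp(rng.normal(0.0, 0.1, da.size))

# The slope of the linear fit on the log-log plot gives -gamma.
slope, intercept = np.polyfit(np.log10(da), np.log10(tau), 1)
gamma_est = -slope
print(f"estimated scaling exponent: {gamma_est:.3f}")
```

Even with the noise, the fitted exponent stays close to γ = 1/2, mirroring the robustness seen in figures 14(a)-(d).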

Discussion
Transient chaos is ubiquitous in nonlinear dynamical systems [38]. Here we demonstrate the emergence of transient chaos in machine learning. In particular, by focusing on reservoir computing, a class of RNNs with simple structure, we find that it can generate transient chaos with statistical behaviors that match those of the target system. The training (supervised learning) process makes use of time series data taken from a small number of distinct values of the bifurcation parameter of the target system, during which the machine is an open dynamical system. These training parameter values are 'recorded' by the machine through a particular input channel to the reservoir network. Training is deemed successful when the reservoir machine is able to generate trajectories that stay close to the true trajectories for some reasonable amount of time for each of the training parameter values. After the training, the open loop in the reservoir system is closed and it becomes a self-evolving nonlinear dynamical system. Our main point is that this system can generate transient chaos whose statistical behaviors mimic those of the target system for the same value of the bifurcation parameter. It is worth emphasizing that training is done completely in the attractor regime, and the reservoir system has never been exposed to any transient chaotic behavior. Yet the training has instilled the dynamical 'climate' of the target system into the machine and it gains the ability to generate different dynamical behaviors for different values of the bifurcation parameter, even in the regime of transient chaos, where the dynamics are characteristically distinct from those in the attractor regime.
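A minimal sketch of the scheme described above, assuming an echo-state network with an extra parameter input channel; the reservoir size, spectral radius, input scalings, ridge parameter, and training parameter values are all illustrative, not the authors' tuned hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 300                                   # reservoir size (illustrative)
W = rng.normal(0, 1, (N, N)) * (rng.random((N, N)) < 0.05)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9
w_in = rng.uniform(-0.5, 0.5, N)          # input channel for the state x
w_p = rng.uniform(-0.5, 0.5, N)           # extra channel for the parameter a

def logistic(a, x0, n):
    """Target system: a logistic-map trajectory of length n."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(a * xs[-1] * (1.0 - xs[-1]))
    return np.array(xs)

def drive(a, x_series):
    """Open-loop (training) phase: feed the true time series, with the
    parameter value 'recorded' through the extra input channel."""
    r, states = np.zeros(N), []
    for x in x_series:
        r = np.tanh(W @ r + w_in * x + w_p * a)
        states.append(r.copy())
    return np.array(states)

# Training data: trajectories at a few parameter values, all inside the
# attractor regime (a < a_c = 4 for the logistic map).
a_train = [3.86, 3.91, 3.96]
R, Y = [], []
for a in a_train:
    xs = logistic(a, 0.3, 1200)
    states = drive(a, xs[:-1])
    R.append(states[200:])                # discard the reservoir's transient
    Y.append(xs[201:])                    # next-step targets
R, Y = np.vstack(R), np.concatenate(Y)

# Output layer via ridge regression.
beta = 1e-6
w_out = np.linalg.solve(R.T @ R + beta * np.eye(N), R.T @ Y)

def generate(a, x0, n):
    """Closed-loop phase: the machine evolves autonomously at parameter a,
    with its own output fed back as input."""
    r, x, out = np.zeros(N), x0, []
    for _ in range(n):
        r = np.tanh(W @ r + w_in * x + w_p * a)
        x = w_out @ r
        out.append(x)
    return np.array(out)

pred = generate(3.91, 0.3, 200)
```

In this sketch, setting the closed-loop parameter beyond the crisis point plays the role of probing the regime of transient chaos to which the machine was never exposed during training.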
A basic statistical characteristic of transient chaos is the distribution of the transient lifetime, which is exponential in dissipative dynamical systems. We have demonstrated that an adequately trained machine can faithfully generate this exponential distribution, with the average transient lifetime (the inverse of the exponential rate) agreeing with the actual value, but only to within the same order of magnitude. Reducing this error remains a challenge, as the lifetime often depends sensitively on the details of the dynamical structure responsible for transient chaos, such as the return map. In particular, in the regime of attracting chaos, there is a region in the return map that is invariant. In the regime of transient chaos, however, an escaping gap emerges in this region, which leads to chaotic transients. The statistical characteristics of the resulting transient chaos depend sensitively on the details of the escaping region, such as its size. While the reservoir machine is able to generate a return map that agrees qualitatively with the true map, there can be small discrepancies in the details, especially around the escaping gap, which can lead to a sizable difference between the average lifetime of the machine generated transient chaos and the actual lifetime.
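For the logistic map, the escaping gap discussed above can be written in closed form: it is the interval around x = 1/2 where f(x) = a x(1 − x) exceeds 1, which exists only for a > a_c = 4. The sketch below (a = 4.02 is an illustrative value) computes its endpoints and its width, whose smallness is what makes the average lifetime so sensitive to discrepancies in the learned return map:

```python
import numpy as np

def escape_gap(a):
    """Endpoints of the interval where a*x*(1-x) > 1, i.e. the roots of
    a*x**2 - a*x + 1 = 0; returns None when the gap is absent (a <= 4)."""
    if a <= 4.0:
        return None
    d = np.sqrt(a * (a - 4.0))           # discriminant of a*x*(1-x) = 1
    return ((a - d) / (2 * a), (a + d) / (2 * a))

lo, hi = escape_gap(4.02)
width = hi - lo                          # equals sqrt((a - 4)/a)
print(f"escaping gap: ({lo:.4f}, {hi:.4f}), width {width:.4f}")
```

The gap width scales as √(a − a_c) near the crisis, so even a small error in the machine-generated return map around x = 1/2 can noticeably shift the average transient lifetime.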
Another characteristic of transient chaos is the scaling relation between the average lifetime and the parameter distance from the critical point, which is defined entirely in the regime of transient chaos to which the machine has never been exposed. We have demonstrated that our properly trained reservoir machine can faithfully generate this scaling law even in the presence of observational noise.
Taken together, these results show that, for a nonlinear dynamical system of interest, it is possible to develop a machine learning system capable of generating transient chaos beyond the attractor regime in which training takes place, with implications for predicting the future dynamical state of the target system [23].