Extreme events in globally coupled chaotic maps

Understanding and predicting uncertain things are the central themes of scientific evolution. Human beings revolve around these fears of uncertainties concerning various aspects like a global pandemic, health, finances, to name but a few. Dealing with this unavoidable part of life is far tougher due to the chaotic nature of these unpredictable activities. In the present article, we consider a global network of identical chaotic maps, which splits into two different clusters, despite the interaction between all nodes are uniform. The stability analysis of the spatially homogeneous chaotic solutions provides a critical coupling strength, before which we anticipate such partial synchronization. The distance between these two chaotic synchronized populations often deviates more than eight times of standard deviation from its long-term average. The probability density function of these highly deviated values fits well with the Generalized Extreme Value distribution. Meanwhile, the distribution of recurrence time intervals between extreme events resembles the Weibull distribution. The existing literature helps us to characterize such events as extreme events using the significant height. These extremely high fluctuations are less frequent in terms of their occurrence. We determine numerically a range of coupling strength for these extremely large but recurrent events. On-off intermittency is the responsible mechanism underlying the formation of such extreme events. Besides understanding the generation of such extreme events and their statistical signature, we furnish forecasting these events using the powerful deep learning algorithms of an artificial recurrent neural network. This Long Short-Term Memory (LSTM) can offer handy one-step forecasting of these chaotic intermittent bursts. We also ensure the robustness of this forecasting model with two hundred hidden cells in each LSTM layer.


Introduction
Human's spontaneous endeavor for understanding the unknown drives innovation by surpassing the limitations. The growing fascination to predict uncertainty benefits us to construct flexible tools for generating feasible explanations behind those puzzles. Despite substantial progress of such forecasting schemes, accurate prediction of uncertain future remains a grand challenge. And this challenge accelerates to few times in case of understanding and predicting extreme events. Extreme events [1,2], in general, occur far off from the mean state (central tendency) of the skewed distribution [3]. Consequently, extreme events appear in the tail of the distribution with comparatively less frequency. Thus, the scientific community has too often dealt with a small number of data extracting information about the unpleasant long-lasting outcomes of extreme events. Recently, dynamical systems [4,5] are being recognized as a valuable contemporary theory to overcome the curse of limited data.
The dynamical systems perspective facilitates propagating a large amount of data by evolving the equations of motions of the system forward in time. The temporal evolution of a suitable observable exhibiting extremely large or small excursions from the long-term average has attracted considerable attention among interdisciplinary studies. These recurrent excursions may resemble extreme events, as their irregular occurrence in time is seemingly random. Motivated by these analogies, many investigations take place, shifting the focus to the single and coupled dynamical systems. Recently, Ray et al. [6] demonstrated the emergence of extreme events under the accumulated effect of distributed damping parameter and repulsive interaction in a globally coupled network of heterogeneous Josephson junctions. Earlier, the underlying generating mechanisms of extreme events in excitable systems of FitzHugh-Nagumo units with different coupling topologies, including global and small-world network, are investigated thoroughly in Refs. [7][8][9]. Occurrence of the dragon-king-like extreme events due to occasional irregular unison firing in two coupled Hindmarsh-Rose bursting neurons is inspected in Ref. [10]. The role of edges in the origination of extreme events is scrutinized in Refs. [11,12]. However, the study of extreme events has been less explored in coupled maps.
The statistical properties of extreme events in deterministic dynamical systems are analyzed in the seminal work [13]. Ray et al. [14] successfully applied a threshold-activated coupling [15] to suppress extreme events in two coupled Ikeda maps. The sudden relatively rare and significantly large explosive growth of population densities is recently reported in a network of coupled chaotic Ricker maps [16]. Except for these few pioneering works, most of the earlier studies on extreme events deal with only continuous dynamical systems [17][18][19][20][21][22] to the best of our knowledge. In this article, our results reveal the emergence of extreme events in globally coupled chaotic maps due to on-off intermittency [23]. Cavalcante et al. [24] previously reported this mechanism of causing extreme events. On-off intermittency is also identified as the principal process for the genesis of extreme events in temporal networks of mobile agents [25,26] under the influence of attractive-repulsive co-existing interactions [27][28][29][30].
Quite astonishingly, the coupled maps have fascinating, realistic applications, including stock markets [31], fluid dynamics [32], quantum field theories [33], optical systems [34], and many more. Coupled map lattices [35][36][37][38][39][40] can offer a plethora of emerging dynamical behaviors, ranging from complete synchronization [41][42][43], splay state [44], chimera [45,46] to clustering [47,48] and spatiotemporal intermittency [49] to name but a few. We identify extreme events in the globally coupled maps near the synchronization manifold and utilize the benefits of the machine learning approach to predict the dynamics incorporating extreme events. Recently, model-free prediction of chaotic dynamics has become one of the mushrooming avenues of research that offers considerable benefits in forecasting extreme events without knowing the explicit equations of the model. Reservoir computing, one of those effective tools, can be applied to indicate early warning signals of extreme events successfully, as recently highlighted by Pyragas et al. [50]. The performance of the neural networks for predicting chaotic dynamics of the two-dimensional Hénon map is investigated rigorously in Ref. [51]. Several other data-driven methods enjoy widespread appreciation in the current literatures [52][53][54][55][56][57][58] for forecasting extreme events. Our study involves a special kind of Recurrent neural networks (RNN) [59], viz. Long Short-Term Memory (LSTM) [60][61][62][63][64][65][66][67] and unveils its excellent flexibility to capture the dynamics of upcoming extreme events optimally.
Elman RNNs [68] require repeated multiplication of the gradient with the weight matrix during backpropagation through time [69]. This leads to the vanishing or exploding gradients problem depending on the recurrent weight matrix's negative or positive spectral radius. Besides, a vanilla RNN cannot remember long-term dependencies of data over a long time. The limitations of RNNs fuel the discoveries of LSTM [60], which can yield one-step-ahead of prediction, disregarding the problems of speed and stability in RNNs. Apart from multiple practical applications ranging from short-term traffic prediction [70] to handwriting [71] and speech [72] recognition, LSTM is recently found to be fruitful in detecting extreme events at Uber data [73] and flood forecasting [74]. In what follows, we are interested in exploring the emergence of extreme events in globally coupled chaotic maps. The necessary details of the model and the mechanism behind the formation of extreme events are the contents of Sec. 2. Section 3 is devoted to the statistical studies of extreme events. We discuss the range of coupling strength to provide a clear separation between the non-extreme and extreme events in Sec. 4. We then proceed with the accurate one-step-ahead prediction of extreme events in Sec. 5, whereas we summarize and conclude with a discussion of our findings in the last Sec. 6.

Mechanism involved in the formation of extreme events
The coupling through the mean-field in an ensemble of identical coupled oscillators can provide fascinating outcomes like explosive death [75] and synchronous state [76]. We consider the following equations describing the dynamics of N(≥ 2) globally coupled one-dimensional maps [77]. Here, x n (i) reflects the one-dimensional real state vector of the i-th map at the n-th iteration. The dynamics of each map is given by the C 1 smooth one-dimensional nonlinear function f : R → R.f n = 1 N ∑ N j=1 f (x n ( j)) depicts the arithmetic mean. The coupling parameter ε lies within the compact interval [0, 1]. However, ε = 0 describes N isolated maps. On the other hand, each map evolves synchronously after the first iteration for ε = 1. To determine the range of ε for the global stability of the synchronous state, we define the following function for any pair of k-th and l-th maps at the n-th iteration as, (2) Clearly, this function is non-negative and it will vanish only when the k-th and l-th maps evolve to the same motion at the n-th iteration. Thus, if we can show that this function is monotonically decreasing, then it will converge to the greatest lower bound (infimum), as the function (2) is bounded below by 0. The criteria of asymptotic global stability, hence, reduces to V n+1 (kl) < V n (kl). For large N, one can easily derive that the system (1) asymptotically synchronizes [40] if where A = sup f = 0 represents the least upper bound (supremum) of f (x). We further perform linear stability analysis for the complete synchronization state. The Jacobian matrix is J = J 0 f , where f is the derivative at the synchronous state and The eigen values of J 0 are µ 1 = 1 and µ j = 1 − ε for j = 2, 3, · · · , N. The eigenvector ψ 1 corresponding to the eigen value µ 1 represents the synchronization manifold and hence, the amplification of the perturbation inside the onedimensional invariant manifold {(x(1), x(2), · · · , x(N))|x(1) = x(2) = · · · = x(N)} along this eigenvector does not destroy the coherence. The other N − 1 identical eigen values help in determining the stability of the coherent state, as their corresponding eigen vectors ψ j for j = 2, 3, · · · , N evolve transverse to the synchronization manifold. With the help of the Lyapunov exponent λ 1 of the uncoupled map, the linear stablity analysis provides the critical condition [47] as follows, Clearly, the synchronized state is stable for ε > ε c after a sufficient number of iterations (transient). In order to verify our findings, we choose the classical logistic map f (x) = We plot Fig. (1) for ε = 0.4995 such that the difference between chosen ε and ε c remains very small. All figures of this article, including this one, are plotted for a fixed set of initial conditions randomly distributed in the uniform interval [0, 1]. All necessary fundamental codes along with this set of initial conditions utilized in this article are available at [78]. However, we verify that the results will remain unchanged for any other random choice of initial conditions for each map within the interval [0, 1]. The choice of this ε = 0.4995 will not allow these N = 200 coupled maps to settle down to a complete synchronized state as expected through our stability analysis. Figure (1) (a) reveals the partial synchronization [48,79] state of the system, where the entire system (1) splits into two groups. Particularly for the chosen set of initial conditions, one group contains exactly N 1 = 3 elements, and the other one possesses N 2 = N − N 1 = 197 members. These numbers N 1 and N 2 depend on the choice of random initial conditions. All figures are inspected after 3 × 10 5 iterations. We plot the dynamics of each logistic map in Fig. (1) (b), which demonstrates the temporal chaotic behavior of each cluster. Careful scrutiny of this subfigure unfolds the complex behavior of these two clusters. The distance between these two groups is not same in the entire time domain.
When any one of p 1 or p 2 proceeds to the limit 0, then the other one tends to 1. For instance, if p 1 → 0+, then p 2 → 1−. Thus for p 1 = 0, Eqns. (5) govern the following equations, Figure 1: (Color Online) (a) Advent of two synchronized clusters in the globally coupled maps: We choose the coupling strength ε = 0.4995 slightly lesser than the critical coupling strength ε c = 0.5. This ε c is necessary for achieving complete synchronization of the system (1) with N = 200 as per Eqn. (4). The system (1) settles down to a partial synchronized state. The number of elements in each cluster varies depending on the initial conditions. (b) Time evolution of each logistic map: We use the identical logistic maps f (x) = rx(1 − x) as the local dynamics with r = 4. Each map reflects a chaotic behavior. Although those oscillators occasionally collapse to a single trajectory, mostly they possess a two-cluster state, as observed in the snapshot at a specific iteration 447855 in subfigure (a). (c) Onoff intermittency of the observable E: We select two different maps, viz. 1-st and 50-th oscillators from two distinct clusters using subfigure (a) and calculate the Euclidean distance E between them. The temporal evolution of the observable E contemplates chaotic intermittent bursts which cross the significant height H S = E n + 8σ from time to time. E n is the sample mean of the local maxima E n of E gathered over sufficiently long 10 8 iterations after the initial transient. σ is the standard deviation of the whole simulated data. We will assess an event E n as an extreme event if it crosses the red horizontal dashed line H S . This figure is drawn for a particular set of randomly distributed initial conditions within the interval [0, 1]. We consider 3 × 10 5 iterations as the transient. For further details, please see the text. (1)) + ε f (y n (2)), and y n+1 (2) = f (y n (2)).
Noticeably, y(2) does not depend on the variable y(1) for p 1 = 0, although y(1) depends on both the variables y(1) and y (2). Thus, these equations in the asymmetric cluster limit possess the skew product structure [80].
We define a new variable E = y(1) − y(2) describing the Euclidean distance between two clusters. Figure (1) (c) displays the temporal dynamics of E after the initial transient. E remains near zero (under the numerical accuracy) most of the iterations revealing the synchronization of the two clusters. However, this off-state (E ≈ 0) is interrupted occasionally, leaving the trait of large amplitude random bursts. This non-zero value of E is so high that it crosses the pre-defined threshold H S (shown in dashed red line in subfigure (1) (c)) infrequently. This H S = E n + 8σ is treated here as the extreme event qualifying threshold [26], where E n is the local maxima of E after the initial transient of 3 × 10 5 iterations. We accumulate the values of E over 10 8 iterations. Then, we collect the values of E n excluding the transient data. E n represents the sample mean of this gathered data of E n , and σ is the corresponding standard deviation. The choice behind this threshold H S is motivated by the fact that the extreme events qualifying threshold is generally taken as E n + dσ , where d ∈ R varies within the interval [4,8] [6,10,17,[19][20][21]25,26]. Evidently, we choose the upper bound 8 of d, as it definitely portrays more relatively scattered events with strikingly higher amplitude and lower frequency. Moreover, Ref. [26] presented an analytical formulation for this particular choice of threshold, and designated this threshold as the significant height in analogy with the oceanic rogue waves [81]. We employ this significant height H S borrowing the idea from Ref. [26] and whenever E n crosses H S , we recognize it as an extreme event.
We rigorously investigate that these sporadic bursts of E are not due to the appearance of floating nodes [37]. Once a node joins a synchronized cluster, it will stay there forever and will never move to the other cluster. This intermittent bursts are completely due to the varying distance of the two clusters. We choose two distinct maps from two different clusters and study the temporal evolution of their Euclidean distance E in Fig. 1 (c). For further evaluation of extreme events, we investigate their statistical properties in the following section.

Statistical information of extreme events
Generally, Gaussian (normal) distribution, being symmetrically distributed about the mean, indicates undoubtedly that the data near the central tendency are comparatively more frequent in occurrence than the data far from the expected value. We assemble the data of local maxima E n of E over a considerable interval of 10 8 iterations after leaving the initial transient of 3 × 10 5 iterations. We plot the probability density function (PDF) of E n in Fig. 2 (a). We choose the same set of initial conditions and coupling strength ε = 0.4995 as in Fig. 1  We accumulate local maxima E n > 0 of E over sufficiently long 10 8 iterations after repudiating the first 3 × 10 5 transients. We calculate H S over these data and draw this to identify the separation between non-extreme and extreme events. (b) Probability density function of extreme events: Generalized extreme value distribution fits well with the data from the right-hand side of the significant height H S in subfigure (a). We use log likelihood to estimate the three parameters of the family given in Table (1). (c) The distribution of return intervals for the extreme events within each burst: We numerically investigate the successive manifestation of extreme events within each burst. For numerical simulation, once we detect an extreme event, we throw away subsequent extreme events (if it exists) from next 100 iterations. This data is well fitted with Weibull distribution. The estimated scale parameter is a = 6782.39 and the estimated shape parameter is b = 0.721932. We also use the popular P-P (probabilityprobability) plot (not shown here) to confirm our claim. The subfigures (a-c) are drawn with ε = 0.4995 for the N = 200 globally coupled identical logistic maps The probability of appearance of extreme events: We compute the probability of occurrence of extreme events for 7 different values of ε using the relation (12) and successfully validate it by plotting the upper bound using the inequality (11). Here, ε c = 0.5. We use the same set of random initial conditions drawn from [0, 1] for all results.
H S contains all non-extreme events, while the right-hand side encompasses all extreme events. Clearly, the PDF of E n follows the non-normal distribution as a fair portion of data deviates more than 8σ from the mean E n . In addition, we use the statistical Kolmogorov-Smirnov test (K-S test) by assuming our simulated data are identical with the Gaussian distribution. As anticipated, the K-S test rejects the null hypothesis at the default 5% significant level. These findings validate the signature of extreme events employed in Refs. [25,26,82,83]. More or less, the common notion of definition of extreme events in these Refs. [25,26,82,83] divulges the following characteristics, viz.
(i) these events are aperiodic, possessing amplitude higher than H S = E n + 8σ , and (ii) these recurrent events occur less frequently, maintaining the non-Gaussian distribution.
In this article, we also maintain these two important properties in case of defining extreme events. We also represent the PDF of the amplitude of the extreme events in Fig. 2 (b). We draw the subfigures (a-c) of Fig. 2 using the MATLAB Distribution Fitter App. Our numerically simulated data (shown in cyan color) in Fig. 2 (b) is the same data, lying in the right of the significant height H S in Fig. 2 (a). We utilize the Freedman-Diaconis rule [84] for choosing the width of the bins in this histogram. We find this data are well fitted with the PDF of the generalized extreme value (GEV) distribution [85] given by for β = 0 and 1 + γ x − α β > 0. Here, α, β > 0, γ = 0 stand for the location, scale, and shape parameters, respectively. As GEV distribution is a combination of three families (Gumbel, Fréchet and Weibull), we get Type-II extreme value distribution for γ > 0. In contrast, we obtain the Type-III case for γ < 0. For γ = 0, we have representing Type-I case. Table (1) contains all the necessary information about the estimated parametric values of α, β , and γ for Fig. 2 (b). GEV distribution is applicable in various fields [85], including it is the only possible limiting distribution (if it exists) of an adequately normalized maximum of a sample of independent and identically distributed random variables.
We also study the statistics of the return intervals between extreme events in Fig. 2 (c). Recurrence interval has a broad range of applications in hazard estimations like the interarrival packet times on internet traffic [86], earthquakes [87], floods [88], and many more. We select the extreme events from each burst and collect the return interval between two consecutive such extreme events. For numerical implementation, once we detect an extreme event, then we 2636. This dimensionless number being greater than unity reveals a high variance of the data. Moreover, the CV of an exponential distribution is precisely 1, which is less than the CV of our return intervals. Hence, we conclude that the return interval distribution of extreme events does not follow the exponential distribution.
Earlier studies [89][90][91] suggest these return intervals are generally approximated to any one of the following distributions, viz. (i) exponentially distributed, (ii) stretched exponential, and (iii) Weibull distribution. Memoryless is a vital property of the exponential distribution of non-negative real numbers. In contrast, long-term memory usually gives rise to a stretched exponential distribution of the return intervals [90]. Recently, Weibull distribution is also found to be a good representative for the distribution of recurrence time intervals of longrange correlated data [89]. The PDF plotted in the semi-log scale resembles of a product of a stretched exponential and a power law. This PDF in Fig. 2 (c) is well fitted by the Weibull distribution. Estimation of the scale and shape parameter is given in  Here, a is the scale parameter and b is the shape parameter. The flexibility of the Weibull distribution allows it to model in an astonishingly diverse array of data sets, ranging from weather forecasting to reliability engineering and failure analysis. The PDF of the two parameter Weibull distribution is given by Although E remains zero for a large number of iterations as illustrated through Fig. 1 (c), E n , being the local maxima of E, is positive-valued observable. Evidently, H S = E n + 8σ is also positive quantity and varies with each different choice of initial conditions, even for fixed ε and N. We come up with an upper bound for the probability of occurrence of extreme events using Markov's inequality [92] as follows We compute the probability of occurrence of extreme events over the interval [0.4985, 0.4999] of the coupling strength ε with the same set of initial conditions used in Fig.  1. To calculate its probability, we iterate the system 1 with N = 200 identical logistic maps for 10 8 iterations and then discard the initial 3 × 10 5 data treating them as transient. Then, we calculate the number of local maxima E n of E over that sufficiently long data. After that, we simulate the significant height H S in each case and count the number of extreme events, i.e., E n which cross H S . Hence, we apply the classical definition of probability as follows Probability of occurrence of extreme events = Number of extreme events Number of E n .
We plot these values for ε c − ε in Fig. 2 (d) along with the upper bound using the relation 11. Here, ε c = 0.5 is the required critical coupling strength for the emergence of complete synchronization as calculated through the Eqn. (4). One should notice that the upper bound proposed in the inequality 11 is not, by any means, the least upper bound. Even though we cannot observe any clear trend between the numerically calculated probabilities, one can anticipate that as the distance between ε c − ε increases, the chaotic bursts of E become more intermittent due to the instability of synchronization invariant manifold {(x(1), x(2), · · · , x(N))|x(1) = x(2) = · · · = x(N)}. Consequently, these sporadic bursts become more frequent and become less likely to cross the threshold H S . We will discuss this result in detail in the next section.

Range of coupling strength for the emergence of extreme events
We contemplate the dynamical behavior of the system (1) of N = 200 identical logistic maps in Fig. (3) with the same initial conditions as used in all previous figures. This inspection will help us to determine the range of ε corresponding to extreme events. Figure (3) (a) reflects a one-dimensional parameter range of the coupling strength ε, where we identify numerically four distinct behaviors. For ε < ε ch ≈ 0.43303012 (dark blue interval of Fig. (3) (a)), the system settles down to a two cluster state. The behavior of each map is periodic in time, and accordingly, the distance between two clusters E is also periodic. Hence, we never expect the manifestation of extreme events in this range. The system falls in a chaotic two cluster state within the interval (ε ch , ε c ). The purple regime of ε > ε c in Fig. (3) (a) gives rise to a coherent chaotic state, as anticipated from our stability analysis. However, we fail to provide an analytical estimation to decide the range of ε exhibiting extreme events. That's why we examine its range through numerics. Our numerical study suggests extreme events arise only We split an interval of ε over four distinct groups. Dark blue interval of ε < ε ch exhibits periodic two clusters, whereas the remaining interval contains a twocluster chaotic attractor for (ε ch , ε c ). The orange interval (ε ch , ε EE ) displays the interval encompassing non-extreme events. We come up with the subfigure (b) for ε = 0.496, where the red dashed horizontal line is the significant height H S = E n + 8σ . Evidently, E flunks to cross this threshold H S . Nevertheless E crosses H S intermittently for ε = 0.4999 in subfigure (c). The black interval (ε EE , ε c ) of subfigure (a) incorporates the interval of extreme events. The purple interval of ε > ε c = 0.5 presents the coherent chaotic attractor. For simulation, we consider N = 200 globally coupled maps with identical logistic maps f (x) = 4x(1 − x) as the local dynamics. For further details, please see the text.
within the interval (ε EE , ε c ), where ε EE ≈ 0.4961745. The interval (ε ch , ε EE ) yields the two cluster chaotic attractor, but their Euclidean distance E becomes too frequent to cross H S . To reveal this nature, we choose ε = 0.496 from the non-extreme (orange) range of Fig. (3) (a) and select the 1-st and 2-nd maps from two different clusters. The difference between these two maps is plotted in Fig. (3) (b). Clearly, the intermittent random bursts of E do not cross the extreme event qualifying height H S (red dashed horizontal line) in this figure. Similarly, we set ε = 0.4999 from the interval (black) of occurrence of extreme events in Fig. (3) (a). The system again falls in a partially synchronized state, and we choose two different maps, viz. 98-th and 99-th maps corresponding to two distinct clusters. The distance between these two maps occasionally crosses the significant height H S in Fig. (3) (c). The aperiodic random bursts attest to the emergence of extreme events.
One should emphasize that although we plot the open interval (ε ch , ε c ) continuously by the black solid line in Fig. (3) (a). However, one may find a set of Lebesgue measure zero within this interval, which contains few points of coupling strength that unable to produce extreme events. One such point is ε = 0.49775. For this choice of ε, we accumulate sufficiently long data of E n over 10 8 iterations after discarding the initial transient 3 × 10 5 . We observe that these long iterates of E n fail to cross the derived H S . We draw this Fig. (3) with the same set of initial conditions as used in Fig. (2). However, we verify the results remain unchanged if we choose other random initial conditions of each element from the uniform interval [0, 1]. Although, other initial conditions may slightly change the value of ε ch and ε EE . Lastly, we want to conclude this section with the fact that this appearance of extreme events is a generic signature in globally coupled chaotic maps, and our choice of the identical one-dimesional logistic map as the chaotic local map is only a paradigmatic example for illustrating the phenomena. The intermittent transition [39,47,49] from the synchronized state to a two cluster attractor always allows a range of coupling strength for the origination of extreme events near the synchronization manifold in the system (1) consisting of onedimensional chaotic maps.

Short-term prediction of extreme events
After the statistical characterization of extreme events, we focus on their prediction utilizing Long Short-Term Memory, a deep learning architecture. The recurrent neural nets, viz. long short-term memory (LSTM) networks used in this article is found to be capable of one-stepahead forecasting of chaotic time series. LSTM cell can process data sequentially and keeps its hidden state through time. Using the univariate time series, we create the regression data set {X, Y} by taking a lag of 1 (for example, if X n be the actual time iteration, then Y n = X n+1 , n = 1, 2, · · · , sample size). And using the training data set {X, Y}, we build LSTM net so that we can deal with the vanishing gradient problem that can be encountered when training traditional RNNs. Thus, it is important to note that the dimension of the feature vector is one. So, sequence input with one dimension is passed in LSTM with 200 hidden units in a fully connected layer. Here, the loss function used to compute the weight matrix connecting different layers is a mean-squared error. We iterate the system (1) with ε = 0.4995 for 10 8 iterations and collect the distance E between the two clusters after deleting first 3 × 10 5 transient. Then we calculate the local maxima E n of this whole gathered data of E. We use the first 165000 data points of E n consisting of few extreme events, out of which 90% data are used for the training. The remaining 10% data points are treated as a testing set. The schematic sketch of neural architectures is shown in Fig. (4) (a), where each layer comprises 200 nodes for the LSTM. This number of hidden cells in each LSTM layer is determined using the cross-validation technique [93]. E n+1 = F o h (h n ) is the prediction for the observable E n+1 depending on the input observable E n ∈ R d o .
h is the hidden to hidden mapping, which is given by the following equations Here, is the element-wise product (Hadamard product). Each LSTM is composed of three gates to protect and control the cell state, which carries the sequence input initially. g forget n , g input n , g output n ∈ R d h represent the activation vectors of respective gates. ξ forget , ξ input , ξ cell and ξ hidden are the bias vectors. Σ f orget , Σ input , Σ hidden are the sigmoid functions. The weight matrices W forget , W input , W cell , W hidden ∈ R d h ×(d h +d o ) along with the bias vectors need to be learned during training. The forget gate decides how much information should be appended in the final product of the cell state at each iteration n. This final product, i.e., the hidden state, is treated as the input of the next layer. The average of these outcomes across all hidden states is the final predicted output. The second last relation of Eqn. (13) suggest that depending on Σ hidden , we get an output gate's activation vector g output n .
Hyperbolic tangent (tanh) function acts on the cell state generating a value between −1 and 1. The last relation of Eqn. (13) will produce the hidden state, which will give the prediction E n+1 . This forecast E n+1 is a linear function F o h given by the following relation where W o ∈ R d o ×d h . One such LSTM neuron is shown in Figs. (4) (b-c). For a detailed description of LSTM networks, please see the pioneer work of Hochreiter et al. [60].
This LSTM network can give rise to the one-step-ahead future prediction depending on each time step of the input sequence as illustrated in Fig. (4). We plot the predicted output and the test data simultaneously for a short horizon in Fig. (5) (a). To determine the accuracy of the convergence between these two data sets, we calculate the arithmetic mean of the squares of the Euclidean distance between E and E as follows where e i is the difference between the forecasted outcomes and the actual value. e is the l × 1 matrix and l denotes the number of data points in the testing data. This loss function is a positive valued metric tending to zero for an accurate forecast. The mse for Fig. (5)  Here, a segment of time iterations (magenta) is displayed, where values of local maxima E n (blue cross) are termed as actual data. We perform the one-step-ahead prediction and denote the predicted values corresponding to E n by the black circle. This framework is also able to forecast the amplitude of extreme events besides non-extreme events. Here, non-extreme events depict those events with E n ≤ H S . The initial conditions are the same as Fig. (1), and the actual data are simulated using the system (1) with N = 200 identical logistic maps f (x) = 4x(1 − x).
with l = 16499 is 0.000011589. However, Ref. [65] criticizes this mean squared error due to non-normalization for the variability of the data. Hence, we use a different measure as follows where E is the mean of the observed data. This measure is widely known as the In fact, we contemplate the robustness of the used deep learning technique in Fig. (6). We define the following equations Here, h is the number of trials. mse i and nse i are the respective mse and nse at the i-th realization. We take independent h = 30 realizations and delineate the average mse with their corresponding standard deviation σ mse in Fig. (6). nse for these 30 independent trials is 0.81856. It is evident that this nse is higher than the nse of Fig. (5). Figures (5) and (6) reflect the successful one-step forecasting of the local maxima E n of E with sufficiently moderate accuracy. These data sets possess few extreme events crossing the dashed line H S as shown in Fig. (5). We find our trained model agrees well with the observed chaotic data consisting of extreme events.

Discussion and Conclusion
Recent decades have seen many efforts to understand the mechanisms behind the recurrent emergence of extreme events in dynamical systems and complex systems. Attainment of giant extraordinary values in a short time of a relevant observable is generally known as an extreme event. These recurrent short-lasting events deviate significantly from the longterm average of their statistical distribution. Due to its practical importance, this topic gains widespread recognition among interdisciplinary researchers. In spite of such leaps of progress, these investigations provide only a partial understanding of the mechanisms of extreme events and their predictions. The scientific community of nonlinear dynamics brings few successful signs of progress by considering investigations through models following differential equations. However, most of these studies are performed by contemplating continuous dynamical systems. Theoretical investigations on maps for an understanding of the dynamics of extreme events are not well explored yet. Only a few studies are reported on discrete dynamical systems [13,14,16] along this line to the best of our knowledge.
Moreover, most of the existing studies on globally coupled networks require repulsive interaction [6,25] for the appearance of extreme events. The co-existence of attractiverepulsive coupling of suitable strength is necessary for the emergence of extreme events in globally coupled identical Stuart-Landau oscillators as per the numerical investigations of Ref. [25]. Ray et al. [6] also observed the signature of extreme events under repulsive meanfield interaction in a network of heterogeneous oscillators. Although, Ansmann et al. [7] studied such "self-generating and self-terminating" events under solely attractive coupling in globally diffusively coupled inhomogeneous FitzHugh-Nagumo units. But, parameter mismatch plays a crucial role in the formation of extreme events there.
We have investigated the emergence of extreme events in a globally coupled network of one-dimensional chaotic maps. We have calculated the critical coupling strength ε c , independent of the system size N, for achieving a synchronized chaotic state. Before this coupling strength, the system settles down to an asymmetric two-cluster attractor. The distance E between these two clusters occasionally travels far away and possesses some large values. These intermittent bursts of E are so large that they infrequently deviate more than eight times the standard deviation from the mean of the local maxima E n of E for a suitable range of coupling strength. These highly deviated values of E are classified as extreme events if they cross the significant height H S . The reason behind this specific choice of threshold is supplemented using the existing literature [26]. We have numerically determined the range of ε for the occurrence of extreme events in Sec. (4). Generalized extreme value distribution further corroborates the statistical signature of these events. We have evaluated the probability of the appearance of extreme events for some specific values of ε using Eqn. (12) and the procedure given in Sec. (3). For instance, this probability is 0.0027 for the system (1) with N = 200 coupled identical logistic maps f (x) = 4x(1 − x) and ε = 0.4995. These results confirm that the appearance of extreme events is very fewer in frequency. The return time distribution of these extreme events resembles of Weibull distribution.
Understanding of extreme events brings to the limelight lots of open queries. The interdisciplinary research of complex systems plays a decisive role in providing valuable insights into these long-standing issues. We have contemplated the class of globally coupled one-dimensional chaotic maps hoping that our analysis may enhance the understanding of these non-equilibrium phenomena. This article has several folds with insightful results, including (i) the mechanisms triggering extreme events (see Sec. (2)), (ii) the statistical aspect of extreme events (see Sec. (3)), and (iii) the prediction of extreme events (see Sec. (5)). We analyze these three essential themes in the studies of extreme events by placing the paradigmatic one-dimensional chaotic identical logistic maps in each node of the globally coupled network. The mechanisms of formation of extreme events may unravel in gaining insightful information regarding the onset of intermittency transition for globally coupled chaotic maps. Additionally, this route of two-state intermittent nature may unfold the possibility of mitigation of extreme events within chaotic dynamical systems. By tuning the coupling parameter, one can easily avoid such occasional giant events (see Sec. (4)). And this mitigation strategy is only possible due to our precise knowledge of the mechanism viz. self-organized on-off intermittency triggering extreme events. In addition, statistical studies are promising tools that help in drawing inferences regarding the frequency and probability of occurrence of extreme events from the samples [94]. Nevertheless, knowledge of the mechanism behind the generation of extreme events may not always help in predicting extreme events. That's why we take advantage of the machine learning approach to predict the dynamics incorporating extreme events. There are several examples of model-free predictions such as one-day-ahead forecasting electricity consumption [95,96], forecasting hourly solar irradiance [97], predicting El Niño events with a lead time of one-year [66] to name a few. The deep learning strategies are capable of learning the essential chaotic dynamics directly from data and can deal with this grand undesirable challenge. We are successfully able to predict the chaotic dynamics of E n consisting of extreme events with moderate accuracy. We hope that the perceived study may reveal a thorough understanding of this crucial phenomenon. The employed trained LSTM predictors have predictive power and practical applicability within the scope of viable forthcoming research and might help to comprehend the forecast of extreme events in realistic scenarios.