Statistical physics approaches to the complex Earth system

Global warming, extreme climate events, earthquakes and their accompanying socioeconomic disasters pose significant risks to humanity. Yet due to the nonlinear feedbacks, multiple interactions and complex structures of the Earth system, the understanding and, in particular, the prediction of such disruptive events represent formidable challenges to both scientific and policy communities. During the past years, the emergence and evolution of Earth system science has attracted much attention and produced new concepts and frameworks. Especially, novel statistical physics and complex networks-based techniques have been developed and implemented to substantially advance our knowledge of the Earth system, including climate extreme events, earthquakes and geological relief features, leading to substantially improved predictive performances. We present here a comprehensive review on the recent scientific progress in the development and application of how combined statistical physics and complex systems science approaches such as critical phenomena, network theory, percolation, tipping points analysis, and entropy can be applied to complex Earth systems. Notably, these integrating tools and approaches provide new insights and perspectives for understanding the dynamics of the Earth systems. The overall aim of this review is to offer readers the knowledge on how statistical physics concepts and theories can be useful in the field of Earth system science.


The Earth as a Complex System
The Earth behaves as an integrated system comprised of geosphere, atmosphere, hydrosphere, cryosphere as well as biosphere components, with nonlinear interactions and feedback loops between and within them [1]. These components can be also regarded as self-regulating systems in their own right, and further broken down into more specialized subsystems. Nevertheless, the growing understanding of the multi-component interactions between physical, chemical, biological and human processes suggests that one should bring different disciplines together and take into account the Earth system as a whole. Such studies and results initiate the emergence of a new 'science of the Earth'-Earth System Science (ESS) [2]. The ESS framework has already demonstrated its potential as a powerful tool for exploring the dynamical and structural properties of how the Earth operates as a complex system.
The ESS has emerged in the early to mid-20th century, and has developed rapidly during the last decades. Its historical evolution can be briefly outlined into four phases [2]: (i) Precursors and beginnings (pre-1970s). The systemic nature of Earth was mainly described and emphasized by some conceptualizations, such as VernadskyâĂŹs biosphere concept that life has a strong influence on the physical and chemical properties of Earth [3], and LovelockâĂŹs Gaia hypothesis that Earth is a synergistic and self-regulating, complex system [4]. These conceptualizations play vital roles in the contemporary understanding of the Earth system. In particular, the International Geophysical Year (IGY) 1957âĂŞ1958 promoted the development of international science and led to the emergence of two contemporary paradigms, modern climatology and plate tectonics [5,6]. (ii) Founding a new science (1980s). In the 1980s, the newly emerging recognition of Earth as an integrated entity: the Earth system, was called for by a series of workshops and conference reports. In particular, the Bretherton diagram developed by the National Aeronautics and Space Administration (NASA) [7] was the first systemsâĂŞdynamics representation of the Earth System to couple the physical climate system and biogeochemical cycle. The Brundtland report published by the World Commission on Environment and Development in 1987 [8] developed guiding principles for a sustainable development and recognized the importance of the environmental problems for the Earth system. Some international organizations were also established in this stage, for example, the World Climate Research Programme (WCRP) aims to determine the predictability of the climate and the effects of human activities on the climate. The International Geosphere-Biosphere Programme (IGBP) was launched in 1987 in order to coordinate international research on global-scale and regional-scale interactions between Earth's biological, chemical and physical processes and their interactions with human systems. The Intergovernmental Panel on Climate Change (IPCC) was created in 1988 to provide policymakers with scientific assessments on climate change, its implications and potential future risks, as well as to put forward adaptation and mitigation options. (iii) Going global (1990sâĂŞ2015). Due to great research efforts of international programmes such as the IGBP, and the widespread use of the Bretherton diagram, ESS then developed rapidly, from the 'new science of the Earth' movement to a global one. In particular, the International Human Dimensions Programme (IHDP) on Global Environmental Change was founded with the aim to frame, develop and integrate social science research on global change. (iv) Contemporary ESS (beyond 2015). In the 21st century, the ESS framework was well established, and initiated a new programme, Future Earth, which was integrated by several international organizations or programmes, such as the IGBP and IHDP. Future Earth's mission is to accelerate transformations to global sustainability through research and innovation. Following Ref. [1], we highlight the key organizations, publications, campaigns and events that characterize well the evolution of ESS in Fig. 1.
Among the pioneering publications for the evolution of ESS, Hans Joachim Schellnhuber introduced and developed a fundamental concept for ESS: the dynamic, co-evolutionary relationship between nature and human factors at the planetary scale [9]. He proposed that the Earth system E can be conceptually represented by the following mathematical form: where N = (a, b, c, . . .); H = (A, S). This model suggests that the overall Earth system contains two main components: N stands the ecosphere and H is interpreted as the human drivers. N is composed of an alphabet of intricately interacting planetary sub-spheres a (atmosphere), b (biosphere), c (cryosphere), etc. The human factor H is much more subtle: A means the 'physical' sub-component and the 'metaphysical' sub-component S reflects the emergence of a 'global subject'. This work provided the conceptual framework for fully integrating human dynamics into an Earth system and built a unified understanding of the Earth. There exist numerous tools and approaches that support the evolutionary development of the ESS. However, it is worth noting that they can be integrated into three interrelated foci: observations, modeling and computer simulations, assessments and syntheses [2]. The Earth observations are usually referring to the instrumental data, for example, that are collected by the meteorological stations or polar orbiting and geostationary satellites. However, such data sets only extend, at best, for about one-to-two centuries into the past. In order to extend our reach beyond thousands or even millions of years ago, climate proxies fill this gap. There are multiple proxy records, including coral records [10], marine-sediment [11], stalagmite time series [12], tree rings [13] as well as the Vostok [14] and EPICA [15] ice core record that help us to better understand the past Earth system. Mathematical models are currently key methods for the understanding and projecting of climate and Earth systems. These models include different variants from conceptual climate models-Energy Balance Models (EBMs) [16] to more complex Earth models, the General Circulation Models (GCMs) [17]. In addition, the integrated assessment models (IAMs) were designed to take human dynamics as an integral component into account and aim to understand how human development and societal choices affect each other and the natural world, including climate change [18,19]. Based on the GCMs, the Earth systems Models of Intermediate Complexity (EMICs) were developed to investigate the Earth's systems on long timescales or at reduced computational cost [20]. These models provide knowledge-integration to explore the dynamic properties and basic mechanisms of the Earth system across multiple space and time scales. The assessments and syntheses also played crucial roles in building new scientific knowledge, linking the scientific and political communities, and facilitating new research fields based on the feedback from the political sector.
Some new concepts and theory have indeed arisen from the evolution of ESS, including the emerging concept of sustainability [21], Anthropocene [22], tipping elements [23], planetary boundaries framework [24], such as planetary thresholds and state shifts [25], etc. Taken together, they create powerful ways to better understand and project the future trajectory of the Earth system.

Why Statistical Physics?
Yet, despite the rapid development of the ESS, the nature and statistical properties of extreme climate events and earthquakes remain elusive and debated, and furthermore, the existence of early warning signals of these phenomena is still a major open question. As a consequence, in this review, we will present several novel approaches based on or stemmed from statistical physics, that could enhance our understanding of how the Earth system could evolve. In particular, the interdisciplinary perspective on statistical physics and the Earth system yields to an improvement of the prediction skill of high-impact disruptive events within the climate and earthquake systems.
Statistical Physics is a branch of physics that draws heavily on the laws of probability and the statistics of many interacting components. It can describe a wide variety of systems with an inherently stochastic nature, aiming to predict and explain measurable properties and behaviours of macroscopic systems. It has been applied to many problems including fields of physics, biology, chemistry, engineering and also social sciences. Note that statistical physics does not focus on the dynamic of every individual particle but on the macroscopic behaviour of a large number of particles. The basic theory and ideas of statistical physics are depicted in many textbooks, such as [26][27][28].
Sethna motivated the relationship between statistical physics and complex systems in his book [29] as follows: "Many systems in nature are far too complex to analyze directly. Solving for the motion of all the atoms in a block of ice âĂŞ or the boulders in an earthquake fault, or the nodes on the Internet âĂŞ is simply infeasible. Despite this, such systems often show simple, striking behavior. We use statistical mechanics to explain the simple behavior of complex systems." A complex system is usually defined as a system composed of many components which interact with each other. Regarding the large variety of components and interrelations, the Earth system thus can be interpreted as an evolving complex system. The concepts and methods of statistical physics can infiltrate into ESS, in particular, (i) critical phenomena  are analogous to tipping elements: a system will collapse and follow a breakdown [30], if it is close to a phase transition or tipping point. Critical phenomena and transitions exist widely in the Earth system [31], such as in atmospheric precipitation [32] and percolation phase transition in sea ice [33]. (ii) The concept of fractal was used to describe the Earth's relief, shape, coastlines and islands [34]. (iii) The earthquake process is regarded as a complex spatio-temporal phenomenon, and has been viewed as a self-organised criticality (SOC) paradigm [35]. Moreover, the seismic activity exhibits scaling properties in both temporal and spatial dimensions [36].
Complex network theory provides a powerful tool to study the structure, dynamics and function of complex systems [37][38][39][40]. Meanwhile, statistical physics is a fundamental framework and has brought theoretical insights for understanding many properties of complex networks. From an applied perspective, statistical physics has led to the definition of null models for real-world networks that reproduce global and local features [41]. Complex network theory is an emerging multidisciplinary discipline that has been applied to many fields including mathematics, physics, biology, computer science, sociology, epidemiology and others [42]. Ideas from network science have also been successfully applied to the climate system and revealed interesting mechanisms underlying its functions, and leading to the emergence of a new concept, Climate Network (CN) [43]. The nodes in a CN represent the available, geographically localized, time series, in particular, regular latituted-longitude grids, while the level of similarity and causality between the nonlinear climate records of different grid points represents the CN's links [44,45]. The development of this data-based approach is providing radical new ways to investigate the patterns and the dynamics of climate variability [46].
In this review, we will introduce how to apply statistical physics methods in Earth system science. Particularly, we focus largely on the surface of the Earth system, including the climate system, Earth's relief as well as the earthquakes system, but not including the whole planetary interior.

Outline of the Report
Apart from this introductory preamble, the report is organized along 3 Sections. In the next Section, we start by offering the overall methodology that will accompany the rest of our discussion. Section (2) contains several topics that are essential but rather formal, including the CN approach, percolation theory, tipping points analysis, entropy theory and complexity. The reader will find there the attempt to define an overall theoretical framework encompassing the different situations and systems that will be later extensively treated. Section (3) gives a comprehensive review of applications, especially those that were studied in climate systems, the Earth geometric surface relief and earthquake systems.
Finally, Section (4) presents our conclusive remarks and perspective ideas.

Climate Networks
For more than two decades, the complex network paradigm has demonstrated its great potential as a versatile tool for exploring dynamical and structural properties of complex systems, from a wide variety of disciplines in physics, biology, social science, economics, and many other fields [37,40,[47][48][49][50][51][52]. In the context of network theory, a complex network is a graph with non-trivial topological features which do not occur in simple networks such as regular lattices (Fig. 2a) or random Erdős-Rényi graphs (Fig. 2b). One of the novelties of complex network theory is that it can relate the topological characteristics to the function and dynamics of the system. Complex network theory has been successfully applied to many real world systems and revealed fundamental mechanisms underlying their functions. It has been also found that many real world networks are scale-free networks (Fig. 2c) [38,53].
In recent years, the ideas of network theory have also been implemented in climate sciences to construct CN [43,44,54]. In CNs the geographical locations (or grid points) are regarded as the nodes of the networks and the level of similarity (causality) between the records (time series) of two grid points represents the links and their strength. Various climate data such as temperature, pressure, wind and precipitation can be used to construct a network. The climate networks approach allows to study the interrelationship between the different locations on the globe and thus represent the global behavior of the climate system, e.g., how energy and matter are transferred from on location to another. These networks have been used successfully to analyze, model, understand, and even predict various climate phenomena [44][45][46][55][56][57][58][59][60][61][62][63][64][65][66][67][68][69][70]. A visualization of a climate network is shown in Fig. 3 as an example.
We first provide an introduction of definitions, notations and basic quantities used to describe the topology of a network, and then present an overview on how to construct a climate network.

Network Characteristics
In this Section we review basic characteristics and metrics used to describe and analyze networks, most of them come from graph theory [72]. Graph theory is a large field containing many branches but we present only a small fraction of those results here, focusing on the ones most relevant to the study of the complex Earth system. A network, also called a graph in the mathematical literature, is a collection of vertices connected by edges. Vertices and edges are also called nodes and links in computer science, sites and bonds in physics, and actors and ties in sociology [40]. Throughout this review we will denote the number of vertices by N and the number of edges by M , which is also a common notation in the mathematical and physics literature.

The Adjacency Matrix
Most of the networks we will study in this review have at most a single edge between any pair of vertices, i.e., self-edges or self-loops not allowed. A simple representation of a network for many purposes is the adjacency matrix A, with elements A ij such that A ij = 1 if there is an edge between vertices i and j, 0 otherwise.
We should notice that the diagonal matrix elements are all zero in the adjacency matrix. In some situations it is useful to represent edges as having a strength, weight, or value to them, usually a real number. Such weighted or valued networks can be represented by giving the elements of the adjacency matrix values equal to the weights of the corresponding connections. A directed network or directed graph, also called a digraph for short, is a network in which each edge has a direction, pointing from one vertex to another. Such edges are themselves called directed edges. For this case, the adjacency matrix is usually not symmetric, and becomes

Degree
The degree of a vertex in a network is the number of edges connected to it. We denote the degree of vertex i by k i . For an undirected network of N vertices the degree can be written in terms of the adjacency matrix (Eq. 3) as The number of edges M is equal to the sum of the degrees of all the vertices divided by 2, so The mean degree k of the node in an undirected graph is k = 1 N N i=1 k i = 2M/N . The concept of degree is more complicated in directed networks. In a directed network each vertex or node has two degrees. The in-degree k in i is the number of in-going edges connected to a node and the out-degree k out j is the number of out-going edges. From the adjacency matrix of a directed network Eq. 3, they can be written Bearing in mind that the number of edges in a directed network is equal to the total number of in-going ends of edges at all vertices, or equivalently to the total number of out-going ends of edges.

Degree Distributions
One of the most fundamental properties of a network that can be measured directly is the degree distribution, or the fraction P (k) of nodes having k connections (degree k). A well-known result for the Erdős-Rényi [73] network is that its degree distribution follows a Poissonian, where z = k is the average degree. As shown in Fig. 2e, despite the fact that the position of the edges is random, a typical random graph is rather homogeneous, the maximum number of the nodes having the same number of edges. However, direct measurements of the degree distribution for real networks, such as the Internet [74], WWW [38], email network [75], metabolic networks [76], airline networks [77], neuronal networks [78], and many more, show that the Poisson law does not apply. But they exhibit an approximate power-law degree distribution where C is a normalization factor. The constant λ is known as the exponent of the power law. Values in the range 2 ≤ λ ≤ 3 are typical, although values slightly outside this range are possible and are observed occasionally [79]. Networks with power-law degree distributions are called scale-free networks. The simplest strategy to determine the scale-free properties is to look at a histogram of the degree distribution on a log-log plot, as we did in Fig. 2f, to see if we have a straight line. Several models have been proposed for the evolution of scale-free networks, each of which may lead to a different ensemble. The first proposal was the preferential attachment model of Barabási and Albert, which is known as the Barabási-Albert model [38]. Several variants of this model have been suggested, see, e.g., in Ref. [80].

Clustering Coefficient
The extent to which nodes cluster together on very short scales in a network is measured by the clustering coefficient. The definition of clustering is related to the number of triangles in the network. The clustering is high if two nodes sharing a neighbor have a high probability of being connected to each other. The most common way of defining the clustering coefficient is: Here a "connected triple" means three nodes uvw with edges (u, v) and (v, w). The factor of three in the numerator arises because each triangle is counted three times when the connected triples in the network are counted.
We can also get a local clustering coefficient C i for a single vertex by defining define C i = the number of triangles connected to vertex i the number of triples centered on vertex i .
That is, to calculate C i , we go through all distinct pairs of vertices that are neighbors of i, count the number of such pairs that are connected to each other, and divide by the total number of pairs. C i represents the average probability that a pair of i's friends are friends of one another. For vertices with degree 0 or 1, for which both numerator and denominator are zero, we assume C i = 0. Then the clustering coefficient for the whole network [37] is the average C = 1 N i C i . In both cases, the clustering is in the range 0 ≤ C ≤ 1.

Subgraphs
A graph G 1 consisting of a set P 1 of nodes and a set E 1 of edges is a subgraph of a graph G = {P, E} if all nodes in P 1 are also nodes of P and all edges in E 1 are also edges of E. The simplest examples of subgraphs are cycles, trees, and complete subgraphs [81]. A cycle of order k is a closed loop of k edges such that every two consecutive edges and only those have a common node. That is, graphically a triangle is a cycle of order 3, while a rectangle is a cycle of order 4. The average degree of a cycle is equal to 2, since every node has two edges. A tree, as shown in Fig. 4a, is a connected, undirected network that contains no closed loops. Here, by "connected" we mean that every node in the network is reachable from every other via some path through the network. A river network is an example of a naturally occurring tree with directed links [82]. The most important property of trees is that, since they have no closed loops, there is exactly one path between any pair of nodes. Thus the number of edges in a tree is always exactly k − 1 edges. Complete subgraphs of order k, also called k-clique, contain k nodes and all the possible k(k − 1)/2 edges-in other words, they are completely connected [see Fig. 4b].
Let us consider the ER model, in which we start from N isolated nodes, then connect every pair of nodes with a probability p. Most generally, we can ask whether there is a critical probability that marks the appearance of arbitrary subgraphs consisting of k nodes and l edges. A few important special cases that directly give an answer to this question [81] are: (i) The threshold probability of having a tree of order k is p c (N ) = cN −k/(k−1) ; (ii) The threshold probability of having a cycle of order k is p c (N ) = cN −1 ; (iii) The critical probability of having a complete subgraph of order k is p c (N ) = cN −2/(k−1) .
Generally, a network is always composed of many separate subgraphs or components, i.e., groups of nodes connected internally, but disconnected from others. In each such component there exists a path between any two nodes, but there is no path between nodes in different components. A network component with size proportional to that of the entire network, N , is called a giant component.
Similar to component, the concept of community is also one of the most fundamental properties in complex networks [83]. The nodes of the network might be joined together into tightly connected groups, whereas between them there are still links but less connections. A single node of the network may belong to more than one community, and most of the actual networks are made of highly overlapping communities of nodes [84]. Since community structures are quite common in real networks, community finding and detection are thus of great importance for better understanding the function of a network. An example of overlapping k-clique communities is shown in Fig. 4c. A good review on the community detection in graphs can be found in Ref. [85]. Network Models Erdős-Rényi model : It can be interpreted by the following two well-studied graph ensembles, G N,M the ensemble of all graphs having N vertices and M edges, and G N,p the ensemble consisting of graphs with N vertices, where each possible edge is realized/connected with probability p. These two families, initially studied by Erdős and Rényi [73], are known to be similar if M = N 2 p. They are referred to as Erdős-Rényi (ER) model. These descriptions are quite similar to the microcanonical and canonical ensembles studied in statistical physics [86]. The ER model has traditionally been the dominant subject of study in the field of random graphs [81], with Poissonian degree distributions, see Eq. (7). Barabási-Albert model : It is based on two simple assumptions regarding network evolution [38]. (i) Growth: Starting with a small number (m 0 ) of nodes, at every time step, a new node with m ( m 0 ) edges that link the new node to m different nodes already present in the system is added. (ii) Preferential attachment: This is the heart of the model. When choosing the nodes to which the new node will be connected, the probability Π that the new node will be connected to node i depends on the degree k i of node i, such that Π (k i ) = ki j kj . In this process, after t time steps this results in a network with N = t + m 0 nodes and mt edges. Theoretical and numerical results show that this network model evolves into a power-law degree distribution, see Eq. (8).
Watts-Strogatz model : In 1998, Watts and Strogatz [37] proposed a one-parameter model that interpolates between a regular lattice and a random graph. The details behind the model are the following: (i) Start with order : Start with a ring lattice with N nodes in which every node is connected to its closest k neighbors. In order to have a sparse but connected network at all times, consider N k ln(N ) 1. (ii) Randomize: Randomly rewire each edge of the lattice with probability p such that self-connections and duplicate edges are excluded. This process introduces pN K/2 long-range edges. The resulting network properties of this model are smallworld and high clustering coefficient. Specifically, a small-world network is referring to a network where the characteristic path length L grows proportionally to the logarithm of the number of nodes, L ∝ log N [37].

Pearson Correlation Climate Network
Our climate system is made up of an enormous number of nonlinear sub-systems having mutual nonlinear interactions and feedback loops active on a wide-range of temporal and spatial scales. Therefore, modeling the Earth climate system from the point of view of complex networks can clearly provide critical insights into the underlying dynamics of the evolving climate system. As discussed above, in CNs, geographical regions of Earth are regarded as nodes, and the bivariate statistical analysis of similarities between pairs of climatological variables time series represent the links.
In general, there are five steps for the CNs construction and analysis [46], the procedure is displayed in Fig.  5, adapted from [87].
Step (i): Nodes, the nodes in CNs are usually defined as locations in a longitude-latitude spatial grid at various resolutions. (ii): Climatological variable, we select the suitable climatological time series to be analyzed, e.g., surface air temperature, sea surface temperature, precipitation, wind, etc. Some pre-processing is also often needed. For example, to avoid the strong effect of seasonality, we usually subtract the mean seasonal cycle and divide by the seasonal standard deviations for each grid point time series. (iii): Edges, in this step we compute the statistical similarity that quantifies the interdependencies between pairs of time series. The strength of each edge is correlation based. There are many measures for quantifying the interdependencies of time series, here we mention the types of CNs by correlation measures. For example, a Pearson correlation climate network, where we use Pearson correlation to quantify the cross-correlations between time series; while an event synchronization climate network uses the event synchronization method; and a mutual information climate network is based on mutual information. (iv): Construction, in this step we construct the CN which typically involves some thresholding criterion to select only statistically significant edges. Considering, for example, that the significant links have W ij values above a given threshold, W c , then the adjacency matrix is determined as where H is the Heaviside function. We then investigate the obtained network by using various network characteristics. Finally, in step (v): Climatological interpretation, the results of this analysis are interpreted in terms of dynamical processes of the climate system (e.g., atmospheric circulation, ocean currents, plenty of waves, etc.). In the following, we will demonstrate how to construct a Pearson correlation climate network. Suppose that the climatic variable is the daily surface air temperature, either from observations, proxy reconstructions, reanalyses, or simulations, gathered at static measurement stations, or provided at grid cells. At each node i of the network, we calculate the daily atmospheric temperature anomalies T i (t) (actual temperature value minus the climatological average which is then divided by the climatological standard deviation) for each calendar day.
For obtaining the time evolution of the weight of the link between a pair of nodes i and j, we follow [66] and compute, for each time windowing (such as month or year) t over the whole time span, the time-delayed cross-correlation function defined as and where the brackets denote an average over the past T days, according to Here, τ is the time lag spanning from [0, τ max ] days. The reliable estimate of the background noise level, i.e., the values of the τ max was discussed in [88,89]. Based on the correlation functions, Eqs. (12)and (13), a weighted and directed link between nodes i and j was defined in Refs. [59,61,64]. This is done by identifying the value of the highest peak (or lowest valley) of the cross-correlation function and denote the corresponding time lag of this peak (valley) as θ i,j . The sign of θ i,j indicates the direction of each link; i.e., when the time lag is positive (θ i,j > 0), the direction of the link is from j to i [64]. The positive and negative links and their weights are determined via C i,j (τ ), and are defined as and where "max" and "min" are the maximum and minimum values of the cross-correlation function, "mean" and "std" are the mean and standard deviation. Typical time series and their cross-correlation functions are shown in Fig. 6. For demonstration, two nodes are selected, a node i is located in the Southwest Atlantic and a node j is in the South American continent (Fig. 6a). The near surface daily air temperature anomalies are shown in Fig. 6b for the time period between 2014 to 2018. The cross-correlation function between the time series is shown in Fig. 6c, where the absolute value of a maximum cross-correlation function is much larger than the level of noise and the minimum value. Since θ i,j = 2 > 0, then the direction of the link is from j to i. According to Eq. (15), we calculate its weight to be W + i,j = 5.71. This value represents 5.71 standard deviations above the noise level, i.e., highly significant.
Finally, networks can be constructed by establishing links between pairs of nodes with weights larger than some significant threshold. The advantage of this method is that it overcomes the problem of strong auto-correlation values in the data [88]. . The direction of this link is from j to i with weight W + i,j = 5.71.

Event Synchronization Climate Network
The event synchronization method provides an alternative way to construct a network from climate observations: event synchronization CN. It was originally proposed by Quian Quiroga et al. [90], where they measured synchronization and inferred time delays between signals in neuroscience. Event synchronization is based on the relative timings of events, such as rainfall, in a time series and is defined, e.g., by the crossing of a threshold or by local maxima or minima, etc. For instance, an extreme rainfall event is defined as the day where its precipitation is above the 99th percentiles of all days. Event synchronization is especially appropriate for studying extreme events. The degree of synchronization is obtained from the number of quasi-simultaneous events and the delay is calculated from the precedence of events in one time series (signal) with respect to the other. Event synchronization starts by constructing two event series from two discrete univariate time series, X and Y . An event l that occurs at X at time t x l is considered to be synchronized with an event m that occurs at Y at time t y m within a time lag ±τ xy Here, l = 1, 2, . . . , e x , and m = 1, 2, . . . , e y . e x and e y are the number of events in the X and Y respectively. The number of times an event appears in X shortly after it occurs in Y is counted: and analogously to Eq. (18) we can define c(x|y). Finally, the symmetrical and anti-symmetrical combinations are used to measure the synchronization of the events and their delay behavior, respectively. They are normalized to 0 Q xy 1 and −1 q xy 1. It should be noted that if several extreme events are very close in one record, then only the first one is considered. Particularly, Q xy = 1 if and only if the events of the signals are fully synchronized. In addition, if the events in X always precede those in Y , then we get q xy = 1.

Mutual Information Climate Network
The mutual information method is also one of the common tools for quantifying the interdependencies between time series and for constructing climate networks. The mutual information of two random variables is a measure of the mutual dependence between the two variables. The concept of mutual information is intricately linked to that of entropy of a random variable [97]. Let X and Y be a pair of random variables with values over the space X × Y. If their joint distribution is p(x, y), and their associated marginal probability distribution functions are p(x) and p(y), then the mutual information is defined as States with zero probability of occurrence are ignored. Intuitively, M I is a measure of the inherent dependence expressed in the joint distribution of X and Y relative to their corresponding joint distribution under the assumption of independence. Mutual information, therefore, measures dependence and nonlinearity, in fact: M I = 0 if and only if X and Y are independent random variables, i.e., p(x, y) = p(x)p(y). Moreover, mutual information is non-negative, symmetric and can also be computed with a time lag [98]. After we get the mutual information coefficient, Eq. (21), for each pair of nodes, we can construct a mutual information climate network. It has been shown that the mutual information CN can well capture the westward propagation of sea surface temperature (SST) anomalies that occur in the Atlantic multidecadal oscillation [99].
To summarize, in this chapter we have described different network characteristics and presented various linear and nonlinear tools of time series analysis, which can be used to construct, define and characterize CNs. Linear and nonlinear methods include Pearson correlation, event synchronization, and informationtheory measures such as entropy and mutual information. There are also some other powerful tools, such as, spectral analysis, empirical orthogonal function analysis and symbolic ordinal analysis that can be used to reconstruct CNs [46].
It is well known that in lattices and other ordered networks [106,107,132], for dimensions greater than one, a percolation phase transition occurs. The percolation process is a simple model in which the nodes (sites) or edges (bonds) are occupied with some probability p and unoccupied with probability q = 1 − p.
Take a regular lattice as an example (see Fig. 2a). A system is regraded as percolating if there is a path from one side to the other parallel one, passing only through occupied bonds and sites. When such a path exists, the component or cluster of sites that spans the lattice is called the spanning cluster or the infinite percolation cluster. At low concentration p, the sites are either isolated or form small clusters of nearestneighbor sites. Two sites belong to the same cluster if they are connected by a path of nearest-neighbor sites. At large p values, on the other hand, at least one path between opposite sides exist (see Fig. 7). The percolation phase transition occurs at some critical density p c that depends on the type and dimensionality of the lattice. For many complex networks, the notion of side does not exist. However, the ideas and tools of percolation theory can still be applied [51]. The main difference is that the condition for percolation is no longer the existence of spanning cluster, but rather having a cluster containing O(N ) nodes. If such a component exists, we call it giant component, that was discussed in Section 2.1. Indeed, the condition of the existence of a giant component applies also to lattice networks, and therefore it can be used as a more general condition than the spanning property.
There exist two types of percolation, site percolation, where all sites are with probability p occupied and 1 − p empty. In bond percolation, however, it is the bonds which are occupied with probability p and above p c they form a giant component of connected sites. In this review, we will focus on the critical phenomena of percolation near the percolation threshold p c , where for the first time a giant component is formed or the system collapses [133]. These aspects are called critical phenomena, and the fundamental theory to describe them is the scaling theory from phase transitions. The concept of phase transition is usually used to describe transitions between solid, liquid, and gaseous states of matter for thermodynamic physical systems, where an ordered phase (e.g., solid) changes into a disordered phase (e.g., liquid) at some critical temperature T c [26]. Another classical example of a phase transition is the magnetic phase transition, explained by the Ising model, where a spontaneous magnetization m > 0 appears, at low temperature without any external field. While increasing temperature, the spontaneous magnetization decreases and vanishes at T c .
Percolation is indeed a simple geometrical phase transition, where the percolation threshold p c distinguishes a phase of finite clusters (disordered phase) from a phase of an infinite cluster (ordered phase). The occupied probability p of sites or bonds plays the same role as the temperature in the thermal phase transition. They are usually called control parameters in statistical physics.

Percolation Order Parameter
The percolation order parameter P ∞ in the relative size of the infinite cluster which is defined as the fraction of the sites belonging to the infinite cluster. For p < p c , there exist only finite clusters, thus, P ∞ = 0; For the case p > p c , however, P ∞ is analogous to the magnetization below T c and behaves near p c as a power law It describes the order in the percolation system and is therefore called the order parameter. We show in Fig.  8, how P ∞ behaves as a function of p, for both bond and site percolation in the 2D square lattice model, as well as, for site percolation in the ER network, respectively. It is worth noting that we adopt p as the control parameter in Fig. 8, which results P ∞ being a monotonic increasing function. When p is gradually decreased (from 1), it can be regarded as removal (or attack) of nodes or links and the system collapses after removing a fraction of 1 − p c . Thus, 1 − p c can be a measure of the resilience of the system, i.e., how close it is to collapse.

Cluster Size Distribution
The finite components distribution near criticality follows the scaling form [106,107] n s ∼ s −τ e −s/s ξ .
Where s is the component size, and n s represents the number of components of size s. At the percolation threshold s ξ ∼ |p − p c | −σ diverges and the tail of the distribution behaves as a power law with the critical exponent σ.

Average Cluster Size
For any given site, the probability that it belongs to a cluster of size s is sn s . Let us define ρ(p) as the probability that any given site is part of a finite cluster, which is the first moment of the cluster size distribution. Hence, for any given site of any given finite cluster, the probability w s (p) that the cluster is of size s is with ∞ s=1 w s (p) = 1. For any given site of any given finite cluster, the average size s(p) of the cluster is Note that, the average size s(p) , excluding the infinite cluster, diverges near the critical point as,

Correlation Length
The correlation function g(r) is the probability that a site at position r from an occupied site in a finite cluster belongs to the same cluster.
Typically, for large r ≡ |r|, there is an exponential cutoff, i.e., g(r) ∼ e −r/ξ , at the correlation length ξ. The correlation length ξ is defined as ξ 2 = r r 2 g(r) r g(r) . (28) ξ measures the mean distance between two sites on the same finite cluster. When p approaches p c , ξ increases as The quantities P ∞ and s are analogous to the magnetization m and the susceptibility χ in magnetic systems. β, γ and ν are called the critical exponents and describe the critical behavior of the percolation phase transition.

Structural Properties
Next, we will briefly introduce some fundamental measurements that are used to characterize the structural properties of a percolation cluster. For more details, the readers can refer to Ref. [106].

Fractal Dimension
The fractal concept was introduced into the physical world by Mandelbrot [134] and applied to percolation by Stanley [135] to describe the cluster shapes at the percolation threshold p c . The infinite cluster is selfsimilar on all length scales, and can be regarded as a fractal. The fractal dimension d f is defined as how, on average, the mass M of the cluster within a sphere of radius r from a site of the cluster changes with r, For length scales smaller than the correlation length ξ, a fractal structure exists. For length scales larger than ξ, the cluster becomes homogeneous. This can be summarized as The relative size of the giant component, P ∞ can also be expressed as The fractal dimension d f of percolation clusters can be related to the critical exponents β and ν. Substituting Eqs. (22) and (29) into Eq. (32), yields d f = d − β ν . Fractal analysis was also applied in the study of complex networks [136]. Generally, there are two basic methods to calculate the fractal dimensions of a given system, i.e., using either the box counting method [137] or the cluster growing method [138].
The fractal dimension d f can be obtained by a power law fit. (ii) The cluster growing method: Similar to Eq. (30), the dimension d f can be calculated by where M C is the average mass of the cluster, defined as the average number of nodes within linear size in the cluster. Note that, the above methods are difficult to apply directly to networks, since networks are generally not embedded in space. In order to measure the fractal dimension of networks one usually combines the concept of renormalization [136], i.e., for each chemical size l B , boxes are placed until the network is covered. Then each box is replaced by a node (renormalization), and the renormalized nodes are connected if there is at least one link between the no-renormalized boxes. This procedure is repeated and the number of boxes

Graph Dimensions d min and d
The fractal dimension d min has been used to describe the structural properties of the shortest path between two arbitrary sites in the same cluster. Let be the path length, which is often called the "chemical distance" [139], it scales with r as ∼ r dmin , and the number of sites within is where d is often called the "graph dimension" or "chemical" dimension. Combining Eqs. (30), (35) and (36), we obtain the relation between d min , d and d f , d = d f dmin . Besides the fractal dimension exponents d min , d and d f discussed above, there are also some other exponents such as the backbone exponent and the red bonds exponents to describe the fractal dimensions of the substructures composing percolation clusters [26,107].

Scaling Theory
The scaling hypotheses of phase transitions was developed by Kenneth G. Wilson during the last century and honoured by the 1982 Nobel prize. The scaling theory of percolation clusters relates the critical exponents of the percolation transition to the cluster size distribution, n s (p). According to the scaling hypotheses, it is possible to state the following relation for n s (p), where τ is the Fisher exponent [102], and f (x) is scaling function following f (x) = exp(−x), which rapidly decays to zero.
The correlation length ξ is defined as the root mean square distance between occupied sites on the same finite cluster for all clusters, see Eq. (28). For clusters with s sites, the root mean square distance between all pairs of sites on each cluster, is and thus, Close to the percolation threshold p c , R s ∼ s 1/d f , and we obtain from the above equation To calculate the sums in Eq. (40), we transform them into integrals, and obtain which yields the scaling relations between ν, σ and τ where d is the system's dimension. Consider the k-th moment of the cluster size distribution which scales in the critical region as, Next we consider the percolation order parameter P ∞ , which behaves as [106], Combining with Eq. (22), we have, Similarly, we obtain for k = 2, the relation Thus, the critical exponents are not independent of each other but satisfy two sets of scaling and hyperscaling relations. The scaling relations can be easily expressed as by the Eqs. (42), (46) and (47).

Universal Gap Scaling
An interesting property of percolation is universality, which is a fundamental principle of behavior at or near a phase transition critical point. As a result, the critical exponents depend only on the dimensionality of the system, and are independent of the microscopic interaction details of the system. The behavior of a system is characterized by a set of critical exponents, as discussed in the last section. If two systems have the same values of critical exponents, they belong to the same universality class. The universality property is a general feature of phase transitions. A phase transition is also characterized by scaling functions that govern the finite-size behaviours [140]. The concept of finite-size scaling provides a versatile tool to study the percolation transition.
Usually, a lattice or network is expected to undergo a continuous percolation phase transition during a random occupation or failure process [81]. Explosive, hybrid and genuinely discontinuous percolation for processes that are not random but competitive link-addition processes has attracted much attention in recent years [116,[141][142][143][144][145][146][147][148][149][150][151][152]. The description of the explosive percolation model is shown in Fig. 9. Instead of performing a finite-size scaling analysis at the critical phase transition point, the finite-size scaling analysis on the percolation properties and critical scaling of the size of the largest gap in the order parameter is considered. For instance, Fan et al. [153] developed a new universal gap scaling theory and propose six gap critical exponents to describe the universality of percolation phase transitions. This theoretical framework can be applied for both continuous and discontinuous percolation in various lattice and network models by using finite-size scaling functions.

Gap Exponents
Starting with an empty lattice, or network system with N isolated nodes, bonds or links are added randomly or by competitive link-addition processes one by one. The control parameter is denoted as r, which represents the link density r = T /M , with M being the maximal number of edges. The order parameter is the size of the largest cluster, given by the largest connected component in the entire system. During the evolution of the system, we record the size of the largest cluster S(T ) at time step T , and calculate its largest one-step gap ∆ The step with the largest jump defines T c and its relative transition point as r c . Moreover, the percolation strength is defined as the size of the largest connected component at T c , i.e., S c = S(T c ).
Critical phenomena such as percolation exhibit scale-free behaviors that are quantified by scaling relations, see discussions in previous sections. Similarity, the averages∆,r c andS c are anticipated to exhibit the following power-law relations [153], as a function of L 1 , where β 1 , ν 1 and d f 1 are three critical exponents describing the universal class of the percolation, and r c (∞) is the percolation threshold in the thermodynamic infinity limit, L → ∞. Their corresponding fluctuations , are also investigated, where D is the number of independent realizations. We expect that χ ∆ , χ rc and χ Sc decay algebraically with L with the following scaling relations where β 2 , ν 2 and d f 2 constitute another set of critical exponents. The universality class of the percolation is characterized by the new six gap critical exponents. Note that they are highly related to the standard percolation exponents, and not fully independent from each other. The relationships between the gap exponents and the standard percolation critical exponents are derived as [153], In particular, the values of β 1 can imply the order of the percolation. That is, β 1 = 0 indicates the percolation is a first order transition; whereas models with 0 < β 1 < 1 are continuous [120,147]. We present the numerical results for bond percolation on a 2D square lattice in Fig. 10, where we obtain β 1 = β 2 ≈ 0.104, 1/ν 1 = 1/ν 2 ≈ 1/ν, and r c (∞) ≈ 0.5, and d f 1 = d f 2 ≈ 1.895. All results are in agreement with the known theoretical values [103].

Tipping Points Analysis
The notion of tipping point: "little things can make a big difference", was first published by Malcolm Gladwell in his book [154]. It means, at a particular moment in time, a small change can result a large and long-term consequences in a complex system. Tipping points are usually associated with bifurcations [155]. A tipping point is defined as "the moment of critical mass, the threshold, the boiling point". Many complex systems experience sudden shifts in behavior, often referred to as tipping points or critical transitions, ranging from climate [23,[156][157][158][159][160][161], ecosystems [158,[162][163][164][165][166][167], to social science [168,169], financial markets [170,171], medicine [172][173][174] and event macroeconomic agent-based models [175].
Complex systems can shift abruptly from one state to another at tipping points, which may imply growing a threat and risk of abrupt and irreversible changes [161]. It is thus of great practical importance to understand the theoretical mechanisms and predict the tipping phenomena. Although predicting such critical points before they occur is notoriously difficult. Theory proposes the existence of generic early-warning signals (EWS) that may indicate for a wide class of systems if a critical threshold is approaching [176][177][178]. EWS is currently one of the most powerful tools for predicting critical transitions. The tipping point analysis technique provides a vital tool to anticipate, detect and predict tipping points in complex dynamical systems. The methodology usually combines monitoring memory in time series, includes dynamically derived lag-1 autocorrelation [179], power-law scaling exponent of detrended fluctuation analysis (DFA) [180,181], and power-spectrum-based analysis [182].

Basic Concepts
Defining tipping points: Following Ref. [177], a tipping point is the corresponding critical point at which the future state of the system is qualitatively changed. A single control parameter is identified as (ρ), for which there exists a threshold (ρ c ), from which a small perturbation (δρ > 0) leads to a qualitative change (F ) in a crucial feature of the system (F ), after some time (T > 0). The actual change (∆F ) is defined as: In this definition, the critical threshold (ρ c ) is considered as the tipping point, beyond which a qualitative change occurs. Note that such changes may occur immediately or even much later after the cause.
Types of tipping points: The theoretical mechanisms behind tipping phenomena in a complex system can be effectively divided into three distinct categories: bifurcation-induced, noise-induced and rate-dependent tipping [183].
Bifurcation, means that a small change in forcing (ρ) past a critical threshold causes a large, nonlinear change in the system state, thus meeting the tipping point definition in Eq. (59). Given a conceptualized open system [184], where ρ(t) is in general a time-varying input. In the case that ρ is constant, we refer Eq. (60) as the parameterized system with parameter λ and to its stable solution as the quasi-static attractor. If ρ(t) passes through a bifurcation point of the system where a quasi-static attractor loses stability, it is intuitively clear that a system may 'tip' directly as a result of varying that parameter. We present two bifurcation-induced tipping point examples in Fig. 11. Where the system's states are described by and for Fig. 11 a and b, respectively. In general, as a system approaches a bifurcation tipping point, where its current state becomes unstable, it leads to a shift to an alternative attractor.
Noise-induced, noise-induced transitions between existing stable states of a complex system (Fig. 11c), can also be regarded as tipping points [185], however, they do not meet the definition of forced changes, as in Eq. (59). Noise-induced tipping points mean that noisy fluctuations result in the system departing from a neighbourhood of a quasi-static attractor. For example, Pikovsky and Kurths studied the coherence resonance in a noise-driven excitable Fitz Hugh-Nagumo system and uncovered that the effect of coherence resonance is explained by different noise dependencies of the activation and the excursion times [186]. The abrupt warming events during the last ice age, known as DansgaardâĂŞOeschger events, provide a noiseinduced tipping real-world example [185]. In addition, Sutera [187] studied noise-induced tipping points in a simple global zero-dimensional energy balance model with ice-albedo and greenhouse feedback [188]. Their results indicate a characteristic time of 100, 000 years for random transitions between 'warm' and 'cold' climate states, which match very well with the observed data. The noise-induced tipping points approach has also successfully been used for modelling changes in other climate models and phenomena [189]. In contrast to approaching bifurcations, it was found that noise-induced transitions are fundamentally unpredictable and show none of the EWS [177].
Rate-dependent, in which the system fails to track a continuously changing quasi-static attractor. To better understand the phenomenon of rate-dependent tipping, Ashwin et al. introduced a linear model with a tipping radius and discussed three examples where rate-dependent tipping appears [183]. Given a system for F ∈ R n and the parameter ρ has a quasi-static equilibriumF (ρ) with a tipping radius R > 0, then for some initial F 0 with F 0 −F (ρ) < R, the evolution of F with time is given by where M is a fixed stable linear operator (i.e. e M t → 0 as t → ∞ ). ρ(t) is a time-varying parameter, that represents the input to the subsystem. The tipping radius may be related to the basin of attraction boundary for the nonlinear problem. This model shows only rate-dependent tipping-because M is fixed and there is no bifurcation in the system and no noise is present. In particular, the model can be generalized to include M and R that vary with ρ(t). Eq. (63) can be solved with the initial condition F (0) = F 0 to give Note that the rate-dependent tipping was also observed in the zero-dimensional global energy balance model [188]. Besides the above three tipping types, there is potentially another type, a reversible tipping point. In Fig.  11d, we show an example of a reversible tipping point, in which a mono-stable system exhibits non-linear but reversible change [159].

Tipping Elements in the Earth's Climate System
In Ref. [23] Lenton et al. introduced the term tipping elements to describe large-scale components of the Earth system that may pass a tipping point, and assessed where their tipping points lie. They emphasized that the human activities may play a vital role to push the components of the Earth system past their tipping points, resulting in profound impacts on our social and natural systems. They also defined the subset of policy-relevant tipping elements having the following conditions: (i) The first condition is described in Eq. (59); (ii) the second one is related to the "political time horizon" T P ; (iii) the third one is called "ethical time horizon" T E ; (iv) the last one is that a qualitative change should correspondingly be defined in terms of impacts. According to these four aforementioned conditions, we review here some policy-relevant potential tipping elements in the climate system, as shown in Fig. 12. For each tipping element, we consider its (1) feature of the system F (see Eq. (59)) or direction of change, (2) control parameter(s) ρ, (3) critical value(s) ρ c , (4) transition timescale T as well as (5) key impacts.
Arctic Sea-Ice. For both summer and winter Arctic sea-ice, the area coverage is declining and the ice has thinned significantly over a large area [190]. In the IPCC projections with ocean-atmosphere GCMs, half of them will become ice-free in September during this century [191], at a polar temperature −9 • C [192]. The decline in the areal extent can be regarded as the feature of the system, denoted by F ; the local air temperature ∆T air and ocean heat transport can be regarded as control parameters; a numerical value of the critical threshold is still lacking in literature; the transition timescale is about T ∼ 10 years; the declining of the Arctic Sea-Ice can amplify warming and cause ecosystem changes.
Greenland ice sheet. It has been reported that the Greenland ice sheet is melting at an accelerating rate [193], which could add additional 7 meters to sea level over thousands of years if it passes a particular threshold. The decrease of the ice volume can be regarded as F , a feature of system; the local air temperature ∆T air is a control parameter; the critical value ρ c ∼ 3 • ; the transition timescale T can reach about 300 years.
West Antarctic ice sheet. According to the IPCC report [193], the Amundsen Sea embayment of West Antarctica might have already passed a tipping point, i.e., the 'grounding line' (where ice, ocean and bedrock meet) is retreating irreversibly. This could also destabilize the rest of the West Antarctic ice sheet [194]. It has been found, using paleoclimatology data, that such widespread collapse of the West Antarctic ice sheet occurred repeatedly in the past. Similar to the Greenland ice sheet, the decrease of the Ice volume can be regarded as F ; the local air temperature ∆T air is the control parameter; the critical value ρ c ∼ 5 − 8 • ; the transition timescale T can reach 300 years. The collapse of the West Antarctic ice sheet may lead to about 5 meters of sea-level rise on a timescale of centuries to millennia [23].
Atlantic circulation. The Atlantic meridional overturning circulation (AMOC) is one of EarthâĂŹs major ocean circulation systems, redistributing heat and affecting the climate. Research has provided evidence for a weakening of the AMOC by about 3 ± 1 sverdrups (around 15 per cent) 2 since the mid-twentieth century [160]. The AMOC is also considered as one of the main tipping elements of the Earth system [23,156]. The overturning of the Atlantic circulation can be regarded as F ; the additional North Atlantic freshwater input is the control parameter; the critical value ρ c ∼ 0.1 − 0.5 Sv; the transition timescale T can reach 100 years. A slowdown of the AMOC is associated with a southward shift of the tropical rainfall belt by influencing the Intertropical Convergence Zone, and a warming of the Southern Ocean and Antarctica [17].
El NiñoâĂŞSouthern Oscillation (ENSO). ENSO, the interannual fluctuation between anomalous warm and cold conditions in the tropical Pacific, is one of the most influential coupled oceanâĂŞ atmosphere climate phenomena on Earth [195][196][197]. It has been reported that extreme El Niño events are projected to likely increase in frequency in the 21st century [198]. The tipping point behavior of ENSO in a warming world was discussed in Ref. [199]. The amplitude of ENSO is regarded as F ; the zonal mean thermocline depth, thermocline sharpness in the east equatorial Pacific, and the strength of the annual cycle are the control parameters; the transition timescale T can reach 100 years. A stronger El Niño usually causes more extreme events (e.g., floods, droughts, or severe storms), which have serious consequences for economies, societies, agriculture and ecosystems.
Indian summer monsoon. The Indian summer monsoon rainfall (ISMR) has a decisive influence on India's agricultural output and economy. The monsoon season (from June to September) can bring drought and food shortages or severe flooding, depending on how much rain falls. The land-to-ocean pressure gradient drives the monsoon circulation, related to the moisture-advection feedback [200]. The ISMR shows a declining trend since 1953 [201]. It has been reported that under some plausible decadal-scale scenarios of land use, greenhouse gas and aerosol forcing, the Indian summer monsoon switches between two metastable regimes corresponding to the "activeâĂŹâĂŹ and "weakâĂŹâĂŹ monsoon phases [23]. Thus, the Indian summer monsoon can be also regarded as a tipping element. In particular, the ISMR is regarded as F ; the planetary albedo over India is the control parameter; the critical value ρ c = 0.5 Sv; and the transition timescale T is 1 years.
Amazon rainforest. Tropical forests play a vital role in the global carbon cycle [202] and are the home of more than half of the known species worldwide [203]. Deforestation and climate change are destabilizing the tropical forests with annual deforestation rates of around 0.5% since the 1990s, with a strong increase in recent years [204]. An empirical finding suggests that the observed tropical forest fragmentation is near to the critical point in three continents, including the Americas, Africa and AsiaâĂŞAustralia [205]. In particular, we focus on the Amazon rainforest, since it is the worldâĂŹs largest rainforest and is home to one in ten known species [161]. If forests are close to tipping points, the Amazon dieback could release 90 gigatonnes Carbon dioxide. Finding the tipping point of the Amazon rainforest is thus essential for us to stay within the emissions budget. Here the rainforest in the Amazon is regarded as F ; the precipitation and dry season length are the control parameters; the critical value ρ c = 1, 100 mm/year; its transition timescale T is about 50 years.
Besides the seven above mentioned tipping elements, there are also some potential policy-relevant tipping elements in the climate system, such as, Sahara/Sahel and West African monsoon, boreal forest, Antarctic bottom water, tundra, permafrost, marine methane hydrates, ocean anoxia and Arctic ozone. The readers who are interested can find them in Ref. [23]. Strong evidence indicates that "tipping points are under way has mounted in the past decade. Domino effects have also been proposed" [161].

Early Warning of Tipping Points
As discussed above, many complex dynamical systems, in particular climate systems, can have tipping points and imply risks of unwanted collapse. Although predicting such tipping points before they are reached is a big challenge, the existence of generic EWS provide useful indicators for anticipating such critical transitions [176,178]. Hence, if an early warning of a tipping point can be identified, then it could help broader society, scientists and policymakers to perform early actions to reduce system collapsed related damages. Thus, numerous studies have been dedicated to detecting and predicting these critical transitions, often making use of the EWS. In this review, we will highlight the tipping point analysis techniques that are used to anticipate, detect and forecast critical transitions in a dynamical system.
Critical slowing down near tipping points The "critical slowing down" phenomenon has been suggested as indicators of whether a dynamical system is getting close to a critical threshold [206]. This happens, for instance, at the fold bifurcation ( Fig. 11), often associated with tipping points. This indicates that the rate at which a system recovers from small perturbations becomes slow, and the slowness can be inferred from rising "memoryâĂİ in small fluctuations in the state of a system. For example, Fig. 13 demonstrates that the critical slowing down has an indicator that the system has lost resilience and may be tipped more easily into an alternative state (as reflected by lag-1 autocorrelation) [178].
In order to illustrate the relation between the critical slowing down phenomenon, increased autocorrelation and increased variance, here we show a simple example that reveals the underlying mechanism. We consider a simple autoregressive model of first order, wherex is the stable equilibrium of the model, ∆t is the period and λ is the recovery speed. Here y n is the deviation of the state variable x from the equilibrium, ε n is a random number obtained from a standard normal distribution and σ is the standard deviation. If λ and ∆t are independent of y n , Eq. (65) can be written as: The lag-1 autocorrelation α ≡ e λ∆t is zero for white noise and close to one for red noise. The expectation value of a first-order autoregressive process y n+1 = c + αy n + σε n is Let's consider c = 0, then the expectation value equals zero and the variance to be The return speed to equilibrium decreases, when the system is close to the critical point. This implies that λ tends to zero and α approaches to one. According to Eq. (68), the variance tends to infinity. In summary, the critical slowing down leads to an increase in lag-1 autocorrelation (α) and in the resulting pattern of fluctuations (variance). Slowing down causes the intrinsic rates of change to decrease, and thus the state of the system becomes more like its past state, i.e., the autocorrelation increases. The resulting increase in "memory" can be measured in a variety of ways from the frequency spectrum of the system.
Autocorrelation function. The lag-1 autocorrelation function (ACF(1)) indicator is a simple way to provide an EWS for an impending tipping event. For instance, Held and Kleinen have shown that the autocorrelation increases in the vicinity of a bifurcation in a model of the thermohaline circulation [179]; Dakos et al. found that the autocorrelation increases before eight well-known climate transitions in the past data [157].
The coefficient of the correlation between two values in a time series is called the ACF. For example the ACF for a time series y n , see Eq. (66) in the autoregressive model, is given by: Corr(y n , y n−s ), s = 1, 2, ..., where s is the time gap and is called the lag. In particular, a lag-1 autocorrelation is the correlation between values that are one time step apart. More generally, a lag s autocorrelation is the correlation between values that are s time steps apart.
The ACF scaling exponent, γ, is the power-law decay of the autocorrelation function with increasing lag s [207]. Let's denote C(s) as the autocorrelation with lag s of time series, then the scaling is defined as for long-range correlations. Notably, for short-range correlated records, C(s) decays exponentially and only ACF(1) is indicative of a data variability close to a tipping point. Detrended fluctuation analysis. Slowing down causes an increase in memory, which can also be measured using detrended fluctuation analysis (DFA) [180]. See [180] for the detailed method. DFA is often used to detect long-range correlations or the persistence of diverse time series including DNA sequences [208], heart rate [209][210][211], earthquakes [212], and also climate records [213]. If the time series is long-range correlated, the fluctuation function, F (n), increases according to a power-law relation: where n is the window size and α the DFA scaling exponent. Here, is the cumulative sum or the "profile" of a time series x i , and Y z t is the fitting polynomial, z stands for the (polynomial) order of the DFA. The DFA exponent α is calculated as the slope of a linear fit to the log-log graph of F (n) vs. n. It has been found that the DFA exponent in the temporal range 10 ≤ n ≤ 100 is sensitive to changes in a dynamical system, which is similar to ACF(1) [181].
Power spectrum. Recently, Prettyman et al. introduced a novel scaling indicator based on the decay rate of the power spectrum (PS) [214]. PS analysis partitions the amount of variation in a time series into different frequencies. When a system is close to a critical transition, it tends to show spectral reddening, i.e., higher variation at low frequencies [215]. The PS scaling exponent β is calculated by estimating the slope of the power spectrum S(f ) of the data, from the scaling relationship Analytically, the three scaling exponents have the linear relationship: It has been reported that the PS-indicator is a useful technique which behaves similarly to the related ACF(1)-and DFA-indicators. In addition, it also shows signs of providing an EWS for a real geophysical system, tropical cyclones, whereas the ACF(1)-indicator fails [214].
Besides the slower recovery from perturbations, increased autocorrelation and memory [see Eq. (67)], increased variance [measured as standard deviation, see Eq. (68)] is another possible indicator of a critical slowing down as a critical transition is approached. For example, Carpenter and Brock have shown that the variance increases in the vicinity of a bifurcation in a lake model [216].
Flickering before transitions. Another notable EWS is a system's back and forth oscillation between two stable states in the vicinity of a critical transition. This oscillation has been called flickering and was observed on the model of lake eutrophication [217] and trophic cascades [218].
Skewness and Kurtosis. In addition to the aforementioned EWS before a catastrophic bifurcation, two further precursors, observed when approaching a critical transition and thus suggested as EWS, are changes in the skewness and kurtosis of the distribution of states [219][220][221]. While skewness indicates asymmetry in the distribution -with a negative skew indicating a right-sided concentration and a positive skew indicating the opposite. Skewness is the standardized third moment around the mean of a distribution, Similarly, kurtosis is a measure of the "peakedness" of the distribution -with positive kurtosis indicating a peak higher than the one of a normal distribution and negative kurtosis indicating a lower peak. Kurtosis is the standardized fourth moment around the mean of a distribution estimated as,

Precursors of Transitions in Real Systems
In the last section, we first highlighted the theoretical background of tipping points that may occur in non-equilibrium dynamics before critical transitions. Nonetheless, it poses much of a challenge to detect EWS in real complex systems such as, climate, social and ecological systems. Developing reliable predictive systems based on these generic properties can strengthen our capacity to navigate systemic failure and guide us for designing more resilient systems.
We will briefly review emerging precursors of transitions in different real systems, including climate, ecosystems, medicine and finance. An abrupt climate change occurs when the Earth system is forced to cross a threshold to a new climate state [222]. Large, abrupt, and widespread climate changes include the Carboniferous Rainforest Collapse [223], Younger Dryas [224], greenhouseâĂŞicehouse transition [225], Dansgaard-Oeschger events, Heinrich events and PaleoceneâĂŞEocene Thermal Maximum [226]. All these rapid climate changes could be explained as critical transitions [176]. For instance, a significant increase in autocorrelation is regarded as a precursor of eight well-known climate transitions [157]. A flickering phenomenon was found to precede the abrupt end of the Younger Dryas cold period, which can be also regarded as a precursor [227].
In ecology, tipping point analysis has also become a major focus of research. For example, an EWS was found in the destabilization of exploited fish stocks, where harvesting leads to increased fluctuations in fish populations [228]. The emerging of EWS has been demonstrated in lake and marine systems [229,230]. In physiology, abrupt transitions were found in epileptic seizures and asthma attacks [174]. For example, flickering may occur before epileptic seizures. In finance, market dynamics may contain information indicating an abrupt change. For example, increased trade volatility may occur before a main shock [231]. For other precursors of transitions in real systems, the reader is referred to Ref. [176].

Entropy and Complexity 2.4.1. Introduction
Entropy is an important concept arising from statistical mechanics. It is a characteristic that describes the state of a system composed of smaller components, and it has been used to be a general measure of complexity, with widespread applications. In classical thermodynamics, entropy is related to the loss of energy during an irreversible process, developed by Rudolf Clausius in the early 1850s. The thermodynamic entropy S is derived from the heat flow δQ at a fixed temperature T , Note that the entropy of a system is defined only if it is in a thermodynamic equilibrium. The statistical mechanics definition of entropy was developed by Ludwig Boltzmann in 1870s via analyzing the statistical behavior of the microscopic components of a system as Here S is the entropy of the macrostate, k B is the Boltzmann's constant, and Ω standing the total number of possible microstates that yields the macrostate. When viewed in terms of information theory, Claude Shannon developed the very general concept of information entropy, a fundamental cornerstone of information theory, to describe an analogous loss of information [232]. It is a measure of the amount of information that is missing before reception. The definition of the information entropy is, where k S = 1/ log(2) (in bits), p i is the probability of each state. In quantum statistical mechanics, the concept of entropy has been developed by John von Neumann and is generally referred to as "von Neumann entropy". For a quantum-mechanical system described by a density matrix ρ, the von Neumann entropy is, where Tr denotes the trace operator. There exist also many other types of entropy, such as Gibbs, Residual, Approximate, Sinai-Kolmogorov, Sample, Multiscale. Entropy has been proven useful in many real-world systems, including analysis of DNA sequences [233], cosmology and astrophysics [234][235][236], economics [237,238], and climate systems [239][240][241]. Each definition of entropy could give better results for some systems but fails for others. We will discuss in the following that entropy has three related interpretations [29] as a measure (i) irreversible changes, (ii) disorder and (iii) uncertainty.

Entropy as Irreversibility
The idea of irreversibility is central to the understanding of entropy. A process that is not reversible is usually called irreversible. This concept arises in thermodynamics. It has been reported that all complex dynamic natural processes are irreversible [242]. For an isolated system with an irreversible process, the entropy never decreases. This is known as the second law of thermodynamics. In order to better understand the behavior of entropy in an irreversible process, we consider the Carnot heat engine and the corresponding Carnot cycle. As shown in Fig. 14a, given by a piston with pressure P (external), and two heat baths, one at a hot temperature T H and the other at a cold temperature T C with some material (a monatomic ideal gas) inside the piston. During one Carnot cycle, Q H heat flows out of the hot bath, heat Q C flows into the cold bath, resulting in a net work W = Q H − Q C . Carnot imagined a fully reversible heat engine, and the Carnot cycle moves the piston in and out with the following four steps (See, Fig. 14 The gas is connected to the hot bath, and the piston moves outward at a varying pressure and a fixed temperature T H . In this step, heat Q H flows into the piston. (2) (ii → iii): The piston expands at varying pressure P . (3) (iii → iv): The expanded gas is then connected to the cold bath and compressed at a fixed temperature at T C . In this step, the heat Q C flows out. (4) (iv → i): The piston is compressed and returns to the original state. In this step, there is no heat transfer.
The net work W done by the piston is the area inside P-V Loop (integration) in Fig. 14b, According to the ideal gas law, where N is the number of particles, the total energy of the ideal gas inside the piston is given by the equipartition theorem where E stands for the kinetic energy. During the first step, we get Similarly, for the third step, we have For the other two steps, there is no any heat flow, based on The above equation Substituting Eq. (84) into the heat flow, Eqs. (81) and (81), we yield Carnot's fundamental result, Let us define the heat flow ∆E = Q at a fixed temperature T . Then the entropy change ∆S is Note that the above is equivalent to Eq. 74. This means that for a reversible Carnot engine, the entropy flows from the hot bath Q H /T H equals the entropy flows from the piston Q C /T C , i.e., no entropy is created or destroyed. However, for any irreversible real engines, entropy will increase.

Entropy as Disorder
Traditionally, another interpretation of entropy is described as a measurement of the disorder or randomness of a system. In thermodynamics, the entropy of mixing is the increase of entropy when two or more different types of particles are mixed without chemical reactions. Consider that N molecules of an ideal gas are separated in a vessel with two equal volumes V . The total unmixed entropy S un is [29] S un = 2k B log V N/2 /(N/2)! .
Here we assume that the two separated ideal gases have the same masses and the same total energy. Let the partition between the gases be removed and they are allowed to mix. Then the mixed entropy S m becomes According to the Eqs. (87) and (88), we obtain the increased entropy, The above equation means that there will be a gain k B log 2 in entropy per molecule during the mixing process. This is since the temperatures and pressures are equal, and removing the partition of the vessel does not involve any heat transfer, thus mixing of gases (e.g., by diffusion), always results in an increasing entropy. The mixing is spontaneous! The diffusion of initially separated gases results in an increase in entropy. The process has increased the random distribution of molecules. Therefore, it is appropriate to suggest a relationship between entropy and complexity (disorder).

Entropy as Uncertainty
The third interpretation of entropy is as a measure of the uncertainty or ignorance of a complex system. This interpretation is more general and strongly related to the information and memory. In this interpretation, the entropy is not an intrinsic property but representing our knowledge about the system.
Next, we will focus on the non-equilibrium entropy and the information entropy to insulate how entropy can be interpreted as uncertainty. The second law of thermodynamics states that the entropy of an isolated system never decreases, and is unchanged if and only if all processes are reversible. What we are interested in is how to define the entropy for non-equilibrium systems. Generally, for both non-equilibrium and equilibrium systems, we use a probability distribution ρ, to define an ensemble of states. Given a probability distribution of a discrete set of states, the entropy is where p i is the probability of i state. In the case of continuum distributions, the entropy becomes We consider a microcanonical ensemble, in which ρ equil = 1/ (Ω(E)δE), the non-equilibrium of the entropy is shifted from the equilibrium (Eq. (75)), and the entropy is For quantum systems, the non-equilibrium entropy can be described by the density matrix ρ, as in Eq. (77). For non-thermodynamic systems, the temperature variable does not exist. Therefore we do not need to use BoltzmannâĂŹs constant k B , but the constant k S = 1/ log(2). Then Eq. (76) becomes here the entropy is measured in bits. This is called information entropy, which was introduced by Shannon [232]. Information entropy represents an ensemble of possible messages or images, and has important implications in communication technologies (message passing) and computer science (data compression). Note that the non-equilibrium Shannon entropy satisfies the following properties: (1) it is maximum for equal probabilities; (2) it is unaffected by extra states of zero probability; and (3) it changes for conditional probabilities. For more details, we refer the reader to Ref. [29].

Approximate Entropy, Sample Entropy and System Sample Entropy
In the following, we will highlight several types of entropy, including Approximate Entropy (ApEn), Sample entropy (SampEn) and System Sample Entropy (SysSampEn), which have been developed to quantify the complexity in non-linear time-series.

ApEn
The ApEn was introduced by Pincus in 1991 [243] motivated by the exact regularity statistics, Kol-mogorovâĂŞSinai (KS) entropy [244] and the K 2 entropy [245]. It is used as a quantification of regularity in time-series data, in particular for relatively short and noisy data sets. ApEn has a wide range of applications in areas from medical data, such as heart rate [243], to finance system [246], psychology [247] and human factors engineering [248].
Definition of ApEn: Given a time series of data u(1), u(2), . . . , u(N ). There are N raw data equidistant in time. It forms a sequence of vectors Here, m is an integer representing the length of the data. Define the embedding distance d[x(i), x(j)] [249] between vectors x(i) and x(j) as the maximum difference in their respective scalar components. Then use where r is a positive real number which specifies a filtering level. Based on the C m i (r), the ApEn is defined as For fixed m and r, Eq. (95) becomes In particular, given N data points, the statistic formula is implemented One should note that a similar entropy, called Eckmann-Ruelle (E-R) entropy [244], has been defined as, Despite their algorithms being very similar, ApEn(m, r) is not intended to be an approximate value of the E − R entropy. Compared to the K − S, E − R and K 2 entropy, ApEn(m, r) has the following advantages [250]: (i) Less affected by noise; (ii) It is robust to outliers; (iii) Lower computational demand, with good confidence; (iv) ApEn(m, r) is finite for stochastic, noisy deterministic and composite processes; (v) Increasing ApEn(m, r) corresponds to intuitively increasing process complexity. ApEn(m, r) has been applied to classify the electroencephalogram (EEG) in psychiatric diseases, such as schizophrenia [251], epilepsy [252], and addiction [253].

SampEn
Motivated by the concept of ApEn, Richman and Moorman developed the sample entropy (SampEn) to assess the complexity of physiological time-series signals, diagnosing diseased states [254]. SampEn is a modification of ApEn, but has three advantages: (i) SampEn agrees much better than ApEn statistics with the theory for random numbers having known probabilistic character over a broad range of operating conditions; (ii) maintains relative consistency and (iii) has a residual bias for very short record lengths.
Definition of SampEn: Similar to the definition of ApEn: given a time series of N points, {u(j) the maximum difference of their corresponding scalar components. Let B i be the number of vectors x m (j) within r of x m (i) and let A i be the number of vectors x m+1 (j) within r of x m+1 (i). Here r represents the tolerance for accepting matches. It is convenient to set the tolerance as r × SD, the standard deviation of the data set. Then the sample entropy is defined as: It is clear that A will always be smaller or equal to B. Therefore, SampEn will be always either zero or positive. A smaller value of SampEn indicates more self-similarity in the data set or less noise [254]. Note that the parameters N , m, and r must be fixed for each calculation.

SysSampEn
One limitation of the aforementioned entropy is that it can only be applied to univariate or bivariate time series (cross-ApEn) [254]. For a complex system with multivariate time series and spatial-temporal structures, Meng et al. proposed the so called SysSampEn and applied it to study the climate system. Based on the SysSampEn, they could measure the complexity (disorder) of a system composed of temperature anomaly time series and forecast the magnitude of an El Niño event with a prediction horizon of 1 year and high accuracy [241].
Definition of SysSampEn: Given N interdependent time series x α (t) (α = 1, 2, ..., N ; t = 1, 2, ..., l) of length l composing the system: 1. We select sub-records k of length m < l, starting at each q-th data point, i.e., starting at t = k × q + 1 = 0 × q + 1, 1 × q + 1, 2 × q + 1, ..., as long as k × q + m ≤ l. Thus a specific sub-sequence Then we select n subsequences from each time series and construct a set of N × n template vectors from the system, i.e., where σ α and σ β are the standard deviations of the time series x α (t) and x β (t) respectively. γ defines the similarity criterion and is a nonzero constant. 2. To examine the probability that two time series, which are close at m data points, still will be close at the next p data points, we construct analogously another set Θ(m + p, q, n) by selecting sub-records of length m + p. To make the number of template vectors of length m equal to that of length m + p, we choose n ≤ l−m−p q + 1. In order to reduce the parameter degrees of freedom and save calculation time, we take p = q, then n ≤ l−m p . We assume that two template vectors from the set Θ(m + p, q, n) 3. The SysSampEn of the system is defined as where A is the number of close vector pairs from the set Θ(m + p, q, n), B is the number of close vector pairs from the set Θ(m, q, n), and l ef f (n) = n * p + m, is the number of days that is used in the calculation of the SysSampEn.
Note that, when N = 1, p = 1, and l ef f = l, the SysSampEn is equivalent to the classical SampEn [254]. Appropriate parameter values have to be identified since only certain value combinations can be used to estimate a system's complexity with considerable accuracy. In order to achieve this, Meng et al. introduced two tests [241]: Spatial asynchrony test and temporal disorder test. The SysSampEn was calculated for the climate system composed of the near surface air or sea surface temperature anomaly time series in the Niño 3.4 region. It was found that a strong positive correlation between the El Niño magnitudes and the values SysSampEn of its previous calendar year exists (Fig. 15A,B). Fig. 15C demonstrates the calculation of the accuracy level, which allows to choose effective parameter combinations.
In addition, to reveal the mathematical meaning of the SysSampEn, Fig. 16 shows the logistic map as an example of applying the SysSampEn to estimate the system complexity and compare it with the Lyapunov exponents. It is found that higher (lower) values of the SysSampEn are strongly associated with higher (lower) Lyapunov exponents, which indicates that the SysSampEn can well capture the complexity of the system.

Climate System
The Earth's Climate System is a highly complex and interactive system as defined by the Global Atmospheric Research Programme (GARP) of the World Meteorological Organization (WMO) in 1975, as being composed of five major components: the atmosphere, the hydrosphere, the cryosphere, the land surface and the biosphere. And in 1992, the United NationsâĂŹ Framework Convention on Climate Change (FCCC) defined the climate system as 'the totality of the atmosphere, hydrosphere, biosphere and geosphere and their interactionsâĂŹ. Fig. 17 shows a schematic representation of the most important components of the climate system and their potential changes. The atmosphere is the gaseous envelope surrounding the solid planet and provides oxygen to most animal life at the Earth's surface. It is the most unstable and rapidly changing component. Currently, the atmosphere is made up primarily of nitrogen (78.1%) and oxygen (20.9%). There are also a number of trace gases that cause of great concern for the future of the planet, including carbon dioxide (CO 2 ), methane (CH 4 ) and ozone (O 3 ). They are referred to as greenhouse gases (GHGs) and, along with water vapor, provide the Earth with the greenhouse effect. Their effect is to keep our Earth warm, but not too warm. The hydrosphere is the water on a planet, including oceans, seas, rivers, lakes and underground water. The cryosphere is the portion of the Earth's surface where water is in solid form, consisting of land ice (including ice shelves and glaciers), snow and sea ice. It impacts the climate system greatly through its high albedo. Variations in the volume of the ice sheets are regarded as one of the major factors for sea level rise. Land Surface, i.e., the "solid" Earth, creates the distribution of continents, ocean basins, mountain ranges, etc. It influences the transformation of short-wave to long-wave radiation, reflectivity of the Earth's surface, reservoir of dust, transfer of momentum and energy of the land surface. The biosphere is the total sum of all living things on the planet, including the organic cover of the land masses (vegetation, soil) and marine organisms. It plays a vital role in exchanging of carbon between the different reservoirs, such as the concentration of CO 2 in the atmosphere, as well as the balances of other gases [255]. There are numerous interactions between the components, by exchanging mass, heat and momentum. For instance, the ocean-atmosphere interaction is a strongly-coupled system exchanging water vapour and heat through evaporation, among others. Each climate system component operates on a rather broad of characteristic temporal and spatial scales. Note that the climate system itself is often considered as part of the broader Earth System [2].
A fundamental factor in climate science analyses is the climate data, which provides information on changes in the Earth's ecosystem and supplies decision makers with a reliable knowledge base on the climate crisis and its impact. In general, there are two data collection methods to gather quantitative and qualitative climate data (also briefly discussed in Section 1.1 ): observations (instrumental, proxies and reanalysis) and climate models simulations. Observational data of the climate system are based on direct measurements and remote sensing from satellites, radar and other platforms. The first meteorological stations were established in Europe and North America. Global-scale observations in the instrumental era began in the mid-19th century for temperature and other climatic variables. The increase and quality of the network of observations and the technology supporting the collection and storage of data have rapidly evolved form 1950 onwards. Paleoclimate reconstructions extend some proxy records back hundreds to millions of years, including coral records [10,256], tree rings records [13] for the last few millennia, as well as stalagmite δ 18 O time series [12], Central-Mediterranean sediment [11] and ice-core records [257].
Climate studies require data of consistent spatial resolution and accuracy over long time intervals. To satisfy this goal, some numerical weather prediction centers have started to produce so-called reanalyses that are obtained by subtle data assimilation techniques which efficiently combine observational and numerical data. For example, the European Centre for Medium-range Weather Forecasts (ECMWF) [258] provides the following reanalysis datasets: ERA5, ERA-Interim, CERA-SAT, CERA-20C, ERA-20CM, ERA-40, ERA-20C and so on 3 . The NCEP-NCAR Reanalysis I/II, 20th Century Reanalysis and North American Regional Reanalysis (NARR) 4 are produced in collaboration by the U.S. National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR) [259]. The JRA-25 and JRA-55 reanalysis 5 are produced by the Japan Meteorological Agency (JMA) [260,261]. Note that most climatic variables or fields in these reanalysis data agree very well with observed data, such as the geopotential fields over the continents of the Northern Hemisphere. However, substantial differences persist in some fields over the Southern Hemisphere or those that are poorly observed [262][263][264][265][266]. More details are summarized in Ref. [267].
Climate models are considered as the primary tools available for investigating the response of the climate system to various forcings, for making climate predictions on seasonal to decadal time scales and for making projections of future climate over the coming century and beyond. Climate models could be defined as a mathematical representation of the climate system based on physical, chemical and biological principles.
The models used in climate research range from simple energy balance models to complex Earth System Models (ESMs), which can be slow and costly to use, even on high-performance computers, and the results can only be approximations [17]. Here, we provide a brief overview of the basic types of climate models: (i) Energy balance models propose a highly simplified version of the dynamic of the climate system. They are zero-or one-dimensional models describing the surface temperature as a function of the energy balance of the Earth, where the number of dimensions, from zero to three, refers to the number of independent space variables used in the model domain. For instance, the global mean surface temperatureT s can be expressed as: Here R i and R o are the absorbed shortwave radiation and outgoing longwave radiation. C is the heat capacity of Earth system, in units of J · m −2 · K −1 , α is the planetary albedo, m is the transmissivity of the atmosphere, a number less than 1 that represents the greenhouse effect of the Earth's atmosphere, σ ≈ 5.67 × 10 −8 W · m −2 · K −4 is the StefanâĂŞBoltzmann constant, µ is an insolation parameter equal to unity for present-day conditions. (ii) AtmosphereâĂŞOcean General Circulation Models (AOGCMs) are the most comprehensive standard climate models that are assessed in the IPCC Fourth Assessment Report (AR). They are used to understand the dynamics of the physical components of the climate system (such as, atmosphere, ocean, land and sea ice), and for making projections based on future GHGs and aerosol forcing.
(iii) Earth System Models (ESMs) are the current state-of-the-art models, and they expand on AOGCMs by including representation of various biogeochemical cycles such as those involved in the carbon cycle, the sulphur cycle, or ozone. These models provide the most comprehensive tools available for simulating the past and future response of the climate system to external forcing, in which biogeochemical feedbacks play an important role. (iv) Regional Climate Models RCMs are limited-area models with representations of climate processes comparable to those in the atmospheric and land surface components of AOGCMs, though typically run without interactive ocean and sea ice. RCMs are often used to dynamically 'downscaleâĂŹ global model simulations for some particular geographical region to provide more detailed information. In addition, based on the GCMs, the Intermediate Complexity models were developed to investigate the Earth systems on long timescales or at a reduced computational cost [20]. A special application is to achieve reliable predictions of extreme events on a sub-seasonal scale. Taken together, climate models provide a comprehensive view of the variability and long-term changes in the climate systems. However, assessing and forecasting climate extreme events, such as El Niño events and extreme rainfall, still poses challenges. The difficulties are mainly due to the intrinsically rare and disruptive natural conditions. They are strongly influenced by weather patterns, modes of variability, thermodynamic processes and various feedbacks. In the following, we will provide an in-depth and detailed description of how the statistical physics methodologies introduced in Section. 2 can be applied to investigate and (or) predict various climate phenomena, such as El NiñoâĂŞSouthern Oscillation (ENSO), Indian Summer Monsoon (ISM), Extreme Rainfall, Atmospheric Circulation and Atlantic Meridional Overturning Circulation (AMOC).

El NiñoâĂŞSouthern Oscillation
One of the most important climate phenomena on year-to-year time scales is the El Niño Southern Oscillation (ENSO). As mentioned in section 2.3, ENSO can be regarded as a tipping element of the Earth system. El Niño is the warm phase of the ENSO and is associated with a band of warm ocean water that develops in the central and east-central equatorial Pacific, sometimes called El Niño basin [44]. The cool phase of ENSO is called La Niña, with SSTs in the eastern Pacific below average. ENSO significantly impacts Earth's ecosystems and human societies, by influencing temperatures and precipitation around the globe, including the Americas, India and surrounding tropical continents [268][269][270]. These remote effects are known as teleconnections [271].   Fig. 18. If the ONI > +0.5 • C for at least five consecutive months, then the event is defined as an El Niño; whereas, when ONI is < −0.5 • C for at least five consecutive months, the corresponding event is regarded as a La Niña. The ONI is use for the operational definition of the ENSO phase by NOAA 6 . There are also some other indices to measure the El Niño events, such as, Niño 1+2, 3, 3.4, 4 and Trans-Niño indexes 7 . Besides the SST, ENSO also influences other climate variables like atmosperic pressure gradients, upper ocean currents and the thermocline depth [272].
ENSO transition mechanism in brief: In the following, we will describe the physical mechanism of ENSO. Bjerknes first postulated that the oceanâĂŞatmosphere interaction is essential for ENSO and is now referred to as the Bjerknes feedback [273]. This positive feedback mechanism is understood to be a prominent mechanism necessary for the development of ENSO. Consider an initial SST anomaly in the central/eastern tropical Pacific. This anomaly reduces the east-west SST gradient and hence weakens the Walker circulation [274], resulting in a westerly wind anomaly. The westerly wind anomaly, in turn, drives the ocean circulation change (easterly winds shoal the equatorial thermocline and induce upwelling in the eastern Pacific, keeping the east cool) that further reinforces the SST anomaly. As a result of the positive feedback, the tropical Pacific reaches a warm state, i.e., El Niño. Tziperman and Yu pointed out that the westerly wind bursts, while partially stochastic, seem an inherent part of the large-scale deterministic ENSO dynamics [275].
Besides the positive Bjerknes feedback, the oscillatory nature of ENSO requires a negative feedback to turn the coupled ocean-atmosphere system from a warm (El Niño) phase to a cold (La Niña) phase. Different negative feedbacks have been proposed since the 1980s associated with the delayed oscillator, the western Pacific oscillator, the recharge-discharge oscillator, and the advective-reflective oscillator. The delayed oscillator is a mechanism for the oscillatory nature of ENSO, which was proposed by McCreary based on the Rossby wave reflection at the ocean western boundary [276]. Suarez and Schopf introduced the delayed oscillator explaining the oscillatory feature of ENSO to emphasize the delayed effects of oceanic wave reflection at the ocean western boundary [277]. Cane and Zebiak developed the first coupled oceanâĂŞatmosphere model, and used it to predict ENSO [278,279]. The conceptual delayed oscillator model is represented as follows: where T represents the SST anomaly in the equatorial eastern Pacific, A, B, η and ε are constants. The first term on the right hand side of Eq. (102) stands for the Bjerknes positive feedback between the ocean and the atmosphere. The second term represents the delayed negative feedback of wave reflection at the ocean western boundary. The last term is a cubic damping term. It has been found that the delayed oscillator overlooks the role of the oceanâĂŞatmosphere interaction in the western Pacific and also assumes that wave reflection at the ocean eastern boundary is not important [280].
Since the delayed oscillator does not consider the western-Pacific anomaly pattern, Weisberg and Wang [281] developed a western-Pacific oscillator model for ENSO. This model stresses that the equatorial wind anomalies in the far western Pacific play an important role in the evolution of ENSO. The western-Pacific oscillator model is represented as: where T is the SST anomaly in the equatorial eastern Pacific, h is the thermocline-depth anomaly in the off-equatorial western Pacific, and τ 1 and τ 2 are the equatorial zonal wind-stress anomalies in the central Pacific and the western Pacific, respectively. All model parameters are constants. It has been found that the growth and decrease in sea level over the western Pacific Ocean are related to ENSO [282]. Based on these ideas Jin [283,284] developed a recharge-discharge oscillator model for ENSO that is represented by: where T is the SST anomaly in the equatorial eastern Pacific and h is the thermocline-depth anomaly in the equatorial western Pacific. The model parameters C, D, ε, E and R h are constants. In particular, Eqs. (108) and (109) represent the discharge and recharge of tropical Pacific Ocean heat content. Picaut et al. [285] proposed a conceptual model of the advective-reflective oscillator for ENSO. They found that the positive feedback results from ocean zonal currents that advect the western-Pacific warm pool (WPWP) toward the east. There are three negative feedbacks that tend to push the WPWP back to the western Pacific, include: (i) anomalous zonal current associated with wave reflection at the ocean western boundary, (ii) anomalous zonal current associated with wave reflection at the ocean eastern boundary and (iii) mean zonal current converging at the WPWP's eastern edge. Unlike the other three oscillator models, the advective-reflective oscillator model does not have a set of simple and heuristic equations. Instead, they used a linear wind-forced ocean numerical model forced by wind anomalies, which were associated with the zonal current of the first baroclinic Kelvin and first meridional Rossby waves [195].
Moreover, based on the dynamics and thermodynamics of Zebiak and Cane's coupled model, as well as all previous ENSO oscillator models, Wang [286] developed a unified ENSO oscillator ocean-atmosphere model with the hypothesis that ENSO may be a multi-mechanism phenomenon and their relative importance may depend on time. The unified ENSO oscillator model is formulated as: where T, h, τ 1 and τ 2 are variables that stand for the SST anomalies in the equatorial eastern Pacific, the thermocline-depth anomalies in the off-equatorial western Pacific, the zonal wind-stress anomalies in the equatorial central Pacific and the zonal wind-stress anomalies in the equatorial western Pacific, respectively. For a given set of parameters, Eqs. (110)- (113) can oscillate on interannual time scales. El Niño forecasting: Since the 1990s, both dynamical and statistical models have been used to model and forecast El Niño events, providing the society the opportunity to prepare for associated hazards such as heavy rains, floods and droughts [287,288]. Although there are about 17 dynamic and 8 statistical models in CPC/IRI currently providing ENSO forecasts routinely, high prediction skill is generally limited to about six months ahead, see the description in Fig. 19. The reason is the presence of the so-called "spring predictability barrier" (SPB), where errors are greatly amplified due to the coupled feedbacks in the equatorial oceanâĂŞatmosphere system [289,290]. After the spring, the ability of the models to predict successfully becomes increasingly better.
In general, a poor skill in forecasts can be divided into two categories: imperfections in the forecast system (including, error in model parameterization of sub-grid-scale motions, the relative scarcity of points with data, errors in the measurement of the data), and fundamental limits of predictability [272]. The fundamental limits are usually coming from the complex and chaotic nature of a system.
To overcome the limits of the SPB in El Niño forecasting, several CN-based approached were developed. For example, by analyzing the CN, Ludescher et al. found that a large-scale cooperative mode exists one calendar year before the warming event, linking the El Niño region and the rest of the ocean [291]. This mode is probably due to the emergence of El Niño acting as an autonomous component in the CN [61]. All nodes used in the CN are shown in Fig. 20a, where the 14 grid points in the El Niño basin [61] are in red and outside this domain there are 193 grid point in blue. The data is the daily surface air temperature (SAT) from NCEP/NCAR reanalysis I. As described in Section 2, the first step is to remove the seasonal trend by minus climatological average for each calendar day. And then, the links between the El Niño basin and outside (see Fig. 20a) are considered. The strength W i,j (t) of each link is computed according to Eq. (15). The mean strength S(t) of the dynamical links in the CN is obtained by simply averaging over all individual link strengths. Fig. 20b, shows the time evolution of S(t) for the learning (top) and forecasting (bottom) phases. They found that before an El Niño event, S(t) tends to increase. They marked an alarm when S(t) crosses a threshold Θ = 2.82 (see Fig. 20b). The prediction accuracy, through Receiver Operating Characteristic (ROC)-type analysis (explained later), is much better than the 12-months forecasts based on climate models. Based on these findings regarding the temporal evolution of the CN, Ludescher et al. successfully predicted the onset of the 2014âĂŞ2016 El Niño event [292]. Motivated by this work, Meng et al. developed a multidisciplinary approach combining climate, network, and percolation theory to predict the onset of El Niño [65]. Their method can forecast El Niño events 1 year-ahead, with a high prediction accuracy of 70%, and a low false alarm of only 4%. Different from [291], they consider the global CN with initially 726 isolated nodes. Links are added one by one according to the link strength, i.e., first adding the link with the highest weight, and continue selecting links ordered by decreasing weight. During the evolution of the CN, they found that the structure of the network changes violently approximately one year ahead of El Niño events, and thus suggest to use the largest size change of the largest cluster (percolation cluster) ∆ [see Eq. (48)] as a percolation-based precursor, to forecast El Niño events. In addition, based on finite size scaling analysis, they found that the percolation process is discontinuous.
To forecast both the onset and magnitude of El Niño events, another sophisticated network-based approach was developed by Meng et al. [66]. They proposed a new forecasting index based on CN links representing the similarity of low frequency temporal temperature anomaly variations between different sites in the Niño 3.4 region. They found that significant upward trends in the index forecast the onset of El Niño approximately 1 year ahead, and the highest peak since the end of last El Niño forecasts the magnitude of the following event. Their index was also tested on several datasets, including ERA-Interim, NCEP Reanalysis I, PCMDI-AMIP 1.1.3 and ERSST.v5. They defined the degree of coherence/disorder of the Niño 3.4 region as, where N is the number of nodes in the Niño 3.4 region and C (t) i,j (θ) stands for the max value of the timedelayed cross-correlation function, see Eqs. (12) and (13). The forecasting index (FI) was proposed as a function of time (months), where a = 0 indicates that the average of ln(C) includes the current month, while m is the total number of months preceding t since Jan. 1981. We show the time evolution of the FI and ONI in Fig. 21 by using the ERA-Interim dataset. Based on the temporal evolution of the FI, they successfully predicted the onset of the 2018-2019 El Niño event, see the purple star above the purple arrow in Fig. 21b. Note that one can use the peak of the FI to predict the magnitude of the following El Niño event. This peak value is determined from the end of the last event to the start of a new event, which means that we should wait until the new event begins to occur. To further improve the forecasting skill, in particular, the magnitude forecasting with a long lead time. Meng et al. [241] developed the SysSampEn method, see more details in Sec. (2.4.5). It is defined in Eq. (100) and shown in Fig. 15a. The correlation between the magnitude of the El Niño events and the SysSampEn can reach to r = 0.99 [see Fig. 15b]. They further used this correlation to forecast the magnitude of an El Niño with a prediction horizon of 1 year and high accuracy (i.e., Root Mean Square Error = 0.23 • C for the average of the individual datasets forecasts). In particular, for the 2018-2019 El Niño event, they forecasted a weak El Niño with a magnitude of 1.11±0.23 • C, compared to the observed value 0.9 • C.
Recently, Feng et al. combined machine learning techniques and climate networks to apply to El Niño predictions, especially for the occurrence of El Niño events and the development of the NINO 3.4 index over time [293]. In Ref. [294], Petersik and Dijkstra applied Gaussian density neural network and quantile regression neural network ensembles to predict ENSO. Classical machine learning models for prediction usually only predict a point value, e.g., the ONI. In contrast, their models forecast directly a probability distribution for the ONI. In particular, they found that their machine learning models have a high-correlation skill (r = 0.5) for long lead times (12 months ahead) for an evaluation period between 1963 and 2017. In Ref.  one and a half years. A review of ENSO prediction based on modern machine learning and artificial neural networks was reported in Ref. [296]. ENSO remote impacts: teleconnections As introduced above, the ENSO phenomenon mainly lies in the Pacific ocean. However, its effects are felt over a large part of the Earth. These remote effects are usually called teleconnections, emphasizing that changing climatic conditions in one place can affect areas far from the source [271,297]. A teleconnection is usually indicated by the correlation between the values observed at two separate locations or regions. It is related to a pattern of variability, associated with atmospheric wave propagation, the presence of ocean currents, etc. For example, the teleconnections associated with the North Atlantic Oscillation (NAO) and ENSO are main examples of long-range correlations that can be found in the atmosphere. Because of those global teleconnections, El Niño leads to higher precipitation in the central Pacific and dry conditions over Indonesia and Northern Australia, much drier and warmer conditions in Mozambique, while the western USA tends to be wetter. Fig. 22 shows weather patterns statistically associated with El Niño conditions during the boreal winter months December-January-February. The regions that have been shown with some degree of reliability to be affected by warm ENSO phases are highlighted, for example, warm-vs.-cold and wet-vs.-dry anomalies. The effects for cold phases of ENSO, La Niña, are in approximately the same regions, but with the opposite sign.
The basic mechanism of ENSO remote influences are produced in the atmosphere via large-scale atmospheric waves, such as Rossby waves. Rossby waves are fundamental for the understanding of weather systems in the atmosphere and the large-scale circulation in the ocean. They depend fundamentally upon the variation of the Coriolis parameter with latitude. In the so called β-plain approximation, the Coriolis parameter varies linearly with latitude where where φ is the latitude, Ω is the angular speed of the Earth's rotation, and R is the mean radius of the Earth. The wavelength of atmospheric Rossby wave is about 6000 km. The atmospheric wave dynamics are akin to those observed in the ocean for ENSO in that they involve a transfer of atmospheric mass by anomalous winds, which in turn affect pressure and winds. In order to assess if a region has an ENSO connection, Fan et al. applied the Pearson correlation CNs approach to investigate the global remote impacts of El Niño and La Niña [64]. They refer to the Niño 3.4 region as the El Niño Basin (ENB). The CNs were constructed by using only directed links from the ENB to regions outside the ENB ("in"-links) based on the global daily SAT fields of the NCEP/NCAR reanalysis and EAR-Interim datasets. The links' strengths are computed according to Eqs. (12), (13) and (15). They identified the value of the highest peak of the absolute value of the cross-correlation function C y i,j (τ ) and denote the corresponding time lag as θ y i,j . The adjacency matrix of a CN became where H(x) is the Heaviside step function for which H(x ≥ 0) = 1 and H(x < 0) = 0. The "in" and "out" degrees of each node are defined as I y i = j A y j,i , O y i = j A y i,j respectively, quantifying the number of links into a node or out from a node. The total "in" weights for each node outside the ENB using are defined as The values of IN(C y i ) and IN(W y i ) reflect the impacts of the ENB. Specifically, if there are no "in" links for a node, both the "in" degree and "in" weights are zero, indicating no impact of ENB. Based on the ONI, the time is divided into El Niño , La Niña and normal years. The "in"-weighted degree fields for El Niño and La Niña are calculated by taking the average of the same type of events where S = y∈EY(LY) I y i , and "EY" and "LY" refer to the years in which El Niño and La Niña start. The number of nodes with zero in-degree is defined as N y and the average in-weights per node is Fig. 23 (a-d) shows that, the affected regions by El Niño and La Niña, are characterized by relatively high in-weights (using C and W ). For comparison, mean winter (Dec-Feb) temperature anomalies during El Niño and La Niña years are shown in Fig. 23 (e-f). A quantitative analysis of the area that is affected/unaffected during El Niño and La Niña years is shown in Fig. 24, where El Niño and La Niña years are respectively emphasized by the red and blue shading. The network analysis reveals strongly (high C y ) localized (low N y ) impacts of ENSO, i.e., these stronger in-weighted activities are found to be concentrated in very localized areas, whereas a large fraction of the globe is not influenced by the events (low N y means that fewer nodes are influenced).

Diversity of El Niño
It has been reported that there are different types of El Niño events, such as the canonical eastern Pacific (EP) and the Modoki central Pacific (CP) types [298,299]

Indian Summer Monsoon
The Indian Summer Monsoon (ISM) is the most prominent among the world's monsoon systems and has a decisive influence on India's agricultural output and economy. The ISM delivers more than 70% of the country's annual rainfall. The prediction of the ISM timing (onset date and withdrawal date) as well as the ISM rainfall is a vital issue for the Indian subcontinent and strongly affects the gross domestic product of the country, up to 22% of which is determined by agriculture [302]. A slight deviation of the timing manifested as delay (or early arrival) may lead to drastic droughts (floods). Also, swings in the amount of rainfall, even with variations of only 10%, can cause severe flooding or drought, causing damages to infrastructure and loss of crops and livelihoods of the population [303]. The onset of the ISM takes place abruptly and the long lead-time operational forecasts for the onset, withdrawal and rainfall amount have shown little skill in the recent decades.
ISM onset and withdrawal forecasting Several statistical climate models have been developed for ISM onset and withdrawal dates forecasting with different timescales from short-range (up to 4 days), medium-range (4âĂŞ10 days) [304,305] to extended-range (10âĂŞ30 days) and long-range (more than 30 days) forecasts [306]. In particular, the Indian Meteorological Department (IMD) provides a forecast of the monsoon onset 21 days in advance, with an accuracy of ± 4 days. Note that most of the ISM forecasting models are based on the input values, including SST, mean sea level pressure, tropospheric moisture, moist static energy, outgoing longwave radiation, wind fields, etc [307][308][309][310][311]. However, there are still impending challenges in the existing forecasting methods: (i) The false monsoon onsets mostly related to non-monsoonal atmospheric circulation systems [312]; (ii) the limitations in predicting a withdrawal data earlier than 1st September.
To overcome these challenges, Stolbova et al. proposed a novel tipping elements approach to accurately forecasts the ISM onset and withdrawal dates [95]. This approach relies on the combination of two methodologies: the climate network analysis and the nonlinear dynamics. First, the analysis of the network of extreme rainfall events allows to identify and reveal the teleconnection between the most active hubs, the Eastern Ghats (EG) and North Pakistan (NP). Second, the application of the nonlinear dynamics through the analysis of fluctuations allows to detect the critical transitions, which are used to define the tipping elements of the ISM. Later, the EG and NP were chosen as optimal observation locations or reference points (RPs) to predict the ISM monsoon onset and withdrawal. It was discovered that there is a tipping point between these RPs for the SAT and relative humidity, which yields a very successful prediction scheme.
The ISM CN is focused on the monsoon region (62.5âĂŞ97.5 • E, 5.0âĂŞ40.0 • N) with a spatial resolution of 2.5 degrees, which results in 15 × 15 = 225 grid points, see Fig. 25. The near SAT (T ) at 1000 hPa, relative humidity (rh) at 1000 hPa, and wind fields at 700 hPa were used. The variance σ 2 of (T ) and (rh) for each grid point are firstly calculated as [95] where x(t) stands for a time series, w is the length of the time window, d is the number of days before the onset of the ISM, y is a given year, and t * (y) is the onset date for the given year y. The time average value of σ 2 is simply expressed asσ where Y stands for the total number of years. The results are shown in Fig. 25. It has been confirmed that the EG and NP regions experience the highest growth ofσ 2 for both T and rh, when approaching the monsoon onset date. It has been further discovered that there exist two intersections in the RPs for the average time series during the period from 1951 to 2014. At both intersections, it coincides with the mean values of the onset and withdraw as determined by the IMD with a standard deviation of ±5 days for the EG region, see Fig. 26. Thus, this allows to derive a prediction scheme for forecasting the onset and withdrawal of the ISM in the EG, based on the trends of T and rh in the RPs. Next, a linear trend estimation for the two RPs is performed and compared with the trends of the mean time series of the training period (1951)(1952)(1953)(1954)(1955)(1956)(1957)(1958)(1959)(1960)(1961)(1962)(1963)(1964)(1965). The slopes of the trends for the RPs provided an estimation of an early, normal, or late monsoon arrival: a greater than average slope of T will lead to an earlier than usual onset, and vice versa. Trends of rh in the RPs in comparison with the average trends for the training period add up to the predictability of the onset: higher than average values of the relative humidity lead to a late onset, and vice versa.
It has also been found that the ISM onset coincides with the date when T in the EG and in NP become equal (see Fig. 26a). Therefore, for the forecasting of the onset, one should predict when T for the EG will abruptly decrease and intersect T for NP. Finally, the prediction scheme of the withdrawal is based on the symmetry of T changes in NP during the year. The withdrawal is estimated as the intersection of the projected T decrease in NP and the T in the EG during the monsoon season (see Fig. 26c).
The prediction is regarded as successful if the time difference between the predicted onset and the real one is ≤ 7 days for the onset and ≤ 10 days for the withdrawal [95]. The proposed scheme using T results in 74% successful predictions of the onset when made on day 125 of the year (April 10 ). The prediction scheme for the withdrawal succeeds in 84% of the years when made on day 205 (July 25). Based on this prediction scheme, they successfully predicted both the onset and withdrawal since 2016 in the EG region 8 .

ISM rainfall forecasting
The ISM is one of the most prominent monsoon systems. It affects the Indian subcontinent, where it is one of the oldest and most anticipated weather phenomena. Several theories have been proposed to explain the underlying mechanism of the monsoon. For example, the sea breeze theory: because of the differences in the specific heat capacity of land and water, continents heat up faster than seas. Consequently, the air above coastal lands heats up faster than the air above seas. These create areas of low air pressure above coastal lands compared with pressure over the seas, resulting moisture-laden winds to flow from the seas onto the lands. On reaching land, these winds rise because of the geographical relief, cooling adiabatically and leading to orographic rains. However, the understanding and predictability of ISM are still evolving.
The All India Rainfall Index (AIRI) is the total amount of ISM June-to-September rainfall averaged over the entire Indian subcontinent. It is the IMD's primary indicator for monitoring the ISM rainfall [313]. An accurate long-term prediction of the ISMR is crucial for taking timely actions and mitigation activities. Over the last few decades, great efforts have been undertaken to understand the basic physics of the monsoon, and to improve the prediction skill for the AIRI [314][315][316][317][318][319][320]. However, both methods have the challenge of unstable relationships with the predictors over time, as observed during the recent years due to a weakened coupling between the boundary forcing and the Indian monsoon [321,322]. In particular, the forecasting skill (cross correlation) of the official operational forecasts made by IMD statistical models is -0.12 for 1989-2012, for the five ENSEMBLE models' multi-model ensemble is 0.09 and the four APCC/CliPAS models' MME skill is 0.24 after 1989. It was discovered that the recent failure is largely due to the models' inability to capture new predictability sources emerging during the recent global warming [303].
Traditionally, the April-May SST anomaly in the Niño 3 or Niño 3.4 region was used as a predictor for the ISMR prediction. This is since the ISMR variations are primarily driven by the EP-ENSO through tropical teleconnections [314,315]. This inverse relationship between the AIRI and EP-ENSO, however, has weakened in recent decades [270]. It has been found that the CP-ENSO has also distinct teleconnections and affects on the ISMR [323]. Based on this physical mechanism, Wang et al. In order to achieve the goal of a long-term and reliable prediction of the ISM rainfall, Fan et al. constructed a series of dynamical and physical CNs based on the global near SAT field [324]. It was uncovered that some characteristics of the directed and weighted CNs can serve as efficient long-term predictors for ISM rainfall forecasting. The developed prediction method produces a forecast skill of 0.5 with a 5-month lead-time in advance by using the previous calendar year's data. The skill of the forecasting, is comparable to the current state-of-the-art models, however, with quite a short (i.e., within one month) lead-time. The new network-based prediction approach is described as follows.
CN construction: Based on the NCEP/NCAR reanalysis daily SAT anomalies data, a CN is constructed for each calendar year from 1948 to the present. In the work, 726 nodes are selected, such that the globe is covered approximately homogeneously [61]. The link strength and direction are determined based on a similarity measure between the temperature anomaly time series of the nodes [See Eqs. (12)- (16)]. A pair of nodes is defined to be connected only if their link is within the top 5% positive strength, which is also corresponding to a statistical significance of above the 95% confidence level.
Network Predictors Mining: To mine the dynamic network predictors for the AIRI, the cross correlation between the observed AIRI and the time series of network in-degree K y i [see Eq. (6)] for each node i on the globe is calculated. Because the forecasting skill of the AIRI became very poor in the IMD operational forecasts and other models since the year 1989 [303], the record is divided into two consecutive periods, i.e., (i) 1950-1988, as the training period and (ii) 1989-2018, as the retrospective forecast period. For the training period, the correlation coefficients are shown in Fig. 27a. It was found that the in-degree index for some nodes captures well the behavior of the AIRI. Particularly, the node (in the region marked by a yellow box in Fig. 27a) having the maximal correlation r is located in the South Atlantic ocean [(30 • S, 30 • W)]. Fig. 27b shows that the value of this maximal correlation is r = 0.49, with a student t-test significance of p < 0.01. Therefore, the in-degree index of this key node is regarded as the optimized network predictor for the AIRI.
AIRI Forecasting: Next, the forecasting capability of the network predictor during the period 1989-2018 is tested. To obtain the forecasted value of the AIRI for one specific year, the forecasted year's predictor value (i.e., the previous year's network in-degree of the key node) is substituted into the least square linear regression equation. This equation used for prediction is derived using only the past information. The independent forecast skill for the period 1989-2018 becomes then ∼ 0.50, which is significantly higher than the IMD operational forecast skill [see Fig. 27c]. The result indicates the good accuracy of the network-based predictor with the long lead-time of 5 months. Finally, based on 2018's data, the AIRI for 2019 is forecasted as 949.87 mm, a 7% departure percentage compared to 9 the long-term average, which is 887.5 mm. The actual value of the AIRI for 2019 was 968.3 mm.
In the following, a potential physical mechanism is discussed. From a meteorological perspective, the teleconnection between the key node's location in the mid-latitude of the South West Atlantic Ocean and the Indian subcontinent is no coincidence. The orography of the Eastern coast of the South American continent is prone to the formation of an anticyclone at mid-latitude due to the interaction with mid-latitude Westerlies (winds that blow from the west at the surface level between 30 • and 60 • S). Moreover, there is the influence of the South polar jet stream, which appears within the upper air Westerlies. The acceleration/deceleration of the South polar jet stream induces areas of low/high pressure, respectively, which links to the formation of cyclones and anticyclones. Because the South polar jet stream strengthens and weakens seasonally, and from year to year, it synchronizes with the variability in the strength of the circulation centered around the location of the key node. One of the possible data-based teleconnection paths is shown in Fig. 27d. Hence, the key node location is sort of the nexus of the South Atlantic circulation, the south polar jet stream and the main atmospheric circulations patterns over the Indian Ocean. It explains why we observe a significant correlation between the network characteristics of the key node and the AIRI.
Further analyses are performed and reveal that the physical mechanisms are related to (i) the network delayed ENSO teleconnection, i.e., the in-degree index is locally minimized in the previous calendar year of the El Niño onset. (ii) The ENSO-Monsoon teleconnection [270,325,326], i.e., the AIRI anomaly and the Niño 3.4 SST anomaly index are negatively correlated. The spatial correlation maps between the Niño 3.4 SST anomaly index with both the SST as well as the rainfall index for all nodes are presented in Fig. 28. These results indicate that the Niño 3.4 SST anomaly index is significantly anti-correlated with the SST in the equatorial Indian ocean and also heavily influences the rainfall pattern over the Indian land region.
Moreover, the network-based approach can be also successfully applied to forecast the Indian homogeneous region rainfall as a prediction scheme, including two specific homogeneous regions [324]: (1) the Central India Rainfall Index; (2) the East & Northeast India Rainfall Index.
In addition, there is solid evidence which reveals that climate change affects the forecast skills. The considerable warming trend in the last 70 years -the 850 hPa temperature in the key node area has risen 2.5 • C -substantially impacted the key node region. An increase of the in-degree index and a change in the structure of the network itself are observed. Concurrently with these changes, the prediction skill of this forecasting method for the ISM rainfall amount is improving substantially [324].

Extreme Rainfall
Extreme precipitation events have produced more rain and have become more common since the 1950s in many regions of the world. Heavy precipitation may pose threats to our society. The most immediate impact is the threat of flooding. This risk can be heightened in urban and agricultural areas. In addition to flooding, extreme precipitation also increases the risk of landslides. Excessive precipitation can also degrade water quality, harming human health and ecosystems. Thus is is crucial to predict extreme rainfall events and to establish an early warning system for them. However, the analysis of spatial patterns of co-variability of such events is challenging because traditional techniques based on principal component analysis of the covariance matrix only capture the first two statistical moments of the data distribution and are not suitable to analyze the behavior in the tails of the respective distributions [327].
Extreme Rainfall over India  To overcome these challenges, Malik et al. applied the CNs approach to analyze the spatial and temporal patterns of extreme rainfall during the ISM [92,328]. This approach is based on the event synchronization method (see Section. 2.1.3). In Ref. [328], the dataset is a daily precipitation gridded set for 1961âĂŞ2004 developed as part of the project âĂŞ Asian Precipitation Highly Resolved Observational Data Integration Towards the Evaluation of Water Resources 10 . Then they extracted the data for the South Asian region at a 0.5 degree resolution. For calculating the event synchronization, two different thresholds of α = 94th percentile and α = 90th percentile were chosen to obtain the extreme event time series, referring to very heavy and heavy rainfall events, respectively. Next, they computed the strength of synchronization Q ij and the delay behaviour q ij between each pair of the grid sites i and j [see Eq. (20)]. After normalizing it holds, that 0 ≤ Q ij ≤ 1 and −1 ≤ q ij ≤ 1, where Q ij = 1 means complete synchronization, and Q ij = 0 stands for the absence of synchronization; and q ij = 1 means that events at i precede events in j. The CN is then constructed by selecting only pairs of sites that show strong correlations, and the time lags between the events are used to define the direction of the links.
The spatial structures, organization, and scales of the extreme rainfall over India during the ISM period were investigated. Fig. 29 exhibits that the degree centrality [see Eq. (4)] of the CNs reveals well-defined spatial structures in the monsoonal extreme precipitation [92]. Based on the median of the geographical length of the links, calculated using the formula for the spherical Earth projected onto a plane, it has been uncovered that extreme rainfall events are synchronized up to 250 km for most of the region. They further found that such a spatial organization opens the possibility for predictions of a probable spatial extent of monsoonal rainfall activity without delay, e.g., for most of the subcontinent, the accuracy of the prediction is above 70% and even reaches up to 100% in certain places [92].

Extreme Rainfall over South America
This event synchronization CN approach was also applied to predict extreme rainfall events in the Central Andes of South America [94]. Based on the real-time satellite-derived rainfall data, TRMM 3B42V7 [329], Boers et al. were able to predict more than 60% of the rainfall events above the 99th percentile in the Central Andes, and even 90% during El Niño conditions. The interplay of northward migrating frontal systems and a low-level wind channel from the western Amazon to the subtropics are considered as the responsible mechanism.
Extreme rainfall events are defined as times with threshold α = 99% for all Dec-Jan-Feb seasons in the spatial domain (85 • W, 30 • W) and (40 • S, 15 • N), at a resolution of 0.25 • , and 3-hourly temporal resolution for the time period from 1998 to 2012. After the construction of the event synchronization CN (see Section 2.1.3), the network divergence ∆S is defined as where S in and S out stand the in-strength and out-strength respectively at each node. Positive values of ∆S indicate sinks of the network; negative values indicate sources. The strength out of and into a region R is where |R| denotes the number of grid points contained in R. Fig. 30a shows the network divergence ∆S (defined in Eq. (123)), which estimates the dynamics and temporal order of extreme rainfall in South America. The boxes labelled 1 to 7 are used for the tracking of extreme events. It was found that the most pronounced source region of the rainfall network is South Eastern South America (SESA), defined as the box ranging from (60 • W, 53 • W) to (35 • S, 30 • S). Fig.  30b depicts where synchronized extreme events occur within 2 days after extreme events occurred in SESA, measured by S out i (SESA). For comparison, S out i (SESA) is shown in Fig. 30c. This analysis reveals that extreme events in SESA are followed by extreme events along a narrow band following the eastern Andean slopes up to western Bolivia, while they are only preceded by extreme events to the southwest. Based on the propagation of extreme rainfall from SESA to the eastern Central Andes (ECA), see Fig. 30d, a prediction for the extreme rainfall for the ECA within 2 days is made after they occurred in SESA. It should be noted that these extreme rainfall events could not be forecasted by using other methods.
Extreme Rainfall over the Globe The event synchronization CN method was also applied to investigate the global pattern of extremerainfall by Boers et al. [96]. The gauge-calibrated, satellite-derived rainfall dataset TRMM 3B42 V731, with daily temporal resolution, from 50 • N to 50 • S for the period 1998âĂŞ2016 is used. The extreme rainfall event thresholds are 80th, 81th, . . . , 99th of the wet days (>1 mm) for the Jun-Jul-Aug season. Note that consecutive days with rainfall above the threshold are considered as single events and placed on the first day of the occurrence. To study the robustness of the method, Global Precipitation Climatology Project32 and NCEP/NCAR Reanalysis 1 data were also tested.
First, the synchronization of extreme events for each pair of nodes is computed (see Section 2.1.3). Network links are added between two nodes if the corresponding synchronization values are significant, p < 0.005. The global distribution of the spatial distances for which significant synchronizations occur was studied. This distribution decays as a power law p(d) ∼ d −α , with an exponent α very close to 1 for distances d < 2, 500 km, but exhibits a super-power-law behaviour for longer distances. The scale-break indicates that the significant links can be divided into two distinct classes: (i) links associated with regional weather systems with distances up to 2, 500 km, which include mesoscale convective systems and tropical cyclones; and (ii) links associated with global-scale teleconnections. Such teleconnections are generally understood to be caused either by direct signal transport due to large-scale atmospheric circulations or by propagating waves triggered by disturbances of these circulations [96]. In Fig. 31, the teleconnection pattern for southcentral Asia (SCA) for events above the 95th percentile is presented. Pronounced link bundles connecting SCA with eastern Asia, the African tropics, large parts of Europe and the eastern coast of North America, as well as the Southern Hemisphere extratropics are apparent. Further analysis reveals that the Rossby waves can be regarded as the physical mechanism underlying these global teleconnection patterns.

Atmospheric Circulation and Global Warming
EarthâĂŹs atmosphere, made up essentially of the gases that surround our planet, consists of circulation patterns that move air from one location to another and from the surface to top elevations. The atmo-  spheric circulation is controlled by EarthâĂŹs rotation, barometric pressure, topography, ocean currents, and differences in temperature, salinity, etc.
Latitude structure of the circulation Since latitudinal gradients in solar energy input are dominant drivers of the atmospheric circulation, its primary features depend on the latitude. We show an idealised depiction of the large-scale atmospheric circulation on Earth in Fig. 32, which schematizes the main features of the circulation, both in a lati-tudeâĂŞheight plane and in the horizontal, without yet considering longitudinal variations due to continents, oceans, etc. This represents the circulation in an average over all longitudes, known as a zonal average. As we see, there are three large-scale convection cells in both hemisphere.
(i) The Hadley cell is a thermally driven, overturning circulation that tends to rise in the tropics and sink at slightly higher latitudes. Warming from the surface near the equator is transferred upward through a deep layer by convection clouds. The rising air spreads poleward and cools slowly, returning to the surface and then moves toward the equator. Roughly speaking, the Hadley circulation transports heat to about 30 • latitude, and the average effect of the transient weather disturbances transports the heat further poleward.
(ii) The Ferrel cell is also called mid-latitude cell. In the Ferrel cell, air flows poleward and eastward near the surface and equatorward and westward at higher altitudes; this movement is the reverse of the airflow in the Hadley cell. It was the first to account for the westerly winds between latitudes 35 • and 60 • in both hemispheres.
(iii) The Polar cell is the smallest and weakest cell, which extends from between 60 • and 70 • north and south to the poles. Air in these cells sinks over the highest latitudes and flows out towards the lower latitudes at the surface. Note that the idealized picture of the atmospheric circulation based on latitude is a reasonable first approximation, but landâĂŞocean contrasts and other variations in longitude are also important.

Climate Change Effects on Atmospheric Circulations
Climate change strongly affects the mode of atmospheric circulations, especially, the Hadley cell, which plays a pivotal role in the EarthâĂŹs climate by transporting energy and heat poleward [330][331][332]. An analysis of satellite observations indicates a poleward expansion of the Hadley cell by 2 • latitude from 1979 to 2005 [330]. A possible mechanism for the changes in the Hadley cell and its relation to global warming has been reported [331]. Also, a possible mechanism for the changes in the Hadley cellâĂŹs strength and its relation to global warming was developed in [332]. Both aforementioned observations and theories suggest a weakening and poleward expansion of the Hadley cell under global warming.
As is well known, the locations of the subtropical dry zones and the major tropical/subtropical deserts are associated with the subsiding branches of the Hadley cell [333]. Therefore, the poleward expansion of the Hadley cell may result in a drier future in some tropical/subtropical regions [331]. In addition, the poleward migration of the location of the tropical cyclone maximum intensity is also considered as a consequence of the expansion of the Hadley cell [334].
The strength of the Hadley cell is calculated using the observed zonal-mean meridional wind in the stream function Ψ [335], where V is the meridional velocity in pressure coordinates, R is the mean radius of the Earth, and p is the pressure. The operators¯and [ ] stand for temporal and zonal averaging, respectively. When computing the Ψ field, we usually assume Ψ = 0 at the top of the atmosphere and integrating Eq. (126) downward to the surface. The intensity of the Hadley cell is much stronger in winter than in summer. To quantify the edges of the Hadley cell, the maximum of the absolute value of Ψ at 500 hPa is first determined (Ψ 500 ), and then the edges are identified as the first latitude poleward of the maximum at which Ψ 500 becomes zero [331].
Theories of the Hadley cell suggest that the meridional extent should scale with the height of the tropopause H [336], where H was computed from temperature data as the lowest pressure level at which the lapse rate decreases to 2 • C/km [337], ∆ H is the tropospheric mean meridional potential temperature gradient, and Ω is the angular velocity of the earth. From the thermodynamic equation and the mass continuity equation [335], the meridional overturning stream function Ψ of the circulation scales as, where ∆ V is the dry static stability of the tropical troposphere and τ is the characteristic overturning time scale of the circulation.
Although the conventional analysis of satellite observations and the climate change simulations suggests a poleward expansion of the Hadley cell, there are still two challenges. (i) The latitudeâĂŞlongitude structure of this expansion is not fully resolved. (ii) In contrast to models and theoretical considerations, which predict a decreasing intensity, an increasing trend was found in the intensity of the Hadley cell in reanalysis datasets [338]. To overcome these unsolved issues, Fan et al. developed an approach based on network and percolation frameworks to study the impacts of climate changes on the atmospheric circulations [67]. They found an abrupt transition during the evolution of the climate network, indicating a consistent poleward expansion of the largest cluster that corresponds to the tropical area, as well as a weakening of the strength of link in the tropics. This was found in both the reanalysis data and the CMIP5 simulations. The underlying mechanism for the observed expansion of the tropical cluster was linked to the weakening and expansion of the Hadley cell.
In Ref. [67], the CNs were embedded into a two-dimensional lattice, where only nearest neighbor links is considered. The strength of each link are calculated based on Eq. (15), with the month-to-month temperature difference. The links are sorted in decreasing order of strength and then added one by one according to the decreasing strength. More specifically, the nodes that are more similar are connected first. Existing clusters grow when a new link connects one cluster to another one. Then the CN undergoes an abrupt and statistically significant phase transition, i.e., exhibiting a significant discontinuity in the order parameter G 1 , the relative size of the largest cluster. Due to the EarthâĂŹs spherical shape, the largest component in the climate networks is redefined as where φ i is the latitude of node i. The percolation threshold is determined according to Eq. (48), i.e., the step with the largest jump is regarded as the phase transition point. Fig. 33(a) shows the CN component structure in the global map at the percolation threshold. Just before this jump, the CN is characterized by three major communities; the largest one is located in the tropical region; the second and third largest are located in the high latitudes of the southern and northern hemispheres. Fig. 33(b) depicts the relative size of the largest cluster, G 1 , as a function of the link occupation probability r in the evolution of the CN. It has been found that G 1 exhibits an abrupt jump at the percolation threshold r c ≈ 0.53. The probability density function (PDF) of the weight W i,j of links is shown in Fig. 33(d).
To determine the temporal evolution of the largest component G c , and its intensity W c (the weight of the critical link that leads to the largest transition), a sequence of networks based on successive and non-overlapping temporal windows with lengths of 5 years each was constructed. The results suggested that the tropical cluster is expanding poleward, meanwhile, its intensity decreases significantly with time. The robust weakening and poleward expansion of the tropical component have been observed both in the reanalysis data and in 31 CMIP5 models. The networkâĂŞpercolation approach was also used to identify the climate change response, by comparing the topology of the tropical component for the first and last twenty years of the 21st century, i.e., 2080âĂŞ2100 vs. 2006âĂŞ2026. It was found that some regions, for example, northern India, southern Africa, and western Australia have a higher probability to be influenced  by the tropical component, whereas the impacts in other regions, e.g., the Northeast Pacific, will become weaker in the future.

Atlantic Meridional Overturning Circulation
The Atlantic Meridional Overturning Circulation (AMOC) is a crucial part of the climate system in the North Atlantic. It has a major impact on climate because of its redistributing meridional heat transport [339,340]. The AMOC has been identified as one of the important tipping elements of the Earth system [23]. Changes in the AMOC have not only affected the North Atlantic and surrounding landmasses, but have also had global impacts [156]. For example, negative change leads to increasing storminess in southern Europe [341], connects to above average sea-level rise at the US east coast [342], and associates with increasing drought in the Sahel (for both frequency and intensity) [343]. Recently, Caesar et al. provided solid evidence for a weakening of the AMOC by about 3 ± 1 (around 15%) sverdrups since the mid-twentieth century [160]. This weakening is revealed by the consisting of a pattern of cooling in the subpolar Atlantic Ocean and warming in the Gulf Stream region (see Fig. 34). The slowdown of the AMOC was found in both in a high-resolution climate model (CM2.6) in response to increasing atmospheric carbon dioxide concentrations, and in the observed HadISST data since the late 19th century. An improved SST-based AMOC index (the subpolar cold patch) was developed, which is optimized in its regional and seasonal coverage to reconstruct AMOC changes. In particular, it was found that the AMOC decline since the 1950s is very likely to be largely anthropogenic. This slowdown of the AMOC is mainly caused by the gradually varying freshwater forcing in the northern North Atlantic.
The AMOC is the zonally averaged volume transport and its strength at 26 • N in the Atlantic and is now routinely monitored by the RAPID-MOCHA array [344]. Mean patterns of the meridional overturning circulation in the Atlantic are determined from the Community Earth System Model (CESM) simulation [345,346]. The AMOC has its maximum at about 1000 m depth around the separation latitude of the Gulf Stream with a strength of about 20 Sv. Since a future collapse of the AMOC has been identified as a tipping element, it is therefore crucial to develop early warning indicators for such a potential collapse. As we discussed in Section 2.3, various techniques have been developed and successfully applied to detect early warning indicators in single time series used based on the concepts of critical slowing down [177]. In particular, early warning indicators for the tipping point of the AMOC have been discussed in [179,181].
Recently, several CN-based warning-signals have been proposed by analyzing the correlations for the AMOC collapse that efficiently monitor spatial changes in deep ocean circulation. Specifically, the values of the mean degree, its kurtosis, assortativity, and clustering increase when approaching a tipping point [69,70]. The CN methodology is applied to temperature time series from an idealized ocean model of the AMOC, as well as to AMOC strength time series from a coupled atmosphere-ocean GCM. In addition, the optimal locations of measurement of the AMOC are formulated to provide early warning signals of a collapse through the analysis of the performance of this indicator. Next, we will provide an overview of these CN-based early warning indicators for the collapse of the AMOC.
Mheen et al. [69] used a dimensional (meridional-depth) idealized ocean model of the AMOC, where tipping points can be explicitly computed. In this model, there are two active tracers: temperature T and salinity S, which are related to the density ρ by a linear equation of state [347] where α T and α S are the thermal expansion and saline contraction coefficients, respectively, and ρ 0 , T 0 and S 0 are reference quantities. The surface freshwater forcing F S , which can be applied as a virtual salt flux, is where φ indicates latitude, φ N is the northern boundary of the equatorially symmetric domain, γ is the strength of the background freshwater forcing, and η is a white noise term. β is the strength of an anomalous freshwater flux which is only applied over the area [40 • N, 60 • N], and Q is a constant used to normalize the surface-integrated salt flux from 0 to 1 for all parameter values. As shown in Fig. 35, the bifurcation diagram of the AMOC in the idealized ocean model indicates the existence of tipping points at L 1 and L 2 . The model was discretized on a 32 × 16 spatial grid, with four values of β close to L 1 , labeled A, B, C and D. The output of this model is the temperature field time series. Next, an undirected and unweighted CN is constructed based on the Pearson correlation method (see Sec. 2.1) with no time lags and a threshold C c = 0.7 [69], i.e, if the correlation C i,j > C c , the corresponding two nodes i and j are connected. Four network-based indicators E d , E c , K d and K c were developed to study the dynamic evolution of the AMOC. Here E d is defined as the expectation value of the normalized degree distribution d/d max ; E c is defined as the expectation value of the normalized clustering coefficients c/c max ; K d is the kurtosis of the degree distribution; and K c is the kurtosis of the clustering coefficients distribution. These four network indicators are presented in Fig. 36. It was found that E d increases steeply and smoothly when the tipping point L 1 is approached. The explanation for this increase is based on the increased dominance of the first EOF in the variance of the solution in a large part of the domain. The same holds for the indicator E c (Fig. 36g). The kurtosis distributions K d and K c (Fig. 36h) show even more clearly a strong increase when the tipping point L 1 is approached before the collapse. Note that the CNs were constructed from the full spatial temperature field using a sliding window of 5000 years with a shift of 1000 years. The results were not sensitive to the sliding window length. For comparison, previously suggested early warning indicators of the transition in the same AMOC time series, such as, lagâĂŞ1 autocorrelation, DFA, standard deviation (see our discussion in Section 2.3) are also plotted in Fig. 36 and tend to exhibit a rapid increase/decrease when approaching the tipping point. In particular, the coefficient of the lag-1 autocorrelation (Fig. 36c) reaches its maximum near the tipping point indicating a critical slowdown of the AMOC. However, it has been found that the indicator is not monotonic and also increases when the AMOC is still far from the transition. The DFA analysis shown in Fig. 36d does not indicate early warnings of a transition. Fig. 36e depicts the standard deviation of the AMOC record. It shows a steady increase toward the transition but it is difficult to set a threshold for an alarm. In Fig. 36f, the fraction of the time spent below the 0.5 percentile (estimated based on the first 5000 years) of the AMOC strength is shown. This indicator displays a much sharper increase, but it is a rather ad-hoc measure which depends on a calibration in the far past [69].  The same CNs-based analysis was also performed in a more realistic Fast Met Office/UK Universities Simulator (FAMOUS) model by Feng et al. [70]. The FAMOUS model is a reduction for Hadley Centre Coupled Model with a lower resolution ocean and atmosphere component. The annual mean of AMOC strength data were obtained from a control simulation and from a freshwater-perturbed (referred to as "hosing") simulation of the FAMOUS model. In the hosing simulation, the freshwater flux over the extratropical North Atlantic was increased linearly from zero to 1.0 Sv over 2000 years [70]. The complete AMOC streamfunction field for each of the six 100-year equilibrium simulations has been considered. It was found that when the freshwater forcing is increased, a high degree in the network âĂŞ indicating high spatial AMOC correlations âĂŞ first appears at nodes in the South Atlantic at about 1000 m depth. The kurtosis K d of the CN was introduced as an effective indicator to capture the changes in the topology of the degree field. For the hosing simulation, there is a strong increase of K d to values far extending those for the control simulation significantly before the collapse time. However, the critical slowdown indicators variance and lag-1 autocorrelation do not show any early warning signal. Moreover, the kurtosis K d can also be used to determine the optimal observation locations of the AMOC, which provides a strong anomalous signal at least 100 years before the transition.

Earth Geometric Surface Relief
The topography or bathymetry of the Earth shows complex multifractal structures and scaling properties [34,[348][349][350], which can be regarded as a consequence of plate tectonic processes. Revealing the relations between geometrical features of terrestrial surfaces and their internal geological processes has long been a fundamental challenge in the Earth sciences. The plate tectonic theory is usually used to study most of the major surface topographic features of the Earth [351]. It has been reported that there exist strong connections between the ocean bottom topography and the Earth's climate [352]. Moreover, the surface topography of the Earth plays a remarkable role in the dynamical evolution of oceans, especially, in regard to global climate change and sea level rising. Sea level rise is one of the major consequences of global climate warming [353] and the effects in combination with storm surges and other extreme events have been observed [354]. Decreasing global CO2 emissions is crucial for limiting the risks of sea-level rise. It is estimated that the median sea-level rise will be between 0.7 and 1.2 m until the year 2300 even within the constraints of the Paris Agreement [355]. The worldâĂŹs large ice sheets in Greenland and Antarctica are considered to be the largest possible contributors to sea level rise. If these ice sheets melt completely, the sea level would rise about 7 m, 5 m and 53 m from the Greenland ice sheet, the West Antarctic ice sheet, and the East Antarctic ice sheet, respectively. There is thus a great interest in clarifying/predicting the influenced areas in response to sea level rise.
Recently, the ETOPO1 Global Relief Model was released and provided new opportunities for a better understanding of Earth's surface processes based on geomorphic signatures [356]. ETOPO1 is a 1 arc-minute global relief model of the Earth's surface that integrates land topography and ocean bathymetry and was developed by NOAA. It was built from global and regional data sets and used to calculate the volumes of the world's oceans and to derive a hypsographic curve of Earth's surface (see Fig. 37).
In this Section, we will review statistical properties of the Earth geometric surface relief using percolation theory. Their fractal structure and scaling will also be discussed. In particular, the present mean sea level on Earth will be shown to coincide with the critical threshold in a percolation description of the global topography; strong evidence reveals abrupt transitions that occurred during the evolution of the EarthâĂŹs relief network, indicative of a continental/cluster aggregation. This could help us to identify the critical nodes or locations that will be more exposed to global climate change.

Self-similarity and Long-range Correlations
Self-similarity and long-range correlations are remarkable features of the EarthâĂŹs surface topography [34]. Starting about 30 years ago, new ideas in nonlinear dynamics, particularly fractals and scaling, provoked an explosive growth of research both in modeling and in experimentally characterizing the solid earth geophysics including the topography, for a review see Ref. [357]. The power spectrum S of linear transects of Earth's topography follows the scaling relation S(k) ∼ k −βc , where k stands for the wave number [34]. Such a scaling relation was used to measure the self-similarity of the EarthâĂŹs surface topography. Moreover, similar scaling relations have been identified in Earth's sea-floor topography [358], the topology of river networks [359], as well as natural rock surfaces [360].
The long-range correlation features of the EarthâĂŹs surface topography can be described in the following way. The exponent β c in the power-law spectrum is related to the Hurst exponent H in fractional Brownian motion (fBm) via β c = 2H + 1, which results in H 0.5 for the Earth's topography. Whereas further universal multifractal parameters estimated for specific parts have resulted in a much more complex structure, i.e., H = 0.46, 0.66, 0.77 for bathymetry, continents and continental margins, respectively [348].
The third characteristic of the Earth's surface topography is the well known bimodal distribution, as a consequence of plate tectonic processes [361]. It indicates the topographic dichotomy of continents and ocean basins.

Landmass and Oceanic Clusters
Although the classical pate tectonic theory provided a framework that might explain most of the major surface topographic features of the Earth, a percolation-based description as discussed in Sec.(2.2) was developed to study the geometrical features of the Earth from a statistical physics perspective [127,130]. The analysis is based on high-resolution ETOPO1 global relief records. The resolution is 1 arc-minute, i.e., N = 10800 × 21600 grid points. The present mean sea level (zero height) is assumed as a vertical datum of the height relief h(φ i , θ i ), where φ i , θ i are the corresponding latitude and longitude of grid point i.
In Ref. [127], the hypothetical water level was considered as the percolation control parameter. When it is decreased from the highest to lowest available heights on Earth, there occurs a geometrical percolation phase transition at a critical level h c around which most parts of landmass join together. Fig. 38 illustrates the landmass cluster aggregation by decreasing the sea level from h = 100 m (top) to h = −3520 m (bottom). To determine the percolation threshold, the probability of any nodes to be part of the largest island was defined as the order parameter; the mean island size χ (analogous to the susceptibility of the system) was defined as, χ = s s 2 n s (h)/ s sn s (h), where n s (h) denotes the average number of islands of size s at level h and the prime on the sums stands for the exclusion of the largest island; the correlation length ξ is defined as the average distance of nodes belonging to the same island cluster, ξ 2 = s 2R 2 s s 2 n s (h)/ s s 2 n s (h), where R s is the radius of gyration of a given s cluster. It has been found that the order parameter for islands has an abrupt jump around the zero height level h = 0, i.e., right at the present mean sea level (Fig. 39a).  It was furthermore found, that both quantities χ and ξ become divergent at the present mean sea level, see Fig. 39b. From these results, the most remarkable observation is that the critical level h c coincides with the present mean sea level h = 0 on Earth. This may suggest the important role of water on Earth and shed new light on the tectonic plate motion [127].
The same percolation analysis was also performed for the oceanic clusters, i.e., the hypothetical water level was increased from the lowest to highest. Fig. 39b gives rise to a discontinuous jump in the oceanic order parameter at around −3640 m. Further investigation in the lunar topography reveals various characteristic features of the Moon. It was found that the critical level for the Moon has the same amount of land and oceans at the threshold, indicating a purely geometrical phase transition [127].

Origin of the Discontinuity
It has been reported that a random network or lattice system always undergoes a continuous percolation phase transition and shows standard scaling features [81]. Yet, the order of the percolation transition for the Earth's topography is still an open question. To answer this, Fan et al. developed a more sophisticated percolation-based method [130]. The percolation model was defined as follows: starting from an unoccupied lattice, the sites are occupied one by one according to their ranking, i.e., first choosing the site with the largest height, then the second, etc. At each step, the fraction of occupied sites p increases by the inverse of the total number of sites N in the Earth's relief landscape. By this procedure, a configuration of occupied sites is continually obtained at every p. Here all the grid points in the ETOPO1 Global Relief Model were ranked according to their height h(φ i , θ i ), from the largest to the smallest value.
In the following, the size of the percolation landmass (oceanic) clusters s were calculated based on Eq. (129) due to the Earth's spherical nature. The size of the i-th gap g i at each fraction p is defined as follows: Specifically, here g 1 denotes the largest gap, g 2 indicates the second largest gap, etc. The larger the gap g i is, the larger are the two clusters before merging. The dynamical evolution of the largest landmass cluster s as a function of the fraction of occupied nodes p is shown in Fig. 40a. It was found that EarthâĂŹs relief network undergoes several abrupt and statistically significant phase transitions, i.e., exhibiting a discontinuity in the order parameter s. Fig. 40b shows the network clusters landmass structure at the percolation threshold (just before the largest gap g 1 ). The results reveal that the network, just before this jump, is characterized by four major communities: the largest one being the Afro-Eurasia continental landmass, the second largest cluster is the Americas, the third is located in Antarctica, and the fourth is Oceania. Notably, there exists The same analysis has also been applied for the oceanic clusters, i.e., the nodes are occupied one by one according to their hight level in increasing order. It also has given rise to a discontinuous jump in the oceanic order parameter at the percolation threshold p c ≈ 0.379 with an altitude level h = −3621 m. This is very close to the result h = −3640 m reported in Ref. [127]. Note that the critical node, (59.908333 • S, 161.308333 • E), connects the Atlantic+Indian Ocean Plate to the Pacific Plate. It is worth noting that the role of the largest cluster on the landmass structures is different from that on the oceanic one. These differences reveal the complex and different structure of the EarthâĂŹs topography (continents) and bathymetry (oceans). This dichotomy is manifested in the well-known bimodal distribution of the EarthâĂŹs topography [361].
To demonstrate the order of the percolation phase transition in Earth's relief network, a finite-size effects analysis was performed by altering the resolution of the nodes. The largest gap g 1 (L) was calculated for a given system size L. Here, L is defined as the number of nodes in the zonal direction. According to the a b c d Eq. (49), if g 1 (L) → 0 as L → ∞, the corresponding giant cluster is assumed to undergo a continuous percolation; otherwise, the corresponding percolation is assumed to be discontinuous [147,153]. In addition, the size of the order parameter at the percolation threshold [just before the largest jump g 1 ], s c (L), was also studied [see Eq. (51)]. The results are shown in Fig. 41a and b, which indicate a discontinuous percolation since (1) g 1 (L) tends to be a non-zero constant; (2) d f − d = 0, also indicates a discontinuous percolation. For comparison, a randomized topography obtained from the shuffling (spatial randomizing the height profile) of the original data was also considered (see Fig. 41a and b). The numerical results for the shuffled case, suggest a continuous percolation transition with known critical exponents β/ν = 5/48 ≈ 0.104 and d f = 91/48 [106,107], as expected when the long-range correlations were destroyed. In order to reveal the origin of the discontinuity, Fan et al. [130] further studied the percolation on 2D fBm surfaces with Hurst exponent H [362,363]. The parameter H is usually between 0 and 1, where 0 is very noisy, and 1 is smoother. The Earth's rough surfaces can be modeled via a fBm [364], and the estimation for the Earth continents topography is H = 0.66 [348]. Next, the percolation analysis on fBm surfaces with H = 0.66 was performed. Similar to the real network, the largest cluster s also exhibited abrupt transitions around p ≈ 0.3. The largest gap g 1 (H, L) (average) as a function of system size L is shown in Fig. 41c. It was found that the percolation on fBm surfaces with H = 0.66 is discontinuous, since g 1 (H, L) tends to be a non-zero constant when L → ∞. The percolation threshold p c (H, L) corresponding to the largest gap during the evolution of site percolation is shown in Fig. 41d. The values are robust and agree well with the real data where p c ≈ 0.321 [see Fig. 38]. This indicates that the combined percolation and fBm methods can be used as an efficient tool to investigate the Earth's surface topography.

Earthquake Systems
Earthquakes are one of the most devastating natural disasters that plague society. Thus, reliable and skillful earthquake forecasts over different time scales (from days to decades) are essential for establishing rational seismic risk reduction strategies and to enhance preparedness and resilience. The earthquake process is a complex spatio-temporal phenomenon, and has been thought to be an example of self-organised criticality systems, in which events occur as cascades on a wide range of sizes, each determined by fine details of the rupture process. Despite rapid progress in the latter part of the 20th century, the study of earthquakes, like the science of many other complex natural systems, is still in its juvenile stages of exploration and discovery. The scientific challenge is how to understand the earthquake phenomena that are both profound and practical, in particular, the deterministic prediction of specific event sizes, their locations, and times. Recently, probabilistic forecasting, based on statistical patterns of earthquake occurrence, became a much more realistic goal, and has been actively explored and tested in global initiatives [36].
In this section, we will first present a description of the phenomenological laws of earthquake occurrence, and scale invariance and the intertime distribution is also briefly discussed. Then we will review how the statistical mechanics approach (memory analysis) can be applied to improve the forecasting skill for real earthquake catalogs.

Scale Invariance and the inter-event Distribution
First, we introduce some fundamental quantities that characterize earthquake occurrence. This information, such as occurrence time, hypocentral location, earthquake magnitude and depth, is usually reported in seismic catalogs available online for different geographic areas, for instance, the United States Geological Survey (USGS) earthquake catalog 11 . We present here the main phenomenological laws for the statistical distribution of these quantities.
The most common method of describing the size of an earthquake is the Richter magnitude scale m, which takes the logarithm of the ground displacement as measured by a seismograph, and applies a correction which varies with the distance from the earthquake to the seismograph [365]. The characteristic size of the fractured area L, as well as the faulting duration τ , can be both expressed in terms of the magnitude as L ∝ τ ∝ 10 0.5m .
Next, we will give some well established empirical basic laws of earthquakes. In seismology, the Guten-bergâĂŞRichter (GR) law expresses the relationship between the magnitude m and cumulative number of earthquakes N in any given region and time period with magnitude larger or equal to m, which exhibits an exponential decay [366] log N = a − bm, where a and b are fitting parameters. We show the GutenbergâĂŞRichter law for the Israeli earthquake catalog as an example, in Fig. 42. The b-value of the Gutenberg and Richter law has been calculated using the approach described by Marzocchi and Sandri [367]. The value was found to be equal to 0.97 ± 0.02. The intercept of the fit line defines the a-value that explains the earthquake rate for the region selected. In our case a = 5.36. As expected, different catalogs exhibit different seismicity levels quantified by the magnitude range and by the total number of events, corresponding to different values of the constant a. Conversely, the parameter b appears to be quite independent of the geographical region and the temporal interval. Typical experimental results provide b 1 for tectonic earthquakes. The second empirical law, the Omori law, was first described by the seismologist Omori in 1894 [368], where n(t) is the aftershock rate, t is the time elapsed from the mainshock occurrence; K and c are empirical constants controlling, respectively, the total number of aftershocks n(t) and the onset of the power law decay. p is a third constant, which modifies the decay rate and typically falls in the range 0.7âĂŞ1.5. The magnitude of an event understandably influences the number of aftershocks triggered by it, i.e., the larger the mainshock magnitude, the larger is the total number of triggered aftershocks. This property is known as the productivity law, stating that the number of aftershocks n A [integral of Eq. (134)] belonging to a sequence increases exponentially with the mainshock magnitude m M , The parameter α M is typically small for swarm-type activity, and large for clear primary aftershock sequences [369]. The distribution of waiting times between seismic events has generated much attention and discussion over the last decades, since it was considered to have a universal scaling form [370]. Bak et al. proposed, for the first time, a unified scaling law for the waiting times between earthquakes, expressing a hierarchical organization in time, space, and magnitude. They considered the California earthquake catalog and covered the region with a grid of cells of size L × L. Then they analyzed the inter-event times for each cell and calculated a histogram of the inter-event times P S,L whose magnitudes are greater than m = log(S); this was repeated for different values of L and minimum magnitudes (see Fig. 43a). It was found that if they rescaled the inter-event times T by S −b L d f (where b is the Gutenberg-Richter b -value and d f is the spatial fractal dimension), and rescaled P S,L by T α , where α is some exponent, the distribution of waiting times all appeared to collapse onto a single curve (Fig. 43b). The scaling hypothesis was thus expressed as where α ≈ 1 can be identified as the Omori-law exponent for aftershocks, b ≈ 1 is the b value in the Gutenberg-Richter law, and d f ≈ 1.2 describes the 2d fractal dimension of the location of epicenters projected onto the surface of the Earth [370]. Subsequently, Corral [371] has discovered that the probability density function D xy (τ ) of the interoccurrence time τ can be simplified to the scaling form where R xy stands for the mean seismic rate that refers to the (x, y) region. The scaling function f can be expressed by a generalized Gamma distribution,  [372] carried out an extensive analysis to determine the distribution of the inter-event time based on the Epidemic-Type Aftershock Sequences (ETAS) model (see more details in the following section), and they found that it is only approximately universal and of a gamma form, assuming that p in Eq. (134) has a value close to 1, and the branching ratio of seismicity, the average number of events each event triggers, is around 0.7âĂŞ1. Then Molchan [373] published a rigorous mathematical study on the earthquake inter-event time, and proved that if the scaling distribution would be universal, the functional form must follow an exponential function. Touati et al. analyzed the variation of the distribution for both real data (SCEC catalog) and ETAS simulation data, and suggested that it is not universal [374]. They have shown that the distribution is a bimodal mixture distribution dominated by correlated aftershocks at short waiting times and independent events at longer times.

Modeling Seismic Time Series by the Point Process Approach
There are various statistical earthquake models for describing the occurrences process of earthquakes that can be used for real-time earthquake forecasts [375]. The principle of these models is to evaluate the probabilities of earthquake occurrence by using a point process approach. Among the different models, the ETAS model, which describes the features of earthquake clustering of mainshocks, foreshocks and aftershocks, has become a standard model for testing hypotheses and a starting point for short-term earthquake forecasts [376][377][378][379] Here we will focus on the ETAS model, which is used to generate synthetic earthquake catalogs. The ETAS model was developed by Ogata [369,380,381] who observed that seismic activity can be well described by the Gutenberg-Richter law Eq. (133), and the Omori law Eq. (134). The conditional rate in the ETAS model is given by, where t i are the times of the past events and m i are their magnitudes; H t = {(t i , m i ); t i < t} is the history of occurrence. λ(t|H t ) provides the probability to have an earthquake above a threshold magnitude M z at time t, given the earthquake history. Here, A = K/c p is the occurrence rate of earthquakes in the Omori law at zero lag [374], and α M is called the productivity parameter defined in Eq. (135). The ETAS model can also be written in terms of a stochastic integral [382], where N (ds, dm) = 1 if an infinitesimal element (ds, dm) would include an event (t i , m i ) for some i, otherwise N (ds, dm) = 0. The five parameters (µ, A, c, α M , p) represent some characteristics of seismic activity of the region. Therefore, they vary spatially, and also temporally in some cases. Then the ETAS model was extended to a space-time phase with the conditional intensity function [383], where µ(x, y) still is the background intensity, which is a function of space, but not of time; κ(m) is the expected number of events triggered from an event of a magnitude m in the form g(t) is the probability density function of the occurrence times of the triggered events, f (x, y|m) is the spatial distribution of the triggered events, which can be expressed in the following two ways: a short-range Gaussian decay, or a long-range inverse power decay, The ETAS model has been successfully used for operational earthquake forecasting, e.g., the complex Amatrice-Norcia seismic sequence [384], next-day earthquake forecasts for the Japan region [385] and California region [386]. More general discussions on modeling seismic time series by the point process approach are given in Ref. [387].

Memory Analysis
In this subsection, we will review the memory analysis that has been applied to seismic records. The first method is the so-called detrended fluctuation analysis (DFA), which is used to detect long-range correlations or long-term memory of diverse time series, such as DNA sequences and climate records [180]. Details of the DFA are presented in Sec. 2.3.3. According to Eq. (70), if the time series has a long-term memory, the fluctuation function F (n) increases according to a power-law relation with a exponent α. The DFA has been applied to earthquake interoccurrence times for the first time by Lennartz et al. [388]. They tested the real and the synthetic records for long-term correlations by employing the first two orders of the detrended fluctuation analysis DFA0 and DFA1 on the SCEC catalog (1995âĂŞ1998) and the NCSN catalog (1995âĂŞ1998). They found that there exists a long-term memory between seismic events in the Californian catalogs, which show up in characteristic fluctuations in both magnitudes and interocurrence times. Both DFA0 and DFA1 have resulted in a DFA exponent α ≈ 0.75, independently of the threshold M z and the catalog, see Fig. 45.
A similar analysis, DFA2, has been applied to the real Israeli and Italian catalogs by Fan et al. [212]. They found that the exponent α was also very close to 0.75 for the inter-occurrence time series between earthquake events with different magnitude threshold M z . Figs. 46a-d show the DFA2 results. They also performed DFA on other seismic variables, such as the number of earthquake events and the released energy within a coarse time window dt. The energy of earthquakes was defined as S(t) = where E(t) denotes the number of events that occurred between t and t + dt, and M l (t) denotes the magnitude of the event. To deal with more homogeneous time series, Fan et al. switched to s(t) = log(S(t)) for S(t) > 1 and zero for S(t) ≤ 1 and e(t) = log(E(t)) for E(t) > 1 and zero for E(t) ≤ 1 [212]. Next, applying the DFA analysis for the time series s(t) and e(t) in the Israeli and Italian catalogs, with different time windows dt, were tested. It was found that for both regions and for all studied magnitudes, the value of the scaling exponent α is quite robust ∼ 0.75, i.e., the size of dt does not affect the memory exponent α. Thus, the return intervals, the number of events and the released energy are significantly long-term correlated and have a very similar scaling exponent. The memory analysis has also been applied on the ETAS model. The DFA2 of the inter-occurrence times, with the parameters A = 6.26, µ = 0.2, p = 1.1, α M = 1.5 and c = 0.007, see Eq. (139), yields α ≈ 0.75 and is shown in Fig. 46e.
Another common method for studying memory in the occurrence of earthquakes is called conditional probability (CP), which was introduced by Livina et al. [389]. Considering a time series of recurrence intervals, they sorted it in ascending order and divided it into four 25% quantiles; i.e., the first quantile, Q 1 , represents the shortest 25% of waiting times, etc. The distribution of recurrence times τ , that follow a prior recurrence time τ 0 , P (τ |τ 0 ), was studied, where τ 0 belongs to either one of the quantiles at the extremes, Q 1 or Q 4 . Note that for records without memory, P (τ |τ 0 ) should be independent of τ 0 and should be identical to P (τ ). For the real data, Livina et al. found that P (τ |τ 0 ) depends strongly on the previous recurrence time τ 0 , such that short recurrence times lead to short ones, and long recurrence times follow long ones (see Figs. 47a, c for the Israeli and Italian catalogs). To quantify and measure the level of memory, Fan et al. analyzed the cumulative distribution function (CDF) of the recurrence times and denoted the CDF for Q, Q 1 and Q 4 as CQ(τ ), CQ 1 (τ ) and CQ 4 (τ ), respectively. The strength of the memory for Q 1 is defined as and similarly, the level of memory for Q 4 is Thus, 0 ≤ ρ 1 ≤ 1 and −1 ≤ ρ 4 ≤ 0, and higher |ρ 1 | (or |ρ 4 |), imply stronger memory and ρ 1 = 0 (or ρ 4 = 0), implies no memory [212]. The results are shown in Fig. 47b and d for the Israeli and Italian earthquake catalogs. They found ρ 1 = 0.248, ρ 4 = −0.193 for Q 1 and Q 4 of the Israeli catalog, whereas ρ 1 = 0.260, ρ 4 = −0.153 for Q 1 and Q 4 of the Italian catalog [212]. Further study suggests that the values of ρ are robust and do not depend on M z . To study the dependence of the correlations (α, ρ 1 , and ρ 4 ) on the geographical location (tectonic setting), they performed the DFA and CP analysis for other earthquake catalogs, including New Zealand, Southern California Earthquake Center, Japan unified hIgh-resolution relocated catalog for earthquakes and Preliminary Determination of Epicenters global earthquake catalogs. It has been found that for all catalogs and for all studied magnitudes the values of α, ρ 1 , and ρ 4 are quite robust. Finally, the memory analyses DFA2 and CP, were performed in ETAS models, by measuring the coefficients α, ρ 1 and ρ 4 , as shown in Figs. 46 and 47. The results indicate that the empirically observed earthquake memory (for Israeli catalog) can be reproduced only for a narrow range of the model's parameters. Moreover, the origin of memory in the ETAS model is influenced by: (i) the model's background (noise) rate parameter µ which, affects the memory through the interference of temporally overlapping aftershock subsequences, i.e., smaller µ leads to stronger memory, and (ii) the branching ratio n = Ac p−1 β β−α M : the exponent relating the production of aftershocks as a function of magnitude, α M , and the power p of Omori's law can also affect the memory through the branching ratio of the ETAS model, i.e., smaller p and larger α M result in a stronger memory (see Figures. 11 and 12 in Ref. [212]).
More recently, a lagged CP method was introduced by Zhang et al. to study a possible long-term memory in both the interevent times and distances of earthquakes [390]. The CP of times and distances were defined as ρ (τ k |τ 0 ) and ρ (r k |r 0 ), respectively, where τ 0 (r 0 ) belongs to the Q 1 or Q 3 group. τ k (r k ) is the lagged k-th inter-event time that follows τ 0 (r 0 ). In order to quantify the level of memory, S (τ k |τ 0 ) ≡ 1 − s 13 was introduced, where s 13 is the common area of the PDF between Q 1 and Q 3 . Here S (τ k |τ 0 ) ∈ [0, 1]. Similarly, S (r k |r 0 ) represents the level of memory for the inter-event distances. Interestingly, it has been shown that S(k) can be expressed by the following scaling form, where d u and a are constants, b = 1 as for the Gutenberg-Richter law. The distribution of S (τ k |τ 0 ), S (r k |r 0 ) and their corresponding scaling functions for the Italian catalog are presented in Fig. 48, where d u = 0.14 and a = 0.09 for the inter-event times τ ; d u = −0.08 and a = 0.24 for the inter-event distances r. The   scaling functions shown in Fig. 48c and d suggest a crossover between two distinct power-law relations with F (x) ∼ x −γ . Both scaling functions exhibit a significant crossover at k · 10 bMz 10 5 . These results indicate that the memory measure of different grid sizes and different magnitude thresholds can be rescaled into a single function. However, it was found that the memory function in the ETAS model is very different from that of real earthquake catalogs, i.e., the modelâĂŹs memory is weaker (stronger) on short (long) timescale compared to the real catalogs. Moreover, the model does not exhibit a clear crossover observed in the real catalogs.

Earthquake Forecasting
Earthquakes are one of the most destructive natural disasters in the world. Skilled and reliable earthquake forecasting remains an ultimate goal. The term earthquake prediction is usually referring to the specification of the occurrence time, location or magnitude of a future earthquake. However, "the term earthquake forecasting usually refers to the evaluation of the occurrence probability P Σ of an earthquake inside a hypercell of volume Σ centered in the point ω = (t, x, y, z, m). Therefore, the fundamental quantity is the occurrence probability P Σ ( ω| ) conditioned on the whole set of information available at time t" [36]. The probability of having an earthquake at time t, in the k-th interval conditioned to the previous history is where λ is the local conditional rate as defined in Eqs. (139) or (141), ω k denotes the center of the k-th cell. The forecasting probability, P Σ the integral over Σ of the occurrence rate λ ( ω k |H t k ), depends on the volume of the hypercell Σ, i.e., the smaller Σ, the more accurate is the evaluation of λ. Secondly, the temporal extension T of Σ also influences the forecasting accuracy. According to the time scale of T , one can separate arbitrarily Long-Term from Short-Term forecasting [391]: earthquake Long-Term forecasting, for T in the interval of decades to centuries; Intermediate-Term forecasting, for T of the order of months and Short-Term forecasting, for T in the interval from few seconds up to several weeks. Short-Term forecasting is usually subdivided into post-seismic and pre-seismic forecasting, referring to the aftershock and foreshock of a mainshock. It is notable that the post seismic forecasting is still very important from the point of view of risk management, since aftershocks can have sizes comparable or even larger than their triggering mainshock. Long-Term forecasting, however, is probably the most relevant from the engineering point of view, such as urban planning risk assessment. Based on the early warning foreshock observations, one could successfully predict some big earthquakes, such as the 1975 Haicheng, China earthquake [392] and the 1995 KozaniâĂŞGrevena, Greece earthquake [393]. Nevertheless, most earthquakes do not present significant precursory patterns, e.g., the 2004 Parkfield (California) earthquake [394]. Thus the evaluation of foreshocks or precursory patterns is still an open question. Since most earthquake models are based on the well-established empirical laws, aftershock spatio-temporal occurrence clustering, as the ETAS model, they are able, in some degree, to provide reliable risk assessment only for the Long-Term and the post-seismic Short-Term forecasting but not for the foreshocks. Nevertheless, how to validate a forecasting model is still of great importance. In the following, we will review various methods for evaluating earthquake predictions and earthquake forecasts.

Receiver operating characteristic diagram
The receiver operating characteristic (ROC) diagram is a graphical plot that illustrates the prediction quality of a binary classifier system as its discrimination threshold is varied. For evaluating earthquake predictions, if an earthquake is expected ("YES") with a probability P Σ > P t , where P t is a threshold, we call it a hit, if an event has occurred; if earthquakes are not expected ("NO") in the cell, we call it a miss, if an event has occurred. Each "YES" prediction in which no corresponding earthquake is observed is a false alarm, and each "NO" prediction in which no earthquake occurred is a correct rejections. In this representation, there are four possible combinations of alarm declaration and event observation, see Table. 1 for the number of each of these contingencies. Here we define a as the number of hits, b as the number of false alarms, c as the number of misses and d as the number of correct rejections, respectively. We define a true positive rate (T P R) as T P R = a a + c , and false positive rate (F P R) as When T P R and F P R are plotted together on the square [0,1] × [0,1], the resulting metric is called the ROC, whereas the perfect prediction corresponds to the point with coordinates T P R = 1 and F P R = 0, the diagonal line represents the long-term behavior of random guessing. One can obtain the full ROC curve by changing P t continuously. Here we show an aftershock forecasting ROC curve by using a deep learning method in Fig. 49 as an example.

The Molchan diagram
Closely related to the ROC is the Molchan diagram, which is an error diagram that allows to compare the prediction of the test model and another generic reference model [396]. It is a plot of the miss rate, M = 1−T P R, and the fraction of experiment space-time volume, F , occupied by alarms or YES predictions. The best prediction in the Molchan diagram is the point (F = 0, M = 0). In particular, the probability of obtaining h or more hits by chance, given that there have been n observed target earthquakes, is described by the binomial distribution where n i stands the binomial coefficient. One can obtain a confidence level curve in the Molchan a b  The N-test is used to access how well the total number of forecasted earthquakes N f orecast matches the number of events observed N obs [397]. The target earthquakes are assumed to follow a generalized Poisson distribution, and the mass function of the negative binomial distribution (NBD) can be expressed as [384] f (n; r, p) = Γ(r + n) n!Γ(r) p n (1 − p) r , where Γ(r) is the gamma function, n is the number of target earthquakes, and r, p are the two parameters of the distribution. Then NBD can be derived as f (n; r, p) = ∞ 0 f 1 (n; λ)f 2 λ; r, where f 1 (n; λ) is the Poisson distribution and λ is assumed to follow a gamma distribution where k, θ are the parameters of the gamma distribution with r = k and p = θ 1+θ . Next, we calculate the parameters k and θ of the gamma distribution. To interpret the N-test results, one can use a one-sided test with an effective significance value α ef f , which is half of the intended significance value α = 0.05. Fig. 50c shows an example of the N-test.
The Space test (S-test) The S-test is a likelihood test where only the spatial distributions of the forecast and the observation are considered [398]. The S-test can be summarized by a where S i is the i-th simulated spatial likelihood, S s is the set of simulated spatial likelihoods, S is the likelihood of the spatial forecast relative to the observed, n{A} is the number of elements in a set {A}. If ζ is less than critical significance value α, the observed spatial distribution is inconsistent with the forecast. However, values close to 1 indicate optimal spatial forecasts. We show an example of the S-test in Fig. 50a. Besides the aforementioned methods, there are still various other methods that have been developed to evaluate the accuracy of a forecasting model, such as, the Likelihood test (L-test), the Likelihood Ratio test (R-test) and the Magnitude test (M-test). An instructive review on the the evaluation of earthquake predictions and earthquake forecasts can be found in Ref. [399].
As discussed in the last section, we studied the possible origin of memory in earthquakes for both real catalogs and ETAS models. Here, we will show that the memory analysis can significantly improve the short-term forecasting rate for the real earthquake catalog. In estimating the ETAS model parameters for a given earthquake catalog, the ETAS parameters are commonly inverted from the data based on the pointprocess maximum likelihood (ML) method, by the Davidon-Fletcher-Powell algorithm [380] or by simulated annealing [400]. For example, by using the ML method, one can obtain µ = 0.2, A = 6.26, α M = 1.4,  139), for the Italian catalog [400]. When considering the memory, however, the ETAS model can reproduce the same memory as in real catalogs, only for a small range of parameter values, i.e., µ = 0.2, A = 6.26, α M = 1.5, p = 1.1 and c = 0.007 [212]. As a test case the cumulative number of events, NC, after the Capitignano 5.7 main shock that happened on 18 January 2017 [401], was forecasted by using the ETAS model. The cumulative number at time t is calculated by integrating Eq. (139) from 0 to t. This way the forecasting accuracy within 14 days after the main shock can be improved by more than 20% by considering the memory. Moreover, a revised and generalized ETAS model was proposed recently by Zhang et al., in which the short-and long-term/distance memory was reproduced accurately [402]. The new ETAS model is also found to significantly improve earthquake forecasting for the Italian and South California earthquake catalogs.

Conclusions and Perspectives
In this article, we have reviewed statistical physics and complex networks-based techniques that advance our knowledge on the complex Earth system, a relatively novel branch of geophysics. These techniques can help to address the understanding as well as the prediction of climate variability, Earth geometric relief features and earthquakes. In this section we try to draw a few concluding remarks and address the perspectives that remain still open for future research. We believe that these novel approaches will attract the attention of scientists in the related fields.
In Section 2 we started by describing the overall methodology based on statistical physics for analyzing important aspects of complex Earth subsystems. It includes complex networks, percolation theory, tipping points analysis, entropy theory and complexity. Although it was shown that the aforementioned approaches can be successfully applied to Earth systems individually or in a combination, there is still a lot of work ahead in order to develop a proper and comprehensive framework. Among the different open problems to be solved, we highlight the following four: i) The need of gathering a better knowledge and understanding on the underlying physical mechanisms; ii) The need of extending concepts of CNs to a more comprehensive description, such as multilayer climate networks [403]. This is since the climate is a coupled and multilayered system; iii) The need of developing a universal theory of critical phenomena (percolation) in climate systems, in particular for out-of-equilibrium systems; iv) The need of proposing a principle on how to anticipate the tipping points and entropy of the Earth system by means of time series of multiple spatio-temporal variables.
In Section 3, we have continued by reviewing the applications of statistical physics approaches to different Earth subsystems. Specifically, five weather and climate dynamical scenarios were highlighted in Section 3.1, including, El Niño-Southern Oscillation, Indian summer monsoon, extreme rainfall, atmospheric circulation and Atlantic meridional overturning circulation. We suggested that the evolving CNs (and other combined approaches) interactions and patterns can serve as novel predictors which significantly improve the predictions. Beside the above mentioned scenarios, the CNs method was also widely applied to other high-impact phenomena, such as the westward propagation of the Atlantic multidecadal oscillation [99], the detection of teleconnection paths [62], the identifying of causal gateways and mediators [404], the early prediction of extreme stratospheric polar vortex states [405], the impacts of carbon dioxide (CO2) on global SAT [406], etc. In spite of the great achievements, there are still many unexplored problems related to climate systems that merit further attention. Therefore, further topics are needed to be investigated, for example, i) how human activities have changed and continue to change the EarthâĂŹs oceanic and atmospheric composition, some of these changes have a direct or indirect impact on the energy balance of the Earth and are thus drivers of climate change; ii) The statistical mechanics and CNs behaviours in Earth System Models, ranging from simple energy balance models to general circulation models; iii) The projected short-term and long-term abrupt phase transitions in paleoclimate, in particular, the role of sea ice and Snowball-Earth initiations [407][408][409]; iv) Improving the predictability of severe convective weather processes including thunderstorms, hail and tornadoes by using the CNs approach.
In section 3.2, we have reviewed the current literature on the statistical features and the percolation framework of the EarthâĂŹs surface topography. We have pointed out the evidence for abrupt transitions that occurred during the evolution of the EarthâĂŹs relief network, indicative of a continental/cluster aggregation. A further analysis via fractional Brownian motion models suggests that long-range correlations may play a key role in the observed discontinuity on Earth. This research may facilitate the understanding of the geometrical phase transition on Earth, but also can be used to identify the critical nodes for future global change in EarthâĂŹs relief network. In particular, i) the future movement of tectonic plates, as well as ii) the dynamic evolution of these critical nodes are of key importance and thus need to be further addressed.
In section 3.3, we first presented a description of the well established empirical basic laws of earthquakes, including the Gutenberg-Richter, Omori and productivity laws. We then discussed the scaling hypothesis of the inter-event distributions. An epidemic-type aftershock sequence model (ETAS) and various methods for evaluating earthquake predictions and earthquake forecasts have been also reviewed. At last, we have shown that the two proposed memory analysis approaches, DFA and CP, can significantly improve the short-term forecasting rate for real earthquake catalogs. Nevertheless, earthquake prediction research has been plagued by controversy, and it remains an outstanding challenge. Besides the improvement of shortterm aftershock forecasting, we also emphasize the potential of new directions. i) The wide applications of tipping points and memory analysis in theoretical earthquake models which account for accelerated moment release and foreshock occurrence together with other precursory patterns, where mainshocks were treated as characteristic earthquakes within a seismic cycle [410]. ii) The development of an earthquake forecasting model which is implemented for describing foreshock organization and memory behaviors. iii) Motivated by the successful applications of network theory to the prediction of climate phenomena, the proposing of an earthquake network framework based on the occurrence time series [411], or in particular the seismic waves (elastic body and shear waves) and elastogravity signals preceding direct seismic waves [412].
One of the key features of statistical physics-based approaches reviewed here is the ability to better and reliably forecast the complex Earth phenomena, such as El Niño events, extreme rainfall in the Eastern Central Andes, Indian summer monsoon, the collapse of the Atlantic multidecadal oscillation and the occurrence of earthquakes. Moreover, we observed that artificial intelligence and deep learning techniques [413] have achieved great success during recent years in many fields, such as phase transitions in statistical physics [414][415][416], data-driven Earth system science [417], ENSO forecasts [294,295], Indian monsoon rainfall [418], as well as the forecasting aftershock patterns [395] and detecting earthquakes [419] in seismic systems. In fact, we are confident that the combination and complement of network-based and artificial intelligence-based skills will boost each other. For instance, an application of machine learning to network attribute vectors (products similarity) can predict successful and failing firms with much better accuracy than current state of the art techniques for market forecasting [420]. In addition, deep learning systems can be regarded as complex networks, which thus gain some insights into the structural and functional properties of the computational graph resulting from the learning process [421].
As a final remark, there is still room for many approaches considering the analysis of climate resilience of societies, ecosystems and economies by using the complex network theory. We currently lack appropriate models to better understand or predict the effects of cascading failures [30] triggered by the increasing adverse effect of extreme climate/weather events on interdependent critical infrastructures. Closing this knowledge gap is crucial for developing means to achieve both climate and infrastructure resilience. Another very relevant subject of investigation that will certainly attract huge attention is the assessing and quantifying of the complex climateâĂŞhealth relationships. There is an overwhelming consensus that climate conditions, including temperatures and spatial-temporal distribution of precipitation, has key implications for human health [422]. Moreover, the morbidity and outbreak of some disease, such as the vector-borne (malaria and dengue fever) disease, the ongoing COVID-19 pandemic [423], and influenza are strongly affected by the climate change or the environment [424][425][426][427][428]. However, the climatic influences are often excluded from consideration in the traditional epidemiologic models [113], e.g., the SIR, SIRS and SEIR models. What we need to emphasize is that traditional epidemiologic models with taking into account the climate factor could be a very promising road towards a deeper understanding of epidemic processes and assess the health consequences of climate change both regionally and globally.