Analysis of spatially coherent forecast error structures

Understanding error properties is an essential part in numerical weather prediction. Predictable relationship between errors of different regions due to some underlying systematic or random process can give rise to correlated errors. Estimation of error correlation is crucial for improvement of forecasts. However, the size of the corresponding correlation matrix is larger than what is possible to represent on geographical maps in order to diagnose its full spatial variation. Here, we propose a complex network‐based approach to analyse forecast error correlations that enables us to estimate the spatially varying component of the error. A quantitative study of the spatio‐temporal coherent structures of medium‐range forecast errors of different climate variables using network measures can reveal common sources of errors. Such information is crucial, especially in cases such as the outgoing long‐wave radiation, in which errors are correlated across very long distances, indicating an underlying climate mechanism as the source of the error. We show that the spatial patterns of forecast error co‐variability may not be the same as that of the corresponding climate variable itself, thereby implying that the mechanisms behind the correlated errors may be different from the climate processes responsible for the spatio‐temporal interactions of the climate variable. Our results highlight the importance of diagnosing the full spatial variation of error correlations to understand the origin and propagation of forecast errors, and demonstrate complex networks to be a promising diagnostic tool in this regard.

state of the system at one location can also affect that at another location.Common error between measured values due to a common measurement gives rise to error correlations (Herzberg, 1982), which are indicative of systematic or structured random errors (Boer, 1993).Estimation of error correlations is very important for producing quality forecasts and is a key issue for data assimilation (Derber and Bouttier, 1999).In particular, proper estimation of the spatially varying component of the error is important to include inhomogeneities and anisotropies for further improvement of the forecasts (Parrish and Derber, 1992;Wu et al., 2002;Kleist et al., 2009).However, it has been a significant challenge to diagnose the full geographical variations of error correlations because the total number of elements in the correlation matrix is the square of the number of grid points, making it difficult to represent them on geographical maps (Pereira and Berre, 2006).An economical approach commonly used to obtain a synthetic view of the spatial variability of error correlations is to estimate the local correlation length scale and then study its latitudinal variations (Daley and Barker, 2001;Ingleby, 2001;Pereira and Berre, 2006).
In this work, we propose an alternative approach to analyse and view the spatial variation of error correlations based on the theory of complex networks.In recent decades, complex networks have been used extensively for inferring statistical interrelationships from spatio-temporal data of spatially extended systems, such as the human brain (Bullmore and Sporns, 2009) or the Earth's climate systems (Boers et al., 2021;Ludescher et al., 2021), the latter referred to as "climate networks".Such an approach is useful for cases where the complete knowledge of the interactions between the different components of a complex system is lacking.Under this framework, the spatial observation points derived from the underlying spatio-temporal dataset are called "nodes", and the knowledge of their functional interdependence, or "links", is inferred by assessing the (pairwise) similarities between their measured or simulated dynamics.Then, measures derived from graph theory are used to characterize the topological features of the complex network so obtained (Donges et al., 2009).The climate network approach has been used to study patterns of climate variability in different climate variables, such as temperature, pressure, geopotential height, wind, and precipitation, at various scales (Tsonis and Roebber, 2004;Yamasaki et al., 2008;Donges et al., 2009a;Ludescher et al., 2013;Radebach et al., 2013;Runge et al., 2015;Gelbrecht et al., 2017;Boers et al., 2019;Gupta et al., 2021;Lu et al., 2022;Gupta et al., 2023).The methodology has also been used in previous studies for the purpose of model evaluation in order to identify the underestimation or overestimation of statistical links, and hence teleconnection patterns, by comparing the depiction of climate interactions in the reanalysis data with that in the forecasts (Steinhaeuser and Tsonis, 2014;Boers et al., 2015;Feldhoff et al., 2015;Di Capua et al., 2022;Gregory et al., 2022;Dalelane et al., 2023).
The goal of this article is to put forward the concept of "error networks" as an effective tool to diagnose the full geographical variation of the error correlation matrix.Although similar to the idea of climate networks, the approach is innovative, as in this case we are not interested in unravelling the coupling or interaction between two components of the Earth's climate, but rather gain insights into the origin of forecast errors in a climate variable by identifying spatially coherent patterns of regions having common sources of error.Therefore, here, we retrieve the network representation of the spatio-temporal forecast error dataset of a climate variable, instead of the variable itself, by computing the pairwise statistical interdependencies (here, correlation) between the forecast error time series of the different locations.The spatial coherence patterns of the resultant error correlation matrix can then be studied using the various network measures that characterize the topological structure of the network.Furthermore, it should be noted that this approach of analysing the underlying spatial relationships in the forecast error data is, however, different from the aforementioned previous model evaluation applications of climate networks.Here, the geographical variation of the network measures computed from the error correlation matrix can reveal the spatial heterogeneity of the errors, the pattern of which can provide an inkling of the dominant source of error.In this article, we demonstrate this by analysing the error networks of different climate variables.Such an analysis, specifically designed to study the spatially coherent structures of forecast errors, has not been performed earlier to the best of our knowledge.
In this work, we focus mainly on analysing the error properties of the Asia-Pacific region in the Northern Hemisphere during the summer months of June, July and August (JJA) during which the southern and the eastern parts of Asia experience the monsoon.Using the network approach, we gain a preliminary understanding of the origin of the forecast error in climate variables such as wind, geopotential height, and outgoing long-wave radiation.Our results highlight the presence of crucial regions that exhibit high influence on the error properties of a particular climate variable at not only nearby regions but also over very long distances.Furthermore, we show that, for different climate variables, the topology of the error correlation network exhibits varying levels of differences with the respective reanalysis network of the corresponding climate variable.This points towards the fact that the spatial patterns of forecast error co-variability may not be the same as the co-variability pattern of the corresponding climate variable itself, thereby suggesting that the mechanisms behind the correlated errors can be different from the climate processes responsible for the spatio-temporal interactions of the climate variable.
The remainder of the paper has been organized as follows.In Section 2, we list the datasets employed and then outline the methodology.In Section 3, we first show that error correlation networks of different climate variables exhibit significant spatial patterns, and then we discuss our findings on the properties of errors mostly affecting the Asian summer monsoon from the analysis of these networks.Thereafter, we make a comparison of the error network with the information inferred from the reanalysis and forecast networks of the same variables, thus demonstrating that the analysis of error correlations using network measures reveals important information about the underlying sources of errors.Finally, in Section 4, we provide concluding remarks regarding the relevance and the future scope of the work.

Data and preprocessing
We use the fifth-generation European Centre for Medium-Range Weather Forecasts Atmospheric Reanalysis (ERA5) data (Hersbach et al., 2020) at a spatial resolution of 1 • × 1 • and daily mean of the hourly values for the period 1980-2020, for outgoing long-wave radiation (OLR) as well as lower tropospheric (at 850 hPa) climate variables, namely, the geopotential (Z850), and the meridional (V850) and zonal (U850) components of wind.Forecast data for the same variables is obtained from the 10-day forecasts produced from the same system by averaging over a time interval of 120-144 hr (day 5-6).The forecast lead-time of day 5-6 is chosen in order to focus on large-scale errors in the medium range and also to get error propagation that is beyond linear advection of error structures (Magnusson, 2017).Daily anomalies of the variables are computed with respect to the daily climatology of the whole period of analysis.The forecast errors are computed by subtracting the reanalysis from the forecast data (Figure 1).We remove the effect of a mean bias in the forecast model by subtracting the mean error from the forecast error time series and take the absolute of the resultant values in the error time series for our subsequent analysis (see Figure 2).We analyse the error properties during the JJA season using our climate network approach.Our analysis is restricted to the Asian summer monsoon region and the adjacent Indian and Pacific oceans, which play an important role during the monsoon.The region of interest extending from 35 • N to 10 • S and from 3 • E to 120 • W includes the Niño 3.4 region (5 • N-5 • S, 170 • W-120 • W) as the El Niño-Southern Oscillation (ENSO) has a considerable impact on the interannual variability of the Asian monsoon (Ju and Slingo, 1995).We use the Oceanic Niño Index, which is based on variations in 3-month running means of sea-surface temperatures in the Niño 3.4 region (NOAA, 2022), to identify the ENSO phases.

Methods
We identify the spatial coherence pattern of the reanalysis, forecast, and forecast error data by using a complexnetwork-based approach.The network representation of a complex system encodes the pairwise interactions between its components.For given spatio-temporal data, this is constructed using the framework of climate networks, which are functional networks.Hereafter, we describe the method of network construction and the measures used to analyse and compare the network topologies.

Network construction
In a climate network, each geographical grid point of the spatio-temporal climate dataset represents a node (Tsonis et al., 2006).Additionally, each node represents a dynamical system which is associated with the corresponding time series of the climate variable.The connection between a pair of nodes is computed on the basis of the statistical inter-dependency between them.
In our case, we calculate the links of our network by computing the Spearman's rank correlation coefficient between different pairs of nodes at zero lag.Then, we perform a significance test on our data and construct the correlation matrix by retaining only those correlations whose p−values are less than 0.05.We assume a pair of nodes, i and , to be connected if the absolute value of the correlation coefficient, C i , between them lies beyond a certain threshold , that is, we consider only the strongest correlations and ignore the sign.Here, we choose the 95 th percentile of the absolute values of the correlation matrix as the threshold , which ensures considerable interconnectivity while retaining only the strongest correlations, thereby controlling the resultant link density ( ≈ 0.05), and is in agreement with previous studies on climate using the functional network approach (Radebach et al., 2013;Stolbova et al., 2014; Ozturk et al., 2019;Gupta et al., 2021).Thereafter, the network can be represented by the adjacency matrix, which is a binary matrix whose elements, A i , are set to 1 if there exists a connection from node i to , and 0 otherwise.Mathematically, it can be represented using the Heaviside function, Θ(x) (where Θ(x) = 1 when x > 0 and Θ(x) = 0 otherwise), as , where  i is the Kronecker delta (subtracted in order to remove self-loops).We note that, here, A i = A i , that is, the adjacency matrix is symmetric.This results in an undirected and unweighted network.One may alternatively retain the significant absolute correlation values as the link strengths in order to avoid thresholding, and treat the network as a weighted one.However, unweighted networks with fixed link densities allow not only a comparison of network patterns for different variables but also more freedom while using different network-based measures.For a more detailed description on the application of complex networks to spatio-temporal data, refer to Supplementary Text S1.

Network measures
After constructing the network, we compute the network measures degree and common component function.

Degree
The degree of a node is defined as the number of connections it has (Newman, 2010).In the case of an undirected network, the degree of a node i can be defined as Alternatively, for weighted networks, A i in Equation ( 1) should be replaced by the link strength between nodes i and  to compute the weighted degree.In most cases, the high-degree nodes, those with the more connections, play an important role in the functioning of the system.Hence, degree can be a useful guide for focusing our attention on the system's most crucial regions, which in this case have either strong influence on the forecast errors of other regions or are strongly influenced by them.In order to find the areas interacting with a region R of high degree in the climate network, we calculate the "partial degree" {k i } R of the nodes in the network, which yields the number of links connecting a node i outside R with the nodes within R. Mathematically, we have If the errors are uncorrelated, as in case of independent random errors, the forecast error network would be completely random.However, if the errors of different grid points have a (partially) predictable relationship between them due to an underlying deterministic or structured random process, such as that for systematic or structured random errors, then the errors have correlated structures that will emerge as definite spatial patterns in the forecast error network.Since we preserve only the highest correlations to construct our network, we expect the most dominant structured errors causing the strong spatio-temporal correlation in the forecast error of the climate variable to show up in the spatial distribution of high-degree nodes in the network.This is a typical property of most real-world networks, that there exists a small number of nodes with unusually high degree, known as "hubs", that in some cases are known to have a dominant effect on the behaviour of the network as a whole (Newman, 2010).As we are interested in the relationship between errors of different geographical points, the significance of the degree distribution of the forecast error network can be tested against that of random networks in which it is not possible for hubs to emerge as the degree k i is comparable for every node.Therefore, we can simply compare the degree distribution of the original unweighted network with the mean degree distribution of Erdös-Rényi networks (Newman et al., 2001) with the same number of nodes and average degree, obtained by rewiring the links of the original network entirely randomly.However, as climate networks are spatial networks, the link probability depends on the geographical lengths of the links, due to which the influence of spatial embedding on the network structure should also be taken into account (Barnett et al., 2007) (refer to Supporting Information Text S1).Hence, it is more appropriate to test the significance of the degree distribution of the original unweighted climate network against those of spatially embedded random networks (Rheinwalt et al., 2012), which preserve the number of nodes and the link-distance distribution of the original network.
Another basic property is the "degree assortativity coefficient" r, which is a measure of preferential connectivity in networks based on the node degree; that is, whether a node of high degree preferentially is linked to other nodes of high degree, and vice versa.It is calculated as the Pearson correlation coefficient of degree between pairs of linked nodes, with its value in the range −1 ≤ r ≤ 1 (Newman, 2003).Positive values of r indicate a correlation between nodes of similar degree, with r = 1 implying that the network has a perfect assortative mixing pattern.Negative values indicate relationships between nodes of different degree, where r = −1 implies that the network is completely disassortative.When r = 0 the network is non-assortative.The degree assortativity could be used as a way to estimate the level of homogeneity/heterogeneity of real networks, where a more assortative network is more homogeneous, and vice versa.

Common component function
For two unweighted, undirected networks G a (N, E a ) and G b (N, E b ), defined on the same set of nodes N but with links E a and E b respectively, the common component function CCF(G a , G b ) counts the number of common links n(E a ∩ E b ) between the pair of networks (Tupikina et al., 2014): where A a and A b are the adjacency matrices of networks G a and G b respectively, and is simply the number of links in the network G a .Therefore, the normalized CCF(G a , G b ) takes values in the range [0, 1].This implies that if the networks G a and G b are completely identical then CCF(G a , G b ) = 1, and CCF(G a , G b ) = 0 if they have no common links.This network measure enables us to measure the similarity in network topology between two different networks.It will be useful here to quantify the degree of similarity between the reanalysis, forecast, and forecast error networks.

Spatial patterns of error correlation networks
The forecast error networks of different climate variables are shown in Figures 3a-d.They exhibit definite significant spatial patterns of degree, unlike random networks (Figure 4).This indicates the presence of an underlying systematic or random process that leads to spatio-temporally interrelated errors.We see that the degree distribution of the forecast error networks (Figure 4) have long right tails, indicating that few nodes have very high degrees.We focus on the high-degree nodes in the forecast error networks of Figure 3, which are important regions affecting the behaviour of the forecast error patterns of the whole region.Since the network measure degree counts the number of links (representing pairwise interactions) of each node, the dominant errors causing a pattern of influence on the forecast of multiple regions show up as a particular pattern of locally or distantly connected high-degree nodes in the network.We note that the spatial pattern of weighted degree corresponding to weighted networks of forecast error, constructed using the absolute values of the significant correlations as link weights for the different climate variables (Supporting Information Figure S1), are similar to those obtained from the unweighted error networks (Figure 3a-d).
In the case of both the wind components at 850 hPa (U850 and V850), the part of the western North Pacific Ocean adjacent to southern China and the Maritime Continent has the highest degree (Figure 3a,b).Furthermore, we find that the connections of these high-degree nodes are limited to the nodes in this region only (Figure 5a,b).The mean absolute forecast errors of U850, V850, and Z850 in Figure 2a-c show that the region a bit more north of the highest degree regions of the wind error networks is associated with large error in the western Pacific subtropical high (WPSH).Though the area of the WPSH in Figure 3a,b shows higher degree than most of the other regions, it appears that the southern boundary of the WPSH exhibits the highest correlated error structures, as seen from the region of the highest degree.WPSH is an important circulation pattern that affects the Asian summer monsoon and the tropical cyclone activities in the highly active western North Pacific Ocean basin.Recent studies (Qian  , 2013;Magnusson et al., 2019;Tang et al., 2021) have shown that subsequent track forecast errors of tropical cyclones are strongly sensitive to small initial errors in the predictions of the WPSH, which can fluctuate on synoptic time-scales, and, therefore, are prone to non-systematic errors (Tang et al., 2021).Gao et al. (2022) also showed that there are systematic biases in the WPSH forecasts, such as a smaller area and an eastward and southward shift of location.The predictability of WPSH is a prerequisite for the improved prediction of not only western North Pacific tropical cyclones but also the Asian summer monsoon rainfall (Wang et al., 2013).The small high-degree region over southwest India is locally systematic or correlated and may be related to the wind errors associated with monsoon onset over India.
However, the region of the WPSH is not seen to exhibit a high degree in the forecast error network of Z850 (Figure 3c).Instead, we see a high degree in the region of the equatorial central Pacific Ocean, along with a lighter patch of comparatively lower degree in the equatorial Indian Ocean.These regions are also associated with large mean forecast errors in Z850 (Figure 1c) and show diverging wind error fields.This suggests that the error might be related to the intertropical convergence zone (ITCZ), as also seen in the mean absolute forecast error of OLR (Figure 2d).The connections of the high-degree nodes in the equatorial central Pacific region are limited to the nodes in the same region (Figure 5c), although the connectivity structure appears to be much smoother than those of the wind error networks (Figure 5a,b).Furthermore, even though all the forecast error networks (Figure 3) are assortative (i.e., high-degree nodes have a tendency to be connected to high-degree nodes), as seen from Supporting Information Table S1, the Z850 forecast error network (Figure 3c) in particular has a very high degree assortativity of r = 0.8, indicating more homogeneity than the other networks.This homogeneous large-scale coherent structure in the Z850 error network is, in particular, similar to the geopotential height perturbation at 850 hPa of the zonal wave number 1 equatorial Kelvin wave (Žagar  Lee and Huang, 2020), which is believed to be the main driver of the Madden-Julian oscillation system (Kikuchi et al., 2018;Raphaldini et al., 2021).This suggests that the coherent error pattern in Z850 is possibly due to significant uncertainties in both weather forecasts and climate models, associated with a lack of direct observations of wind profiles and the representation of the unbalanced tropical circulation consisting of inertio-gravity waves (Žagar et al., 2015; Žagar et al., 2016).
The forecast error network of OLR (Figure 3d) shows multiple regions of very high degree.There are three separate patches of high degree in the Pacific Ocean: two in the Northern Hemisphere and one just below the Equator.The grid points in the Maritime Continent and northwest India and Pakistan also exhibit high degree.Moreover, parts of the western Indian Ocean along the Somalian coast, extending latitudinally from 10 • S up to the Arabian Sea in the north, also have high degree.This high-degree region in the western Indian Ocean coincides with the path of the cross-equatorial southwesterly low-level Somalian jet, which is a primary source of moisture for the Indian summer monsoon.We find the regions connected to all the aforementioned high-degree regions by computing the partial degree of some nodes within the region (Figure 6a,b and Supporting Information Figure S2).It is observed that the three high-degree regions in the Northern Hemisphere are interconnected to each other, in spite of being separated by very long distances (Figure 6a and Supporting Information Figure S2a,b).A similar interconnectivity pattern is observed among the high-degree regions in the western Indian Ocean, the Maritime Continent, and the Pacific Ocean region centred around 160 • W below the Equator, although the number of links is comparatively less (Figure 6b and Supporting Information Figure S2c,d).These high-degree regions are observed to be associated with the longitudinal extents where the vertical velocity changes sign -that is, the transition zones between the ascending/descending limbs of the Walker circulation, as shown in the longitude versus height plots of vertical velocity for the ENSO neutral years (Figure 6c,d), classified on the basis of the Oceanic Niño Index (NOAA, 2022).Furthermore, the high-degree OLR error connectivity pattern is sensitive to the ENSO phases.For example, the pattern during the warm and cold phases of ENSO is comparatively weaker than the ENSO neutral years (refer to Supporting Information Figure S3).This is possibly because the system is more sensitive during weakly forced years (i.e., without an El Niño or La Niña event).ENSO phases are known to affect the Walker circulation, which further strengthens the argument that the coherent OLR error pattern is associated with the Walker circulation.This relation to the Walker circulation points towards cloud biases in the forecasts that lead to systematic errors in the simulation of the ITCZ, as seen from Figure 2d.
The error network analysis of the Asian summer monsoon region and adjacent oceans during the boreal summer thus reveals that the important drivers of the monsoon system, such as the WPSH, tropical circulation, and the Walker circulation, are significant sources of forecast errors in different climate variables in the medium-range forecasts.However, a possible mechanism suggesting that the error sources in all the climate variables investigated may be related is that the misrepresentations of ocean-atmospheric interactions (Li and Xie, 2014) and the thermodynamic processes in the equatorial Pacific might induce errors in the Walker circulation, which in turn affect the predictability of the WPSH (Chung et al., 2011;Wang et al., 2013;Toh et al., 2018) and then show up as different topological patterns in the forecast error networks.
In view of the aforementioned discussion, an important finding is that forecast errors can be highly correlated over very long distances, such as those observed in the OLR error network.The distribution of the geographical distances of the links in the respective error networks, calculated using the Haversine formula for spherical Earth projected on to a plane, fits well to a power law distribution for most networks, indicating the presence of long-range error connectivity (Supporting Information Figure S4).This further emphasizes the significance of performing an analysis of the spatial correlations of forecast error.

Effect of statistical relationships in reanalysis/forecast data on error correlations
Following from the aforementioned discussion, it is important to understand whether the spatio-temporal connectivity pattern of the error is inherited from the connectivity structure of the observed or predicted climate observable.In other words, for a given climate variable, whether the existence of a high statistical dependency between two regions may cause a correlation between their predictability skills due to common errors arising from the same process that connects them.In terms of climate network, this question transforms to whether the interaction structure of reanalysis (and forecast) data resembles that of the forecast error data.
We compute CCF to count the fraction of common links between the reanalysis, forecast (Supporting Information Figure S5), and forecast error networks (Figure 3) of the different climate variables (Table 1).We see that the connectivity structures of forecast error networks have a varying degree of similarity with corresponding reanalysis or forecast networks for different climate variables.For instance, the similarity between the forecast error network and reanalysis network is the highest for Z850 (CCF(R, FE) ≈ 0.8), less similar for OLR, and least similar for the wind components U850 and V850.This indicates that, for a given climate variable, if two regions have a high statistical interdependency between them, their predictability skills may not be correlated.It can be inferred that, for Z850, the underlying climate phenomena responsible for the interaction pattern in the reanalysis data is highly likely to be the cause of the correlations between the forecast errors.This is, however, only partially the case for OLR, and is even less so for the wind components.The underlying climate interactions responsible for the spatial coherence patterns of the reanalysis/forecast data are outside the current scope of this work.
It must be mentioned that there is high, although not exact, resemblance between the reanalysis and forecast networks (CCF(R, F) ≈ 0.9); that is, the state-of-the-art ERA5 system simulates the real climate system well.However, it must be clarified that the 5-day forecast error networks do not simply reproduce the excess/missing links between the reanalysis and forecast network (Supporting Information Figure S5).This can be verified by comparing the pattern of the difference in degree of reanalysis and forecast networks (Supporting Information Figure S6) with Figure 3. Except for Z850 (Supporting Information Figure S6c), the error networks of other variables bear little resemblance with the degree difference pattern between reanalysis and forecast.The degree difference for U850 (Supporting Information Figure S6a) shows missing interactions in the western North Pacific Ocean.Several differences between the reanalysis and the forecast networks of wind (Supporting Information Figure S6a,b) also occur in the northern Indian Ocean, and the monsoon-affected regions of south India, southern China, and the Maritime Continent.Errors related to overestimation/underestimation of links in the ITCZ can also be seen in the degree difference between reanalysis and forecast networks of OLR (Supporting Information Figure S6d).However, as our purpose here is not the evaluation of climate interactions predicted by models (Steinhaeuser and Tsonis, 2014;Boers et al., 2015;Gregory et al., 2022;Dalelane et al., 2023), we do not seek a detailed understanding of the differences between the reanalysis and forecast network connectivity structure.But from our aforementioned discussion, it is clear that the topological structure of the forecast error correlation network of the climate variable indeed highlights the primary source of structured error in that variable, which may not be revealed from the connectivity structure of the variable itself.

CONCLUSION
Understanding error properties is a very important aspect of data assimilation and numerical weather forecasting.In this article, we have used a methodology based on complex networks to study the spatially coherent structures of the forecast errors of different climate variables.By introducing the concept of error networks to represent error correlations, we were able not only to visualize the N × N error correlation matrix on an N-grid-point geographical space but also perform an in-depth analysis of the local and global effects of errors by applying measures derived from graph theory.This innovative approach allowed us to characterize the spatial heterogeneity present in the error properties of different climate variables, which further revealed the key error source regions, thereby demonstrating the importance of studying the spatially varying component of the error.Using one of the basic network centrality measures, the degree, which counts the number of (highly correlated) connections a node has with other nodes of the network, we highlighted the effectiveness of the network-based approach to study error correlations.We illustrated using examples of some typical variables, namely, the zonal (U) and meridional (V) components of wind and geopotential (Z) at 850 hPa, and OLR, that their respective error correlation networks showed definite spatial patterns of degree that are significantly different from that of random networks.This implied the presence of significant spatially coherent structures in forecast error data.Such coherent structures indicate the existence of underlying systematic or structured random processes that give rise to a relationship between errors.Our analysis performed on the Asia-Pacific region for the JJA period revealed the dominant errors in the medium-range forecasts of the climate variables.In particular, from the wind forecast error networks, we found that the southern boundary of the western North Pacific subtropical high exhibits high correlated errors, whereas the Z850 error network revealed errors in the representation of the tropical circulation.We found that the high-degree regions in the OLR error network coincide with the ascending/descending limbs of the Walker circulation during the neutral ENSO years, whereas the spatial connectivity gets strongly reduced during warm and cold phases of the ENSO.This suggested that errors in OLR forecast errors might be associated with cloud biases in the ITCZ.In contrast to the wind and geopotential error networks, which exhibited a large structure of correlated errors, the OLR error network reveals several distant regions of high degree connected to each other.The distribution of the geographical distances corresponding to the links of the error networks are mostly heavy tailed, indicating the presence of a significant number of long-distance connections between errors of different regions, which is an important result.
In conclusion, our work demonstrates that the analysis of the spatial coherent patterns of forecast errors can provide an initial understanding of the origin of the errors.Such a study specifically designed to investigate the spatial properties of forecast errors has not been conducted previously, to the best of our knowledge.The framework of error networks introduced in this article can be a very promising diagnostic tool in this direction.As described earlier, the use of one of the basic network measures, the degree, already provides clear insights into the sources of the forecast error in different climate variables.It may be speculated that further understanding of the spatial properties of errors can be obtained using several other complex network measures (not shown here).In the following, we outline some network measures that can be used to derive other quantities of interest from the error correlation matrix.Mean geographical link distance (Supporting Information Figures S4 and S7), which calculates the mean of the spatial great-circle distances of a node to all its connected neighbours (Malik et al., 2012;Boers et al., 2013;Gupta et al., 2021), can be used to obtain an estimate of correlation length scale associated with each grid point (Ingleby, 2001;Pereira and Berre, 2006).The triangular or hexagonal structure of error correlations can be studied using measures of cliques and clustering in networks (Donges et al., 2009;Newman, 2010;Gupta et al., 2021).Diagnosis of the main direction and intensity of the local correlation anisotropies is interesting to evaluate the properties of heterogeneous covariance formulations (Pereira and Berre, 2006).This can be studied using network measures characterizing the spatial directedness of connections, such as edge anisotropy (Molkenthin et al., 2017) and edge directionality (Rheinwalt et al., 2016).It must be mentioned here that these network measures may be affected to varying levels by the artificial boundaries introduced to conduct a regional analysis, because of which a correction procedure to remove boundary effects should be applied (Rheinwalt et al., 2012).However, it is more appropriate to compute measures such as the mean geographical link distance and edge anisotropy for a global network to get a correct estimation, in which case boundary correction is not necessary.
Additionally, one can study vertical correlations in a similar fashion to the horizontal correlations shown here.The method can also be extended to analyse multivariate correlations using the concept of a network of networks (Donges et al., 2011) instead of univariate correlations as performed here.A future scope of the work is to extend the analysis to the whole globe and to conduct a detailed investigation on the seasonal dependence of the coherent error structures, as well as their dependence on the ENSO forcing.Furthermore, it is instructive to conduct this analysis for longer lead times of forecast; that is, subseasonal time-scales.

F
I G U R E 3 (a-d) Spatial patterns of network measure degree for networks constructed from mean absolute model bias corrected 5-day forecast error data for the June-July-August season of the climate variables (a) zonal wind component at 850 hPa (U850), (b) meridional wind component at 850 hPa (V850), (c) geopotential at 850 hPa (Z850), and (d) outgoing long-wave radiation (OLR).(e-h) Same (a)-(d), but for networks constructed from fifth-generation European Centre for Medium-Range Weather Forecasts Atmospheric Reanalysis data for the JJA season of the climate variables: (e) U850, (f) V850, (g) Z850, and (h) OLR.[Colour figure can be viewed at wileyonlinelibrary.com]F I G U R E 4 Degree distribution comparison of forecast error network (FE) with Erdös-Rényi (ER) network and spatially embedded random network (SERN) for (a) zonal wind component at 850 hPa (U850), (b) meridional wind component at 850 hPa (V850), (c) geopotential at 850 hPa (Z850), and (d) outgoing long-wave radiation (OLR).[Colour figure can be viewed at wileyonlinelibrary.com] et al.

F
Partial degree associated with the regions in yellow boxes showing the areas connected to those regions in the (a) zonal wind component at 850 hPa (U850), (b) meridional wind component at 850 hPa (V850), (c) geopotential at 850 hPa (Z850) forecast error networks shown in Figures 3a-c respectively.[Colour figure can be viewed at wileyonlinelibrary.com] et al., 2016;

F
I G U R E 6 (a, b) Partial degree associated with the regions in yellow boxes showing the areas connected to those regions in the outgoing long-wave radiation (OLR) forecast error network shown in Figure 3d.(c, d) Longitude versus vertical velocity plots at different pressure levels of fifth-generation European Centre for Medium-Range Weather Forecasts Atmospheric Reanalysis June-July-August climatology of vertical velocity (Pa⋅s −1 ) for El Niño-Southern Oscillation neutral years between 1980 and 2020 for the longitudinal range of 35 • E-120 • W, averaged over (c) 10 • N-17 • N and (d) 0 • -10 • S. Negative values indicate upward motion (ascent), whereas positive values indicate downward motion (sinking air).The regions of high degree in (a) and (b) coincide with regions with vertical velocity where vertical velocity changes sign, denoted by corresponding black dashed lines in the upper and lower rows, and shaded grey boxes in (c) and (d).[Colour figure can be viewed at wileyonlinelibrary.com] Common component function (CCF)values between forecast error (FE), reanalysis (R) and forecast (F) networks of zonal wind component at 850 hPa (U850), meridional wind component at 850 hPa (V850), geopotential at 850 hPa, and outgoing long-wave radiation (OLR).