Statistical Interdependence between Daily Precipitation and Extreme Daily Temperature in Regions of Mexico and Colombia

We study the statistical interdependence between daily precipitation and daily extreme temperature for regions of Mexico (14 climatic stations, period 1960–2020) and Colombia (7 climatic stations, period 1973–2020) using linear (cross-correlation and coherence) and nonlinear (global phase synchronization index, mutual information, and cross-sample entropy) synchronization metrics. The information shared between these variables is relevant and exhibits changes when comparing regions with different climatic conditions. We show that precipitation and temperature records from La Mojana are characterized by high persistence, while data from Mexico City exhibit lower persistence (less memory). We find that the information exchange and the level of coupling between the precipitation and temperature are higher for the case of the La Mojana region (Colombia) compared to Mexico City (Mexico), revealing that regions where seasonal changes are almost null and with low temperature gradients (less local variability) tend to display higher synchrony compared to regions where seasonal changes are very pronounced. The interdependence characterization between precipitation and temperature represents a robust option to characterize and analyze the collective dynamics of the system, applicable in climate change studies, as well as in changes not easily identifiable in future scenarios.


Introduction
Statistical interdependency can quantify interactions between the elements of a system when they evolve synchronously [1][2][3]. It focuses on quantifying the coupling responsible for collective behavior. One of the most fruitful approaches to understanding this phenomenon is Kuramoto's pioneering study in the 1970s on the phase synchronization of coupled oscillators [4]. A number of studies have applied these notions to the synchronization analysis of irregular signals, identifying different coupling levels in several fields, including physical and biological systems [5][6][7]. However, despite its usefulness in studying different systems [8], the complex nature of these systems gives rise to mathematical complications in this task [9].
On the other hand, climate is a complex system whose behavior requires an integrated approach to describe its dynamics [36,37], especially the characterization of coupling levels between representative variables. In past years, a great variety of coupling measures have been applied in climate studies. For instance, Duane [38] studied meteorological teleconnections using synchronized chaos and reported the tendency of two hemispheric subsystems to occupy the same regime simultaneously. Berg et al. [39] analyzed seasonal characteristics of the relationship between daily precipitation intensity and surface temperature in Europe, distinguishing separate precipitation types and establishing the dependence between temperature and precipitation. Donges et al. [40] compared measures to analyze climatic teleconnections using a complex network approach. Feliks et al. [41] studied the synchronization between the North Atlantic Oscillation and oscillatory climate modes in the Eastern Mediterranean, identifying a significant synchronization. Gennaretti et al. [42] used the correlation coefficient to evaluate the interdependence of average temperature and precipitation for Canadian Arctic coastal zones, highlighting the importance of including interdependence analysis in climate change scenarios. Jajcay et al. [43] analyzed the causality and synchronization of the El Niño Southern Oscillation (ENSO) and stated that understanding the discrepancies found may be the key to improving ENSO prediction.
Two of the most important and representative climatic variables are precipitation and temperature because they play a key role in the hydrological behavior of a territory, with an impact on events such as floods and droughts, among others [44][45][46]. These variables (as physical phenomena) exchange nontrivial information in their joint evolution and are indispensable in climate characterization. Quantifying the coupling level between climatic variables such as precipitation and temperature provides valuable information to robustly characterize their collective behavior, which is relevant in studies of climate change scenarios. However, as mentioned above, the description and characterization of climate variables have mainly focused on analyzing teleconnections and seasonal relationships. There remains a gap in the study of interdependence: quantifying the information shared between climate variables such as precipitation and daily extreme temperature using robust synchronization measures. This paper addresses that gap. In this work, the interdependence between precipitation and extreme daily temperature (maximum and minimum) is studied by measuring their synchronization level. We start with a statistical description of the time series by exploratory data analysis. Synchronization is first examined using the cross-correlation and coherence functions, whereas the deeper analysis is carried out using the mutual information, the global phase synchronization index, and the cross-sample entropy.
The remainder of this paper is organized as follows: Section 2 contains the materials and methods, describing the study area, the data, and the data treatment for applying the synchronization measures. Section 3 presents and discusses the results obtained from the applied techniques. Finally, Section 4 presents the conclusions.

Study Area and Data
We studied climatic data from two regions. The first is the metropolitan area of Mexico City (Mexico), one of the most populated cities in the world, where urban expansion has modified the atmospheric energy exchange [47]. Daily records of precipitation and of maximum and minimum temperature from 14 climatic stations between 1960 and 2020 were studied, i.e., 42 time series, each with about 20,000 records, obtained from the Servicio Meteorológico Nacional (SMN) of the Comisión Nacional del Agua (CONAGUA, https://smn.conagua.gob.mx/es/climatologia/informacion-climatologica/informacion-estadistica-climatologica, last accessed date: 8 May 2024). The second region is La Mojana (Colombia), which serves as a hydraulic damping system for the Cauca, San Jorge, and Brazo Loba (a bifurcation of the Magdalena River) rivers. This role makes it an area of great interest because of its natural diversity, hydrological and hydraulic functions, and agricultural importance, with particular climate characteristics and social dynamics [48,49]. For this region, daily precipitation and maximum temperature records from seven climatic stations between 1973 and 2020 were studied, i.e., 14 time series of about 13,500 records each, obtained upon request from the Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM, http://dhime.ideam.gov.co/atencionciudadano/, last accessed date: 8 May 2024). In total, 56 time series were studied. The record period for each region was chosen according to data availability. Because minimum temperature records were lacking, this variable could not be studied for La Mojana. General information on the climatic stations is shown in Table 1. The geographical location of the stations can be viewed online at https://colab.research.google.com/drive/1ZWVi9hpvi_Q3ZeR4BOhgTat3sR7l_kbN?usp=sharing (last accessed date: 8 May 2024).

Exploratory and Fractal Data Analysis
This aspect was addressed through descriptive statistics, involving measures of position, central tendency, and dispersion, among other statistical measures, following [50][51][52][53]. Missing data were imputed using reanalysis data obtained from the ERA5 database (https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form, last accessed date: 8 May 2024). We also identified outliers through visual inspection and, where they existed, compared them with nearby stations, searching for similar records on the date of occurrence. If the record corresponded to a genuine extreme event, we retained it; otherwise, we replaced it with the ERA5 value. In addition, to characterize the temporal organization of the individual (univariate) series, persistence and fractality were analyzed using rescaled range analysis [54] and Higuchi's fractal dimension [55]. Details of the procedures for calculating the Hurst exponent (H) and the Higuchi fractal dimension (D) can be found in [56] and [55,57], respectively. Values of 0.5 < H ≤ 1.0 indicate persistence (long-term memory), while 0.0 ≤ H < 0.5 indicates anti-persistence, and if H = 0.5, the fluctuations are neither persistent nor anti-persistent. Similarly, signals with D < 1.5 exhibit long-range correlations, while D > 1.5 indicates anti-correlations. For self-affine series, H and D are directly related by H = 2 − D, where 1 < D < 2 [58].
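The two univariate estimators named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure of [54][55][56][57]: the function names `hurst_rs` and `higuchi_fd`, the window sizes, and `k_max = 10` are all hypothetical choices, and the input is synthetic. The rescaled range estimate is applied to uncorrelated increments (expected H near 0.5) and the Higuchi estimate to their cumulative sum, a self-affine random walk (expected D near 1.5).

```python
import numpy as np


def hurst_rs(x, min_window=16, n_sizes=20):
    """Hurst exponent H from rescaled range (R/S) analysis (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Logarithmically spaced window sizes between min_window and n // 2.
    sizes = np.unique(
        np.logspace(np.log10(min_window), np.log10(n // 2), n_sizes).astype(int)
    )
    log_w, log_rs = [], []
    for w in sizes:
        n_seg = n // w
        segs = x[: n_seg * w].reshape(n_seg, w)
        # Cumulative deviations from the mean within each window.
        cum = np.cumsum(segs - segs.mean(axis=1, keepdims=True), axis=1)
        r = cum.max(axis=1) - cum.min(axis=1)  # range of each window
        s = segs.std(axis=1)                   # standard deviation of each window
        ok = s > 0
        if ok.any():
            log_w.append(np.log(w))
            log_rs.append(np.log(np.mean(r[ok] / s[ok])))
    # H is the slope of log(R/S) against log(window size).
    return float(np.polyfit(log_w, log_rs, 1)[0])


def higuchi_fd(x, k_max=10):
    """Higuchi fractal dimension D of a series (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, k_max + 1)
    mean_len = []
    for k in ks:
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)   # sub-series x[m], x[m + k], ...
            n_int = len(idx) - 1       # number of increments in the sub-series
            if n_int < 1:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            # Normalized curve length for this offset and scale.
            lengths.append(dist * (n - 1) / (n_int * k) / k)
        mean_len.append(np.mean(lengths))
    # D is the slope of log L(k) against log(1 / k).
    return float(np.polyfit(np.log(1.0 / ks), np.log(mean_len), 1)[0])


rng = np.random.default_rng(0)
noise = rng.normal(size=20_000)  # synthetic uncorrelated increments
walk = np.cumsum(noise)          # self-affine profile built from them
h_noise = hurst_rs(noise)        # no memory, so H should lie near 0.5
d_walk = higuchi_fd(walk)        # random walk, so D should lie near 1.5
```

Rescaled range estimates are known to be biased slightly upward for short windows, so values somewhat above 0.5 for uncorrelated data are not by themselves evidence of persistence.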

Synchronization Measures
Let P, T max, and T min be the daily precipitation, maximum temperature, and minimum temperature, respectively. We compute the following measures.

Cross-Correlation Function
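The cross-correlation function $c_{P,T}(\tau)$ between P and T (T max or T min as appropriate) gives a linear synchronization measure between P and T at a lag τ, expressed as [59,60]:

$$c_{P,T}(\tau) = \frac{1}{N-\tau}\sum_{i=1}^{N-\tau}\left(\frac{P_i-\bar{P}}{s_P}\right)\left(\frac{T_{i+\tau}-\bar{T}}{s_T}\right),$$

where N is the time series size, $\bar{P}$ and $\bar{T}$ represent the mean values, and $s_P$ and $s_T$ denote the standard deviations of P and T, respectively.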
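A lagged, normalized cross-correlation of this kind can be sketched as follows. The function name `cross_correlation`, the restriction to non-negative lags, and the convention of pairing precipitation on day i with temperature on day i + τ are assumptions made for illustration; the synthetic series below stand in for station records.

```python
import numpy as np


def cross_correlation(p, t, max_lag=10):
    """Normalized cross-correlation of p and t for lags 0..max_lag.

    Assumed convention: p[i] is paired with t[i + lag].
    """
    p = np.asarray(p, dtype=float)
    t = np.asarray(t, dtype=float)
    n = len(p)
    zp = (p - p.mean()) / p.std()  # standardized first series
    zt = (t - t.mean()) / t.std()  # standardized second series
    lags = np.arange(max_lag + 1)
    c = np.array([np.sum(zp[: n - k] * zt[k:]) / (n - k) for k in lags])
    return lags, c


rng = np.random.default_rng(3)
p = rng.normal(size=1000)  # synthetic stand-in for a daily series
t = np.roll(p, 1)          # second series repeats the first one day later
lags, c = cross_correlation(p, t)
best_lag = int(lags[np.argmax(np.abs(c))])  # lag with the strongest coupling
```

Taking the lag that maximizes |c| per station is one way to summarize the function in a single value; whether the sign of c should be kept depends on the question being asked.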

Coherence Function
The coherence function $\Gamma_{P,T}(f)$ gives a linear synchronization measure in the frequency domain. It involves the Fourier transform of the cross-correlation function of P and T, normalized by their auto-spectra [61,62], that is:

$$\Gamma_{P,T}(f) = \frac{\left|G_{P,T}(f)\right|^{2}}{G_{P,P}(f)\,G_{T,T}(f)},$$

where $G_{P,T}(f) = \int_{-\infty}^{\infty} c_{P,T}(\tau)\, e^{j2\pi f\tau}\, d\tau$ is the cross-spectrum of P and T, and $c_{P,T}(\tau)$ is the mathematical expectation of the cross-correlation function. $G_{P,P}(f)$ and $G_{T,T}(f)$ are the auto-spectra of P and T, obtained in the same way from the mathematical expectation of their autocorrelation functions.
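As a hedged illustration of a frequency-domain coupling estimate, the sketch below uses `scipy.signal.coherence`, which returns the magnitude-squared coherence estimated with Welch's method. The segment length `nperseg=1024`, the sampling rate of one sample per day, and the synthetic series sharing an annual cycle are assumptions for this example, not settings reported in the study.

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(1)
days = np.arange(365 * 20)  # twenty synthetic years of daily data
annual = 2 * np.pi * days / 365.0

# Two noisy series sharing an annual cycle with a phase offset.
p = np.sin(annual) + rng.normal(size=days.size)
t = np.sin(annual + 1.0) + rng.normal(size=days.size)

# fs = 1.0 means frequencies are expressed in cycles per day.
f, gamma = coherence(p, t, fs=1.0, nperseg=1024)

f_peak = f[np.argmax(gamma)]    # frequency of the strongest coupling
mean_coherence = gamma.mean()   # one possible global average over frequencies
```

The shared cycle produces a narrow band of high coherence near 1/365 cycles per day regardless of the phase offset, while the remaining frequencies stay low; averaging over all frequencies therefore yields a small global value even when one band is strongly coupled.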

Mutual Information
Mutual information MI(P, T) is an entropy-based measure that quantifies the amount of information shared between the random variables P and T, with marginal distributions p(P) and p(T) and joint distribution p(P, T), computed as [63][64][65]:

$$MI(P,T) = \sum_{P}\sum_{T} p(P,T)\,\log\frac{p(P,T)}{p(P)\,p(T)}.$$

MI(P, T) also gives a stable measure of the information flow between the variables in terms of their synchronization.
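One common way to estimate mutual information from data is a plug-in estimate over a two-dimensional histogram, sketched below. The function name `mutual_information`, the choice of 16 equal-width bins, and the base-2 logarithm (bits) are assumptions for illustration; the source does not state the estimator, the binning, or the units used.

```python
import numpy as np


def mutual_information(p, t, bins=16):
    """Plug-in mutual information (in bits) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(p, t, bins=bins)
    pxy = joint / joint.sum()            # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)  # marginal of the first variable
    py = pxy.sum(axis=0, keepdims=True)  # marginal of the second variable
    mask = pxy > 0                       # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))


rng = np.random.default_rng(2)
x = rng.normal(size=5000)
y_dep = x + 0.5 * rng.normal(size=5000)  # shares information with x
y_ind = rng.normal(size=5000)            # independent of x

mi_dep = mutual_information(x, y_dep)
mi_ind = mutual_information(x, y_ind)
```

Histogram estimates carry a small positive bias that grows with the number of bins and shrinks with the sample size, so values for independent series are close to, but not exactly, zero.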

Global-Phase Synchronization Index Using Hilbert Transform
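This measure is based on analyzing the instantaneous phase difference $\Delta\phi_{P,T}(t)$ of the signals P and T. Its notable characteristics are that it analyzes the signal phases irrespective of their frequency and that it is nonparametric. The index is defined as [5,15,24,62]:

$$\gamma_{P,T} = \left|\left\langle e^{\,j\Delta\phi_{P,T}(t)}\right\rangle_t\right|, \qquad \Delta\phi_{P,T}(t) = \phi_P(t) - \phi_T(t),$$

with the instantaneous phase of each signal S(t) given by

$$\phi_S(t) = \arctan\frac{\tilde{S}(t)}{S(t)}, \qquad \tilde{S}(t) = \frac{1}{\pi}\,\mathrm{PV}\!\int_{-\infty}^{\infty}\frac{S(t')}{t-t'}\,dt',$$

where $\tilde{S}(t)$ is the Hilbert transform of the signal (P(t) or T(t) as appropriate), $\langle\cdot\rangle_t$ denotes the time average, and PV is Cauchy's principal value.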
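A phase synchronization index based on the Hilbert transform can be sketched with `scipy.signal.hilbert`, which returns the analytic signal whose angle is the instantaneous phase. The sketch below computes the mean phase coherence, i.e., the modulus of the time-averaged unit phasor of the phase difference; that this is the exact index used in the study is an assumption, as are the function name `phase_sync_index` and the mean removal before the transform.

```python
import numpy as np
from scipy.signal import hilbert


def phase_sync_index(p, t):
    """Mean phase coherence of two signals, between 0 (none) and 1 (locked)."""
    p = np.asarray(p, dtype=float)
    t = np.asarray(t, dtype=float)
    # Remove the mean so the analytic signal rotates around the origin.
    phi_p = np.angle(hilbert(p - p.mean()))
    phi_t = np.angle(hilbert(t - t.mean()))
    dphi = phi_p - phi_t
    return float(np.abs(np.mean(np.exp(1j * dphi))))


rng = np.random.default_rng(4)
k = np.arange(2000)
s1 = np.sin(2 * np.pi * k / 50)        # 40 full periods
s2 = np.sin(2 * np.pi * k / 50 + 0.7)  # same rhythm, constant phase offset

gamma_locked = phase_sync_index(s1, s2)
gamma_noise = phase_sync_index(rng.normal(size=2000), rng.normal(size=2000))
```

A constant phase offset does not reduce the index, since only the stability of the phase difference matters, not its value.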

Cross-Sample Entropy
Cross-sample entropy, here denoted as CSE, is an entropy-based asynchrony measurement that compares the similarity between two time series. CSE depends on three parameters: m is the length of the template vectors, r is the distance tolerance, and N is the time series size. To compute CSE, we proceed as follows [13,14,17]: given the time series (signals) u(t) = P(1), P(2), ..., P(N) and v(t) = T(1), T(2), ..., T(N) (T max or T min as appropriate), we compute $B^{m}(r)$, the probability that template vectors of length m drawn from u and v match within the tolerance r, and $A^{m}(r)$, the corresponding probability for template vectors of length m + 1. Finally, the CSE is defined as:

$$CSE(m, r, N) = -\ln\frac{A^{m}(r)}{B^{m}(r)}.$$

CSE is zero when the time series are perfectly synchronized, whereas higher values of CSE indicate asynchrony.
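The template-matching idea behind cross-sample entropy can be sketched in plain NumPy as below. This is an illustrative implementation under stated assumptions, not the routine used in the study (which lists EntropyHub among its libraries): the function name `cross_sample_entropy`, the Chebyshev distance, the use of the same N − m template positions for both lengths, and the small `m = 2` in the demonstration are all choices made for this example. Comparing against a shuffled copy of one series shows how the measure separates structured coupling from chance matches.

```python
import numpy as np


def cross_sample_entropy(u, v, m=2, r=0.2):
    """Cross-sample entropy of two series after standardization (sketch)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = (u - u.mean()) / u.std()
    v = (v - v.mean()) / v.std()
    n = min(len(u), len(v))

    def count_matches(length):
        # Same n - m template positions for both lengths, so counts are comparable.
        xu = np.array([u[i:i + length] for i in range(n - m)])
        xv = np.array([v[j:j + length] for j in range(n - m)])
        total = 0
        for row in xu:
            # Chebyshev distance between one u-template and every v-template.
            dist = np.max(np.abs(xv - row), axis=1)
            total += int(np.sum(dist <= r))
        return total

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return float(-np.log(a / b))


rng = np.random.default_rng(5)
k = np.arange(400)
u = np.sin(2 * np.pi * k / 25) + 0.05 * rng.normal(size=k.size)
v = u + 0.05 * rng.normal(size=k.size)  # closely coupled to u
v_shuffled = rng.permutation(v)         # same values, temporal order destroyed

cse_sync = cross_sample_entropy(u, v)
cse_shuffled = cross_sample_entropy(u, v_shuffled)
```

The double loop over templates costs on the order of N² comparisons, so for series with tens of thousands of records an optimized library routine is preferable to this sketch.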

Statistical Significance Test for the Synchronization Metrics
To investigate differences in the metrics obtained between the two regions, we used the SciPy stats module (https://docs.scipy.org/doc/scipy/reference/stats.html, last accessed date: 8 May 2024) to compute two tests: Student's t-test, which, at a defined significance level (or its equivalent confidence level), tests whether two independent samples have the same mean [66], and the Mann-Whitney test [67], a nonparametric test of whether two independent samples come from the same distribution.
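The two tests can be run with `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`, as in the sketch below. The per-station values are made up for illustration and are not the study's data; the default equal-variance t-test and the two-sided alternative are likewise assumptions, since the source does not state which options were used.

```python
import numpy as np
from scipy import stats

# Made-up per-station metric values, for illustration only (not the paper's data).
region_a = np.array([1.00, 1.10, 1.20, 0.90, 1.30, 1.15, 1.05])
region_b = np.array([1.60, 1.70, 1.80, 1.50, 1.65, 1.75, 1.90])

alpha = 0.05  # significance level, equivalent to a 95% confidence level

# Student's t-test for equal means of two independent samples.
t_stat, p_t = stats.ttest_ind(region_a, region_b)

# Mann-Whitney U test, a rank-based nonparametric alternative.
u_stat, p_u = stats.mannwhitneyu(region_a, region_b, alternative="two-sided")

reject_t = p_t < alpha  # True: the means differ at the chosen level
reject_u = p_u < alpha  # True: the distributions differ at the chosen level
```

When the two groups have clearly different variances or sizes, passing `equal_var=False` to `ttest_ind` (Welch's test) is a common alternative.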
The set of coupling measures described above allows us to quantify the degree of synchronization between precipitation and temperature, covering both linear (cross-correlation and coherence functions) and nonlinear (mutual information, phase synchronization, and cross-sample entropy) aspects by studying them in the time (cross-correlation function and entropy-based measures), frequency (coherence function), and phase (phase synchronization index) domains. These metrics provide valuable information on the joint evolution of these climatic variables, allowing their relationship to be characterized and analyzed beyond conventional statistical analysis. All data processing and metric computations were carried out in Python (https://www.python.org/, last accessed date: 8 May 2024), using libraries such as NumPy (https://numpy.org/, last accessed date: 8 May 2024), SciPy (https://scipy.org/, last accessed date: 8 May 2024), EntropyHub (https://www.entropyhub.xyz/, last accessed date: 8 May 2024), and Matplotlib (https://matplotlib.org/, last accessed date: 8 May 2024) for graphical visualization. The results are described below.
The data structure was studied through its persistence and fractality. Persistence was analyzed using the Hurst exponent H obtained with the rescaled range method, which indicates the presence of long-term correlations among the records. For all variables under study and both regions, Mexico City and La Mojana, Hurst values fall within the interval 0.5 < H ≤ 1.0 (see Table 2). For precipitation, the magnitudes of rainfall events are therefore related over the long term, and the same applies to temperature. For Mexico City, it is observed that H T min > H T max > H P, and for La Mojana, H T max > H P. These results indicate that these climatic variables have different levels of long-term correlations, i.e., "process memory". Moreover, these results agree with the persistence values of scaling indexes reported in other climate analyses [56][68][69][70][71][72][73][74][75].

Exploratory Data Analysis
Table 3 shows the results of the Higuchi fractal dimension (D). We find that, in general, D P > D T max > D T min, preserving the same hierarchy in the irregularity of the structure of the variables from both regions. When the fractal dimension associated with precipitation is compared between regions, we observe that, in most cases, the one corresponding to Mexico City is larger than the one corresponding to La Mojana, confirming that there is a greater irregularity in the former. Thus, precipitation tends to be a very irregular phenomenon and therefore difficult to predict, while the relative regularity of temperature makes it somewhat more predictable. In addition, the results shown in Table 3 are consistent with the long-term self-correlations presented in Table 2 for the Hurst exponent, and the values satisfy the known H = 2 − D relationship.
As a first approximation to linear correspondence, the global Pearson correlation (which, roughly speaking, measures the linear relationship between two variables) between precipitation and temperature (T max and T min as appropriate) is computed for all the variables in each region (Mexico City and La Mojana), and the results are illustrated in Figure 3. The correlation matrix in Figure 3 shows a high global relationship between the variables within each climatic zone.
In general, according to Figure 3, La Mojana exhibits higher Pearson correlation values than Mexico City. This effect is possibly due to the higher relative stability of the climatic variables in La Mojana, whose values fall within a narrower range than those from Mexico City.

Synchronization Measures
To reduce the effects of spurious correlations that can affect the applied techniques and lead to misleading results, we normalized each time series before computing the synchronization measures by subtracting its mean and dividing by its standard deviation, so that the normalized series have zero mean and unit variance.
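The normalization step amounts to a z-score transform, sketched below. The function name `zscore` and the use of the population standard deviation (NumPy's default, `ddof=0`) are assumptions; the synthetic values only mimic the scale of a temperature record.

```python
import numpy as np


def zscore(x):
    """Return x rescaled to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # population standard deviation (ddof=0)


rng = np.random.default_rng(6)
raw = 23.3 + 4.0 * rng.normal(size=1000)  # synthetic series in arbitrary units
normalized = zscore(raw)
```

Because the transform is linear, it changes neither the ordering of the values nor the shape of the series; it only removes differences of scale and offset between variables measured in different units.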

Cross-Correlation Function
After exploring the Pearson correlation among all the variables within each region, we evaluated the linear synchronization as a function of time lag through the cross-correlation between the variables (precipitation and temperature) recorded at the same station. The results of the calculations are shown in Table 4. For Mexico City, the highest values of cross-correlation between P and T occur at lag τ = 0 (with global average c(P, T max) = 0.228 ± 0.064 between precipitation and maximum temperature, and c(P, T min) = 0.096 ± 0.031 between precipitation and minimum temperature), meaning that a rainfall event is most closely related to the temperature recorded on the same day. In contrast, for La Mojana, the highest values occur at lag τ = 1 (with global mean value of c(P, T max) = 0.256 ± 0.044 between precipitation and maximum temperature), i.e., a precipitation event is most closely related to the temperature recorded at a one-day lag. This result is reasonable when considering the variability of the magnitude of the events in the different regions, which is more stable in La Mojana.

Coherence Function
The coherence function shows several bands of high synchronization at different frequencies for Mexico City (see Figure 4a and Figure 4b, corresponding to P vs. T max and P vs. T min, respectively), while La Mojana (Figure 4c for P vs. T max) has only one frequency band where the synchronization is high. It is reasonable to attribute this behavior to the effect of seasonality in Mexico City, i.e., the coherence values are related to its seasonal condition, giving several bands of synchronization at different frequencies. In contrast, because of its lack of seasonality, La Mojana exhibits only one frequency band. This suggests that analyzing climate records within specific frequency bands will lead to a better characterization of them. In general, for Mexico City, the global average coherence between P and T max is 0.061 ± 0.018 (average ± standard deviation), while between P and T min, it is slightly higher, with a mean value of 0.064 ± 0.026. For La Mojana, the coherence between P and T max has a global mean value of 0.088 ± 0.017. Regardless of the seasonality effect, La Mojana exhibits a higher global average coherence than Mexico City, meaning that the analyzed variables are more synchronized in the former.

Mutual Information
As shown in Table 5, for Mexico City, MI has greater values for P and T max than for P and T min. The average values are MI(P, T max) = 1.14 ± 0.29 and MI(P, T min) = 1.08 ± 0.26 (i.e., MI(P, T max) > MI(P, T min) on average). These results indicate that precipitation shares more information with the maximum temperature than with the minimum one. For La Mojana, the mean value is MI(P, T max) = 1.67 ± 0.20, and, in general, MI exhibits higher values compared to those observed in Mexico City, confirming that both variables share more information in this region.

Global-Phase Synchronization Index Using Hilbert Transform
The calculations of the γ-index are shown in Table 6 for both regions. For Mexico City data, similar γ-values are observed for P and T max and for P and T min. We find that La Mojana yields higher values than Mexico City. In general, according to Table 6, values of γ P,T are above 0.72 for Mexico City, whereas for La Mojana, the values are above 0.92.

Cross-Sample Entropy
As an asynchrony measure, CSE values close to zero mean synchrony, while higher values mean asynchrony. Table 7 shows the results obtained for this measure. In addition, to ensure that the information obtained by this metric comes from the behavior of the time series and not from spurious correlations, we also calculate the CSE for the random (shuffled) version of the time series.
The average CSE E between precipitation and maximum temperature for Mexico City is 1.059 ± 0.276, whereas the average CSE R for the randomized time series is 3.491 ± 0.592. A similar pattern holds between precipitation and minimum temperature (for Mexico City), where the average values satisfy CSE R > CSE E. For La Mojana, the average CSE E between precipitation and maximum temperature is 0.960 ± 0.404, and CSE R has a value of 3.409 ± 0.327. In general, the average CSE E from Mexico City is about 10% greater than the value obtained for La Mojana, which agrees with the results obtained with all previously explored metrics, i.e., a higher synchronization is observed in the latter region.

Statistical Significance Test for the Synchronization Metrics
To determine whether the synchronization measures differ between the two studied regions, we test the statistical significance of our results for the coupling measures involving precipitation and maximum temperature using Student's t-test and the Mann-Whitney test, taking as the null hypothesis that the metrics are the same for both regions and using a 95% confidence level. The results are presented in Table 8. Note from Table 8 that, except for cross-correlation, the explored metrics are different between the regions (p-value << 0.05) for both the t-test and the Mann-Whitney test. Cross-correlation captures only linear relationships between two random variables; the fact that it does not differ significantly between the two regions is most likely due to the nonlinearity of the relationship between the variables studied, which can be considered a manifestation of the higher complexity that characterizes their joint evolution.

Discussion and Conclusions
We have presented a study, based on linear and nonlinear synchronization measures, to identify the level of coupling between daily precipitation and extreme daily temperature records of climate stations from two regions.We find that the degree of coupling is approximately similar for stations in the same region, while when comparing the two regions analyzed, which have dissimilar climatic characteristics, there is a significant difference (at a confidence level of 95%) in the degree of coupling.
The information presented is consistent with that reported in the literature regarding spatial behavior and complexity [38][39][40][41][42][76][77][78][79]. There is evidence that, at a given climatic station, precipitation and daily extreme temperatures share information on their dynamics. First, the results obtained confirm that precipitation data from the two regions exhibit persistent behavior and that temperature records display even more persistent features. When comparing the records from both regions, it is observed that the persistence is greater in the case of La Mojana, indicating that both precipitation and temperature from Mexico City display higher variability that resembles more erratic variations (less memory).
The global information shared between these variables is evidenced by metrics such as those used in this work. However, due to the nonlinear nature of these relationships, we found that linear metrics such as cross-correlation and coherence do not measure interdependence in a robust way, although they provide some characteristics that support analysis decisions, such as the selection of bands for the detailed study of the time series in the frequency domain, as evidenced by coherence. It was also corroborated that metrics such as mutual information quantify the flow of information between the variables studied, which is very significant in this case. Our results show different levels of interdependence between precipitation and temperature, demonstrating that the strength of the association between the variables depends strongly on the geographic region and on local effects that significantly impact the dynamics of these climatic variables. In particular, our results indicate that data from Mexico City exhibit a lower synchrony compared to data from La Mojana. This has been verified in the five metrics used to characterize the interdependence between the signals.
The differences in the level of coupling between the regions studied can be explained in the context of greater variability in the case of Mexico City, where seasonality is very important, while in La Mojana, this component is almost absent. Further studies that include a number of regions with diverse local conditions are needed to better characterize the zones by the levels of coupling achieved and to determine general patterns that will help to better understand these climatic variables.
Future directions for this type of study could include the identification of possible precursor patterns of extreme values in the variables, which could be linked to a greater coupling between the signals or to a lower synchrony. They could also include the study of causality and the use of topological data analysis strategies to study synchronization phenomena, as in [80,81]. In summary, the interdependence between temperature and precipitation is of vital importance for a better understanding of climate dynamics, with implications ranging from the environmental impacts already evidenced by climate change to economic and social consequences.
Data Availability Statement: Data used in this investigation were obtained from the Servicio Meteorológico Nacional of the Comisión Nacional del Agua (CONAGUA, https://smn.conagua.gob.mx/es/climatologia/informacion-climatologica/informacion-estadistica-climatologica, last accessed date: 8 May 2024) and the Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM, http://dhime.ideam.gov.co/atencionciudadano/, last accessed date: 8 May 2024), where they are publicly available and can be downloaded using the station code.

Figure 1
Figure 1 illustrates representative time series under analysis for both regions. For the analyzed period in Mexico City (1960–2020), the maximum temperature ranged between 3.5 °C and 38.5 °C, with a mean value of 23.3 °C; the minimum temperature ranged from −10.5 °C to 26.0 °C, with a mean value of 8.3 °C; and the precipitation reached a maximum value of 117 mm in 24 h. Figure 2 shows the boxplot of the analyzed climatic variables. The maximum temperature exhibits dispersion below the first quartile and above the third quartile.

Figure 1. Representative time series of (a) precipitation P and (b) maximum and minimum temperatures T (T max, red line; T min, blue line) for Mexico City (station MCS13). (c) Precipitation P and (d) maximum temperature T (T max) for La Mojana (station LMS4).

Figure 2. Boxplot of raw data. (a,b) correspond to the precipitation and maximum temperature, respectively, for La Mojana. (c-e) correspond to the minimum temperature, precipitation and maximum temperature for Mexico City, respectively. In general, the temperature exhibits more dispersion in Mexico City than in La Mojana, whereas for precipitation, it shows a similar behavior. In all variables, higher event values are observed for La Mojana, which shows important differences in the fluctuations between the two climatic regions.

Figure 3. Global Pearson correlation coefficient matrix between precipitation and maximum temperature of empirical data in (a) Mexico City and (b) La Mojana.

Figure 4. Coherence function heatmap of (a) P vs. T max, (b) P vs. T min for Mexico City and (c) P vs. T max for La Mojana.

Table 1. General information of climatic stations.

Table 2. Hurst exponent values H for P (H P), T max (H T max) and T min (H T min).

Table 3. Higuchi fractal dimension D values for P, T max and T min.

Table 4. Cross-correlation values c(P, T) between precipitation P and temperature T (T max or T min as appropriate) for Mexico City and La Mojana.

Table 5. Mutual information MI(P, T) for P and T (T max and T min) in Mexico City and La Mojana.

Table 6. Phase synchronization index γ(P, T) using Hilbert transform between precipitation P and temperature T (T max or T min) for Mexico City and La Mojana.

Table 7. Cross-sample entropy CSE (CSE E for experimental and CSE R for randomized time series) for P and T (T max and T min as appropriate) in Mexico City and La Mojana, setting m = 6 and r = 0.20.

Table 8. List of p-values to compare the statistical differences between the synchronization measures from the two regions (Mexico City and La Mojana), for precipitation and maximum temperature at a confidence level of 95%.