Auto-Machine-Learning Models for Standardized Precipitation Index Prediction in North–Central Mexico

Certain impacts of climate change could potentially be linked to alterations in rainfall patterns, including shifts in rainfall intensity or drought occurrences. Hence, predicting droughts can provide valuable assistance in mitigating the detrimental consequences associated with water scarcity, particularly in agricultural areas or densely populated urban regions. Employing predictive models to calculate drought indices can be a useful method for the effective characterization of drought conditions. This study applied an Auto-Machine-Learning approach to deploy Artificial Neural Network models, aiming to predict the Standardized Precipitation Index in four regions of Zacatecas, Mexico. Climatological time-series data spanning from 1979 to 2020 were utilized as predictive variables. The best models were found using performance metrics that yielded a Mean Squared Error, Mean Absolute Error, and Coefficient of Determination ranging from 0.0296 to 0.0388, 0.1214 to 0.1355, and 0.9342 to 0.9584, respectively, for the regions under study. As a result, the Auto-Machine-Learning approach successfully developed and tested Artificial Neural Network models that exhibited notable predictive capabilities when estimating the monthly Standardized Precipitation Index within the study region.


Introduction
In the context of climate change, examining alterations in rainfall patterns is a crucial area of research because human activities are highly susceptible to extreme weather events such as excessive or insufficient rainfall [1,2]. Meteorological drought occurs when the measured rainfall amount falls short of the long-term average [3]. Extended periods of drought can directly reduce freshwater flows, prompting adjustments in the management and planning of hydraulic resources, especially in areas vulnerable to water scarcity like agricultural farmlands or densely populated urban regions [4].
To assess droughts, various methods have been established, with drought indices being widely used. Among drought indices, the Standardized Precipitation Index (SPI) is used as a means of classifying measured precipitation relative to a probability distribution function for rainfall [5]. This index was developed to assess the deviation of observed precipitation from the expected distribution. It also allows climatic regions to be classified and is applied as a drought indicator, enabling comparisons to be made over different periods and locations [6]. The simplicity of calculating this drought index is among its advantages, as it relies solely on rainfall time-series data [7]. As an example of its applicability, the SPI was employed to establish consistent precipitation zones across Mexico [8].
The index can also be used to delineate smaller areas or zones: the SPI has been used to group monthly time-series data from Zacatecas state in Mexico into clusters (i.e., regions) with similar drought patterns [9]. The goal of that work was to calculate regional SPI values and to estimate SPI trends within those regions. Based on current knowledge of SPI trends, there has been less precipitation in Zacatecas state than the historical average [9]. However, forecasts of the SPI for the near future remain unavailable. Such forecasts are of great importance to inhabitants, as they enable them to adjust their actions in accordance with planned adaptation strategies, specifically in relation to water scarcity.
Along with assessing droughts, artificial intelligence (AI) has been utilized to develop models for predicting them, demonstrating effectiveness and accuracy in this area. Recently, machine learning methods (which are a type of AI) have become more proficient, precise, and user-friendly, making them particularly useful for analyzing hydrological data [10,11]. Neural networks, which are algorithms that learn from data, have been successfully used to model and predict nonlinear time series in various fields, including water resources and hydrology [4,11]. Consequently, Artificial Neural Network (ANN) models have been employed as a valuable data-driven tool for forecasting the monthly SPI [3,4,10,12–17].
In summary, the SPI has been used in several regions worldwide for assessing and forecasting droughts. However, its use in Mexico for forecasting, specifically with neural networks, remains to be explored. Moreover, one of the main problems in the use of artificial neural networks is that the selection of features and models, as well as the tuning of their hyperparameters, is a complex and time-consuming task. To address these issues, the Auto-Machine-Learning (AutoML) approach emerges as a viable alternative, as it enables the construction and validation of machine learning pipelines with minimal user intervention [18].
In this research, an AutoML approach was applied to develop and deploy artificial neural network models with the aim of predicting the regional Standardized Precipitation Index. The models utilized meteorological datasets as predictive factors spanning from 1979 to 2020, alongside a climate index. The objectives of the research were as follows: (a) to employ an AutoML approach for implementing artificial neural network models, (b) to apply the implemented models for predicting the regional Standardized Precipitation Index, (c) to assess the performance of the models by employing performance metrics, and (d) to analyze the prediction errors of the models during the validation period.

Data
In order to train the ANN models, a set of 6 input or predictor variables accompanied by a covariate was employed. These variables were used to depict the climatic and geographic attributes of weather stations established within the region of Zacatecas state in Mexico (Figure 1). The input or predictor variables were specific to each site and included the station ID, date, rainfall (PP), evaporation (EVP), maximum temperature (TMAX), minimum temperature (TMIN), and mean temperature (TMED). The evapotranspiration predictor (PET) was estimated using the Thornthwaite method [19]. The Multivariate El Niño Southern Oscillation Index v.2 (MEI) was later incorporated as a regression covariate during the training process of the ANN models.
Because the MEI database only has records dating back to 1979, this study exclusively considered weather stations with complete records from that year onwards as predictors or variables of interest. Therefore, a total of 24 weather stations were chosen for this study, with records spanning from 1979 to 2020. The input variables were acquired from a long-term meteorological dataset provided by the Mexican 'Comisión Nacional del Agua'. Prior to any processing, the database underwent scrutiny to ensure the absence of any abnormal or missing data.

Standardized Precipitation Index
The Standardized Precipitation Index (SPI) [5] is a well-established tool used to measure the severity of precipitation anomalies over different time scales. To monitor and evaluate the prevailing drought conditions, the SPI is extensively employed. The SPI uses only precipitation data to calculate a standardized value that represents the deviation of the current precipitation from the long-term average for a given location and time period. The computation of the standardized value involves dividing the deviation between the current precipitation and the long-term average by the standard deviation of the long-term precipitation. The final SPI result is a value that is expressed in units of standard deviations from the long-term mean.
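The standardization described above can be sketched in a few lines. This is only an illustration of the simplified description in the text (deviation from the long-term mean divided by the standard deviation); operational SPI software such as the SPEI package typically fits a probability distribution to the accumulated precipitation before transforming it to a standard normal variable. The function name and the sample totals are hypothetical.

```python
from statistics import mean, pstdev


def standardized_anomaly(precip):
    """Return (x - mean) / std for every value of a precipitation series."""
    mu = mean(precip)
    sigma = pstdev(precip)  # standard deviation of the whole record
    if sigma == 0:
        raise ValueError("constant series: standard deviation is zero")
    return [(x - mu) / sigma for x in precip]


# Hypothetical 12-month precipitation totals (mm), not data from the study.
totals = [400.0, 450.0, 500.0, 550.0, 600.0, 500.0]
spi_like = standardized_anomaly(totals)
```

Negative values of `spi_like` mark periods drier than the record average and positive values mark wetter periods, which matches the sign convention used for the SPI.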
The computed SPI values could be classified into categories based on their magnitude, with negative values indicating drier than average conditions and positive values indicating wetter than average conditions [5]. Table 1 displays the categorization of SPI values, ranging from "extremely drought" to "extremely wet", as well as intermediate categories indicating moderate to severe drought or wet conditions. It is worth mentioning that the SPI can be calculated using different time scales, ranging from a few months to several years, depending on the needs of the user or application. Smaller time scales prove to be beneficial in monitoring drought conditions of shorter duration, whereas larger time scales can capture long-term alterations in rainfall patterns. As highlighted by [20], the 3-month SPI value characterizes moisture conditions over short to medium terms, the 6-month SPI value indicates agricultural droughts, and the 12-month SPI value corresponds to droughts impacting water supply reservoir levels. In this study, we specifically calculated the SPI values using a 12-month timeframe.
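As a rough illustration of the 12-month time scale and of the categorization step, the sketch below accumulates monthly rainfall over a trailing window and maps an SPI value to a label. The thresholds and labels follow the commonly cited SPI classification and are an assumption here; the categories in Table 1 may be worded or bounded differently. Both function names are hypothetical.

```python
def rolling_sum(monthly, window=12):
    """Accumulate precipitation over a trailing window of months."""
    return [
        sum(monthly[i - window + 1 : i + 1])
        for i in range(window - 1, len(monthly))
    ]


def classify_spi(spi):
    """Map an SPI value to a category (assumed thresholds, see lead-in)."""
    if spi >= 2.0:
        return "extremely wet"
    if spi >= 1.5:
        return "very wet"
    if spi >= 1.0:
        return "moderately wet"
    if spi > -1.0:
        return "near normal"
    if spi > -1.5:
        return "moderately dry"
    if spi > -2.0:
        return "severely dry"
    return "extremely dry"


# Fourteen hypothetical months of 1 mm each give three 12-month totals.
accumulated = rolling_sum([1.0] * 14, window=12)
```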
Due to the labor-intensive nature of manually calculating SPI values, several computer programs have been developed to streamline the process and increase accessibility. In our research, we used the SPEI package 1.8.1 [21] within the R system version 4.3.1 [22] for SPI computation.

Cluster Analysis
Cluster analysis is a statistical technique used to categorize elements or variables by grouping them together based on their similarities. The primary objective is to maximize the similarity within each group, ensuring homogeneity, while simultaneously maximizing the dissimilarities between groups [23]. This technique has been used extensively to delineate homogeneous climatic regions from observed values of meteorological variables, as demonstrated in previous studies [23,24].
In this study, a tree clustering algorithm was applied to the entire set of 24 monthly SPI time series, which corresponded to 24 weather stations. The purpose was to group these stations into regions that exhibited similar SPI values (i.e., a similar PP regime). Based on the observed similarity in SPI values, the analysis identified four distinct regions: the Semi-desert region (Pinos and Villa García), the Highlands region (Calera, Cuahutemoc, El Cazadero, Fresnillo, Jerez, Jiménez, Loreto, Ojocaliente, Santa Rosa, Villa de Cos, and Zacatecas), the Mountains region (El Chique, El platanito, El Sáuz, La Florida, Monte Escobedo, and Villanueva), and the Canyons region (Excamé, Gruñidora, Juchipila, Téul, and Tlaltenango). The cluster computation was carried out with the hclust function [22] and the ape package [25] under the R system 4.3.1 [22].
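The grouping step can be illustrated with a small agglomerative (tree) clustering sketch. It is not the procedure used in the study: the distance measure (Euclidean) and the linkage rule (average linkage) are assumptions, since the text does not state them, and the four short series are made up for the example.

```python
from math import dist


def agglomerate(series, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(series))]

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of series.
        total = sum(dist(series[i], series[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        p, q = min(
            ((p, q) for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Four hypothetical stations, each with a three-month SPI series.
stations = [
    [-1.0, -0.5, 0.2],
    [-0.9, -0.4, 0.3],
    [1.1, 0.8, -0.2],
    [1.0, 0.9, -0.1],
]
groups = agglomerate(stations, n_clusters=2)
```

Stations 0 and 1 end up together, as do stations 2 and 3, because their series move in the same direction; with real data the number of clusters would be read off the dendrogram rather than fixed in advance.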

Potential Evapotranspiration Index
The PET represents the maximum amount of water that could evaporate from a vegetation-covered surface if unlimited water were available. This includes the combined water loss from both evaporation and transpiration within a specific crop or ecosystem [26].
In this study, because only monthly rainfall and temperature data were available, the widely adopted Thornthwaite PET method [19] was used following the guidelines established in [27]. The SPEI package [21] was utilized to compute the PET index using the R system version 4.3.1 [22]. The packages mentioned in this research and the R system can both be accessed through the Comprehensive R Archive Network https://www.cran.r-project.org/ (accessed on 15 May 2024).
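For readers unfamiliar with the Thornthwaite method, the following sketch computes monthly PET from twelve mean monthly temperatures using the standard Thornthwaite equations. It is a simplified illustration, not the SPEI package routine: day length and month length are held at 12 h and 30 days (an assumption; in practice they depend on latitude and calendar month), and the temperature values are hypothetical.

```python
def thornthwaite_pet(monthly_temp_c, day_length_h=12.0, days_in_month=30):
    """Monthly PET (mm) from twelve mean monthly temperatures (deg C)."""
    # Annual heat index; months at or below 0 deg C contribute nothing.
    heat_index = sum((max(t, 0.0) / 5.0) ** 1.514 for t in monthly_temp_c)
    if heat_index == 0:
        return [0.0 for _ in monthly_temp_c]
    # Empirical exponent, a cubic polynomial of the heat index.
    a = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
         + 1.792e-2 * heat_index + 0.49239)
    # Correction for actual day length and number of days in the month.
    correction = (day_length_h / 12.0) * (days_in_month / 30.0)
    return [
        16.0 * correction * (10.0 * max(t, 0.0) / heat_index) ** a
        for t in monthly_temp_c
    ]


# Hypothetical mean monthly temperatures (deg C), January to December.
temps = [12, 14, 17, 20, 22, 23, 22, 21, 20, 18, 15, 13]
pet = thornthwaite_pet(temps)
```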

Multivariate ENSO Index Data
El Niño Southern Oscillation (ENSO) is a natural large-scale climatic phenomenon that affects weather worldwide, particularly rainfall patterns. It is characterized by fluctuating ocean temperatures in the central and eastern equatorial Pacific, accompanied by atmospheric changes above. The Multivariate ENSO Index (MEI) is the result of a process of standardizing six atmospheric and oceanic variables associated with ENSO and employing Principal Component Analysis to identify prevailing patterns of variability and decrease data dimensionality. The resulting Principal Components are weighted and combined to create a single index that represents the overall strength of ENSO [28,29].
The Multivariate ENSO Index Version 2 is computed by the National Oceanic and Atmospheric Administration. In this research, the MEI.v2 database spanning from 1979 to 2020 was used as a regression covariate for training the ANN models. The MEI.v2 database is available at http://www.esrl.noaa.gov (accessed on 15 May 2024).

Linear Models for Time-Series Forecasting
Linear models have been used as the standard approach for time-series forecasting. Despite the availability of newer methods, many researchers continue to rely on these models due to their simplicity in implementation and ability to produce accurate predictions.
The most-used linear models include, but are not limited to, the Linear Regression model, the Auto-Regressive model, the Moving-Average model, the Auto-Regressive Moving-Average model, and the Auto-Regressive Integrated Moving-Average model [30].
It is worth mentioning that these linear models are commonly applied in modeling linear stochastic systems and are suitable for analyzing time-series data that exhibit stationarity. Nevertheless, this reliance on linearity and stationarity can also be seen as one of their main limitations.

Machine Learning for Time-Series Forecasting
Deep learning is a subfield of machine learning that uses neural networks having multiple hidden layers to identify and extract relevant features from data. Deep learning has become increasingly popular for time-series forecasting because it can learn features and patterns in the data that may be difficult for traditional statistical models to detect.
Time-series forecasting using deep learning models often incorporates Recurrent Neural Networks (RNNs) or their variants like the Gated Recurrent Unit (GRU) or Long Short-Term Memory (LSTM) models. These models can address inherent challenges associated with time-series data, such as temporal dependencies, and can learn long-term patterns and trends.

Recurrent Neural Network
Recurrent Neural Networks (RNNs) are artificial neural networks known for their effectiveness in handling sequential data, including time-series data [31]. RNNs can remember previous inputs and use them to inform their predictions for future outputs. In brief, RNNs are well suited to capturing the complex temporal dynamics of a time series.
In a basic RNN architecture, each time step in the time series corresponds to an input to the network. The RNN processes the input at each time step along with its internal state, producing an output and updating its state. Subsequently, the output obtained can be utilized to predict the next time step, and this iterative process continues.
One issue with basic RNNs is that they can suffer from vanishing gradients, which makes it difficult for the network to learn long-term dependencies. To overcome this obstacle, more sophisticated RNN architectures have been devised, including the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM).
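The recurrence of a basic RNN can be sketched with a single scalar unit: at every time step the new state is a squashed combination of the current input and the previous state. The weights below are arbitrary illustrative values, not trained parameters, and the function name is hypothetical.

```python
from math import tanh


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # New state mixes the current input with the previous state.
        h = tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Two time steps with arbitrary weights; the second state depends on the first.
states = rnn_forward([1.0, 0.0], w_x=0.5, w_h=0.5, b=0.0)
```

Because the state is multiplied by `w_h` and squashed at every step, the influence of early inputs shrinks quickly over long sequences, which is the intuition behind the vanishing-gradient issue mentioned above.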

Long Short-Term Memory
Long Short-Term Memory (LSTM) was introduced to address the problem of vanishing gradients that can occur in traditional RNNs. LSTM is particularly well-suited for time-series forecasting tasks [32].
Like other RNNs, LSTM was conceived to process sequential data, such as time-series data, by maintaining an internal state that is updated with each new input. Nevertheless, LSTM employs a more intricate internal structure compared to conventional RNNs, incorporating three gating mechanisms: an input gate, a forget gate, and an output gate.
The input gate regulates the extent to which the new input is integrated into the present state, while the forget gate manages the degree to which the prior state is disregarded. Lastly, the output gate governs the proportion of the current state that should be emitted as the output.
Each gate is controlled by a sigmoid activation function that produces values between 0 and 1, allowing the network to selectively adjust the amount of information that is remembered or forgotten.
Alongside the gating mechanisms, LSTM incorporates a memory cell, enabling the network to retain information over long periods of time. The memory cell is updated using the forget gate, the input gate, and a candidate activation function: the forget gate scales the previous memory cell value, and the input gate scales the candidate value that is added to it. The candidate activation function calculates this proposed new value from the previous hidden state and the current input.
The output of the LSTM at each time step is its hidden state, which is obtained by applying the output gate to the hyperbolic tangent of the updated memory cell.
In time-series forecasting tasks, the LSTM can be trained to predict future values of a time series based solely on past observations. The network takes in a sequence of past observations and uses them to update its internal state. Subsequently, the final hidden state and memory cell of the network are employed to forecast the subsequent value in the time series.
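The gating described in this section can be made concrete with a single scalar LSTM step. This is a sketch of the standard LSTM equations with one unit and one input; the parameter layout (a dictionary of weight triples) is a hypothetical convenience, and the values are not trained.

```python
from math import exp, tanh


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; params maps a name to (w_x, w_h, bias)."""

    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("input"))     # how much new content to write
    f = sigmoid(pre_activation("forget"))    # how much old memory to keep
    o = sigmoid(pre_activation("output"))    # how much memory to expose
    g = tanh(pre_activation("candidate"))    # proposed new content
    c = f * c_prev + i * g                   # updated memory cell
    h = o * tanh(c)                          # new hidden state (the output)
    return h, c


# With all parameters at zero every gate equals 0.5 and the candidate is 0.
zero = {name: (0.0, 0.0, 0.0)
        for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero)
```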

Gated Recurrent Unit
The Gated Recurrent Unit (GRU) is an RNN architecture that is well suited to time-series forecasting tasks. The GRU was proposed as a simpler and more efficient alternative to the LSTM architecture, which may be more computationally expensive [33].
Similar to other RNNs, GRU is designed to process sequential data, such as time-series data, by maintaining an internal state that is updated with each new input. However, unlike traditional RNNs, GRU uses gating mechanisms to selectively remember or forget information from previous time steps.
The basic GRU architecture includes the following two gates: the reset gate and the update gate. The reset gate controls how much of the previous state to forget, while the update gate controls how much of the new input to incorporate into the current state. The reset and update gates are controlled by sigmoid activation functions that produce values between 0 and 1, allowing the network to selectively adjust the amount of information that is remembered or forgotten.
Alongside the reset and update gates, the GRU also has a candidate activation function that calculates a new hidden state considering the previous state and the current input. The candidate activation function is a hyperbolic tangent, generating values ranging from −1 to 1.
The output of the GRU at each time step is its updated hidden state, which blends the previous hidden state with the candidate state in proportions set by the update gate; the reset gate acts within the candidate computation by scaling the contribution of the previous state.
In time-series forecasting, GRUs are capable of learning long-term dependencies between past observations and future values. The network takes in a sequence of past observations and uses them to update its internal state. Afterward, the final hidden state of the network is used to predict the next value in the time series.
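A single scalar GRU step, sketched below, shows how the two gates and the candidate state interact. The sign convention for the update gate varies between references; here a larger update gate means more of the candidate state is taken, matching the description above. The parameter layout and values are hypothetical.

```python
from math import exp, tanh


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def gru_step(x, h_prev, params):
    """One scalar GRU step; params maps a name to (w_x, w_h, bias)."""
    wz_x, wz_h, bz = params["update"]
    wr_x, wr_h, br = params["reset"]
    wc_x, wc_h, bc = params["candidate"]

    z = sigmoid(wz_x * x + wz_h * h_prev + bz)   # update gate
    r = sigmoid(wr_x * x + wr_h * h_prev + br)   # reset gate
    # The reset gate scales the previous state inside the candidate.
    h_tilde = tanh(wc_x * x + wc_h * (r * h_prev) + bc)
    # Blend the previous state and the candidate using the update gate.
    return (1.0 - z) * h_prev + z * h_tilde


# With all parameters at zero, z = 0.5 and the candidate is 0.
zero = {name: (0.0, 0.0, 0.0) for name in ("update", "reset", "candidate")}
h_new = gru_step(x=1.0, h_prev=0.8, params=zero)
```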

Automated Machine Learning
Automated Machine Learning is a process of automating the selection of the best models and their hyperparameters for a given task. It is a time-saving and cost-effective method for developing high-performance machine learning models with less human intervention [34]. AutoML for time-series prediction refers to the automated process of choosing the best models and hyperparameters for predicting future values of a time-series dataset.
Generally, the AutoML process for time-series prediction includes the following well-known steps:
1. Data preprocessing: This involves cleaning and preparing the time-series data for analysis, such as handling missing values, outliers, and converting the data into a suitable format for modeling.
2. Feature engineering: This step involves extracting relevant features from the time-series data to be used as input in the machine learning models.
3. Model selection: In this step, different machine learning models are evaluated and compared on how well they predict the forthcoming values of the time series.
4. Hyperparameter tuning: This involves selecting the optimal values of hyperparameters for each machine learning model, which can significantly improve the model's performance.
5. Ensemble learning: This step involves combining multiple machine learning models to improve the prediction accuracy of the time-series data.

AutoML Frameworks
AutoML for time-series prediction can be achieved using various platforms such as AutoGluon, AutoKeras, Auto-Pytorch, Auto-Sklearn, Auto-Weka, EvalML, H2O, TPOT, TransmogrifAI, TSPO, and many others [35]. These platforms automate the entire machine learning pipeline, from data preprocessing to model selection and deployment, making it easier for non-experts to develop accurate time-series prediction models.
H2O AutoML is a machine learning platform and AutoML module that encompasses various algorithms, including Random Forests, Extremely Randomized Trees, Generalized Linear Models (GLM), XGBoost, Gradient Boosting Machines (GBM), and Deep Neural Networks. Furthermore, it uses automated target encoding for high-dimensional categorical variables as a preprocessing technique [18].
H2O trains a randomized grid of algorithms by exploring a hyperparameter space. The individual models undergo tuning through cross-validation. Subsequently, the following two stacked ensembles are trained: one consisting of all models optimized for superior performance, and the other comprising only the best-performing model from each algorithm. The outcome is a sorted leaderboard showcasing all of the models [18].
In this analysis, H2O AutoML was selected to deploy an individual neural network model for each of the four regional time-series datasets produced by the cluster analysis: Semi-desert, Highlands, Mountains, and Canyons. The models were constructed using predictors such as the rainfall (PP), evaporation (EVP), maximum temperature (TMAX), minimum temperature (TMIN), mean temperature (TMED), evapotranspiration (PET), and MEI of the respective datasets from each region, with the aim of forecasting the SPI values (i.e., the target variable) specific to each region. Since the models were designed to forecast SPI values that had already been normalized, there was no need to normalize or standardize the data again.
The consolidated dataset used for training each regional neural network consisted of a matrix comprising 504 timesteps (i.e., months) and 7 predictors, along with a vector representing the response variable over the same 504 timesteps (i.e., months). Table 2 displays a summary of descriptive statistics for the input predictors used to train the models. When training multilayer networks, a common approach involves initially splitting the data into two distinct subsets. The initial subset is referred to as the training set and is utilized for computing the gradient as well as adjusting the weights and biases of the network. The second subset, known as the test set, is used to monitor the error throughout the training process. In the early stages of training, the test error usually decreases, mirroring the decline observed in the training set error.
The model architecture was constructed using the H2O AutoML platform, specifically version 3.40.0.2 [18], implemented with Python version 3.9.16 [39]. The training dataset for the model contained 80% of the available data in chronological order for each regional SPI time series (403 months spanning 1979 to 2007), while the remaining 20% was used for testing purposes (101 months spanning from 2007 to 2020).
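A chronological 80/20 split of this kind can be sketched as follows. The key point is that the rows are not shuffled, so the test set always lies after the training set in time. The function name is hypothetical, and the month index below is a stand-in for the real records.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into a leading train part and a trailing test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# 504 monthly records, represented here only by their position in time.
months = list(range(504))
train, test = chronological_split(months)
```

With 504 months this yields 403 training rows and 101 test rows, the same sizes reported in the text.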
A primary reason for using AutoML is its capability to automate the machine learning workflow. This includes automatically training and tuning the hyperparameters of different models, identifying a suitable model, and optimizing it [40]. When using H2O AutoML, besides the train and test databases, the only required parameters to run it were the name or index of the response variable (SPI) and the training frame or predictor variables (PP, EVP, TMAX, TMIN, TMED, PET, and MEI). Additionally required stopping parameters were provided separately; in this case, the maximum runtime of the AutoML process. No additional hyperparameters were required to run the AutoML. Optionally, it is possible to fine-tune several miscellaneous parameters [40]. The data flow processing is shown in Figure 2.
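A minimal sketch of such a run with the H2O Python API might look like the following. It is not the authors' script: the CSV file names, the runtime limit, and the seed are hypothetical, and using the test frame as the leaderboard frame is an assumption. The `h2o` import is placed inside the function because it requires the `h2o` package and a Java runtime.

```python
PREDICTORS = ["PP", "EVP", "TMAX", "TMIN", "TMED", "PET", "MEI"]
RESPONSE = "SPI"


def run_automl(train_csv, test_csv, max_runtime_secs=600, seed=1):
    """Train H2O AutoML for one region; return (leaderboard, predictions)."""
    import h2o  # lazy import: needs the h2o package and Java installed
    from h2o.automl import H2OAutoML

    h2o.init()
    train = h2o.import_file(train_csv)
    test = h2o.import_file(test_csv)

    # Only the stopping criterion is set; everything else is left at defaults.
    aml = H2OAutoML(max_runtime_secs=max_runtime_secs, seed=seed)
    aml.train(x=PREDICTORS, y=RESPONSE,
              training_frame=train, leaderboard_frame=test)

    # The leader is the top-ranked model on the leaderboard.
    predictions = aml.leader.predict(test)
    return aml.leaderboard, predictions
```

A call such as `run_automl("highlands_train.csv", "highlands_test.csv")` (hypothetical file names) would be repeated once per region.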

Performance Metrics
The assessment of the model's performance relied on the use of widely recognized metrics and loss functions, including the Mean Squared Error (MSE) and Mean Absolute Error (MAE).
The Mean Squared Error (MSE) quantifies the average squared difference between the actual and predicted values. A low MSE value signifies greater accuracy in the predictions.
The Mean Absolute Error (MAE) measures the average absolute difference between the predicted and actual values of a variable.
Alongside the MSE, the model's goodness-of-fit was evaluated using the well-known R² metric, which measures the extent to which the model fits the data. Ideally, a perfect model (although improbable) would exhibit a low MSE, indicating minimal error accumulation, and high R² values.
Lastly, the simple difference between the observed and predicted SPI values was employed to estimate the prediction error (PE) of the models.

$$\mathrm{PE} = \mathrm{SPI}_{o,i} - \mathrm{SPI}_{p,i} \qquad (3)$$
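The metrics in this section can be computed with a few lines of code. The sketch below uses the standard definitions of the MSE and MAE; computing R² as one minus the ratio of residual to total sum of squares is an assumption, since the formula is not shown here. The function names and the sample values are illustrative.

```python
def mse(obs, pred):
    """Mean of the squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def mae(obs, pred):
    """Mean of the absolute differences between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def prediction_error(obs, pred):
    """Per-month prediction error: observed minus predicted."""
    return [o - p for o, p in zip(obs, pred)]


# Small made-up example, not values from the study.
observed = [0.0, 1.0, 2.0]
predicted = [0.0, 1.0, 1.0]
pe = prediction_error(observed, predicted)
```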

Results and Discussion
Figure 3 depicts the observed and predicted SPI time-series values alongside the prediction error for each region over the whole database (train and test data). In general, the models fitted by AutoML showed low MSE and MAE values and high R² values on both the training and cross-validation data (Table 3). The findings indicate that the performance of the SPI AutoML models across the four regions under study was considered satisfactory. H2O AutoML reported that, in the four analyzed regions, the models were obtained by means of stacked ensemble estimators with a cross-validation strategy and a GLM metalearner algorithm.
Overall, the comparison between the predicted and observed SPI values over the 100-month test period demonstrated a significant level of agreement, as shown in Figure 4. The statistical summary of the scatter plot, derived from linear regression analysis, illustrates the relationship between the predicted and observed SPI values for the test datasets. This summary is provided in Table 4 and further supports the findings. The evaluation of the AutoML models' performance involved assessing their ability to predict SPI values in the test datasets across all months. Performance metrics such as R² and R values were used for comparison, as presented in Table 4.
Among the regions under consideration, the AutoML models demonstrated the highest accuracy in their predictions for the Highlands region, as indicated by the highest R value (0.964). The Mountains and Semi-desert regions showed the next best predictive performance, followed by the Canyons region with the lowest R value (0.933). However, overall, the AutoML models demonstrated a satisfactory prediction skill for all regions considered in the study.
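The regression summary behind this comparison (SPIp = β0 + β1 SPIo, together with the correlation coefficient R) can be reproduced with an ordinary least-squares fit. The sketch below is a plain implementation of that fit; the function name and the sample values are illustrative and are not taken from Table 4.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit y = b0 + b1 * x; returns (b0, b1, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    r = sxy / sqrt(sxx * syy)     # Pearson correlation coefficient
    return b0, b1, r


# Made-up observed and predicted SPI values with a constant offset of 0.1.
spi_obs = [0.0, 1.0, 2.0, 3.0]
spi_pred = [0.1, 1.1, 2.1, 3.1]
b0, b1, r = fit_line(spi_obs, spi_pred)
```

A slope close to 1 and an intercept close to 0 indicate that predictions track observations without systematic bias, while R summarizes the strength of the linear relationship.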
Table 5 provides a summary of the probability of prediction errors (PEs) under a normal distribution, indicating under-predictions (PE < 0) and over-predictions (PE > 0) made by AutoML models. A PE value of zero would signify a perfect alignment between the predicted and observed SPI values, indicating an ideal scenario [41]. Based on the findings, it is evident that AutoML models exhibited both under-prediction and over-prediction errors. Among the regions, the Canyons region displayed the most significant disparity, with a 62% likelihood of under-prediction. Likewise, over-prediction was noticed in the Semi-desert and Mountains regions, with the highest likelihood of over-prediction observed in the Semi-desert region (59.32%). These outcomes align with the summarized statistics of the linear models correlating the predicted and observed SPI values, as documented in Table 4.
Previous research has demonstrated the remarkable efficacy of neural networks in the empirical forecasting of hydrological variables [42–45]. Our findings are in line with the successful implementation of artificial neural network models in predicting the monthly standardized precipitation index, as evidenced by studies conducted by [4,15,16,46]. Moreover, our investigation aligns with the findings of [10], emphasizing the efficacy of the ANN modeling technique in capturing the intricate nonlinear dynamics of complex systems, specifically in the domain of SPI time-series forecasting. Our results extend the findings of [8,9,17] by allowing the derivation of smaller and more detailed regional climatic zones in Mexico using the SPI. Furthermore, this study confirms that AutoML techniques can independently identify suitable machine learning models and fine-tune them, facilitating effective optimization for time-series data forecasting [35]. In summary, our research findings demonstrated that the AutoML technique can be successfully used as a beneficial tool in the prediction of the SPI time series.
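One way to obtain likelihoods of negative and positive prediction errors under a normal distribution is sketched below. How the values in Table 5 were actually computed is not described here, so this is an assumption: a normal distribution is fitted to the PE sample by its mean and sample standard deviation, and the probability mass on each side of zero is read from its cumulative distribution function. The function names and sample values are hypothetical.

```python
from math import erf, sqrt
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def error_sign_probabilities(pe):
    """Return (P(PE < 0), P(PE > 0)) for a normal fitted to the PE sample."""
    mu = mean(pe)
    sigma = stdev(pe)
    p_negative = normal_cdf(0.0, mu, sigma)
    return p_negative, 1.0 - p_negative


# A made-up symmetric sample gives equal probabilities on both sides of zero.
p_neg, p_pos = error_sign_probabilities([-1.0, 0.0, 1.0])
```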

Figure 1. Study region of Zacatecas state within the Mexican territory.

Figure 3. Observed and predicted SPI values and prediction error over the whole database (train and test data) for the regional SPI time series in the territory of Zacatecas state, Mexico.

Figure 4. Scatter plot and trend lines between observed and predicted SPI values using test data for regional SPI time series in Zacatecas state, Mexico.

Table 1. Ranges and categories of standardized precipitation index values.

Table 2. Descriptive statistics of the predictors, by region, used to train the AutoML models.

Table 3. Quantitative performance metrics of the ANN reporting on train data (T) and cross-validation (CV) data for regional SPI time series in Zacatecas state, Mexico. Key metrics include the Mean Squared Error (MSE), Mean Absolute Error (MAE), and Determination coefficient (R²).

Table 4. Performance of the ANN models using the linear regression formula (SPIp = β0 + β1 SPIo) applied to the observed SPI values (SPIo) and predicted SPI values (SPIp) using test data during the test period for regional SPI time series in Zacatecas state, Mexico.

Table 5. Likelihood of prediction error (PE) under normal distribution for observed SPI values (SPIo) and predicted SPI values (SPIp) using test data during the test period for regional SPI time series in the state of Zacatecas, Mexico.