Atmosphere air temperature forecasting using the honey badger optimization algorithm: on the warmest and coldest areas of the world

Precise forecasting of air temperature, a significant meteorological parameter, plays a critical role in environmental quality management. Hence, this study employs a hybrid intelligent model for accurate monthly temperature forecasting one to three months ahead in the hottest and coldest regions of the world. The hybrid model consists of an artificial neural network (ANN) hybridized with the powerful metaheuristic Honey Badger Algorithm (HBA-ANN). The average mutual information (AMI) technique is employed to find the optimal time-delay values of the temperature variable for the different time horizons. Finally, the performance of the developed hybrid model is compared with the classical ANN and Gene Expression Programming (GEP) using several statistical criteria as well as Taylor and scatter diagrams. Results indicated that at each time horizon, the HBA-ANN model, with the shortest distance from the observation point in the Taylor diagram, high values of NSE and R², and low values of RMSE, MAE, and RSR, outperformed the ANN and GEP models in both the training and testing phases. Hence, using the Honey Badger Algorithm can increase model accuracy. The precise performance of this model supports its use in forecasting other environmental parameters.


Introduction
Air temperature is among the main factors in discussions of hydrology, agriculture, irrigation, and the environment. It is also a critical meteorological parameter that can describe global warming and climate change (Alomar et al., 2022). Many processes on the earth's surface, such as photosynthesis, respiration, and evaporation, are regulated by air temperature. Air temperature significantly influences human life, numerous ecological and meteorological events, and crops in agricultural areas. In this regard, knowledge of spatial changes in air temperature on a wide scale is necessary for climatic, meteorological, and hydrological studies and investigations. Accurate air temperature forecasting is necessary for planning agricultural operations, recreational activities, tourism, transportation, and energy generation, as well as for developing measures to deal with temperature fluctuations (Yakut & Süzülmüş, 2020). Forecasting air temperature also plays an effective role in providing early warnings (Chen & Lai, 2011; Horenko et al., 2008; Kuligowski & Barros, 1998; Rasnopolsky & Fox-Rabinovitz, 2006; Voyant et al., 2012). Support-vector-based regressors (Radhika & Shashi, 2009) and Bayesian networks (Abramson et al., 1996; Cofino et al., 2002) are other approaches used to forecast air temperature. Statistical models attempt to minimize the effort of implementing physically based computer models. Compared to physical models, they are less computationally intensive and more straightforward. A number of studies have indicated that statistical models produce results consistent with physical models. Cointegration approaches and regression approaches are the two categories of statistical analysis (Alomar et al., 2022).
The high accuracy of deep learning methods such as the Stacked Auto-Encoder (Liu et al., 2015), Stacked Denoising Auto-Encoder (Hossain et al., 2015), recurrent neural network (RNN), convolutional neural network (CNN), Conditional Restricted Boltzmann Machine (CRBM) (Xingjian et al., 2015), and long short-term memory (LSTM) model (Roy, 2020; Yang et al., 2020) has also encouraged researchers to use them in air temperature forecasting. Furthermore, the capability of artificial neural networks (ANNs) to capture nonlinear relationships between inputs and outputs has attracted the attention of scientists in various disciplines, including meteorology, hydrology, and water resources (Anushka et al., 2020; Chevalier et al., 2011; Rajendra et al., 2019).
Various studies have used different models to predict air temperature. Mohsenzadeh Karimi, Kisi, Porrajabali, Rouhani-Nia, and Shiri (2018) investigated the ability of random forest (RF), support vector machine (SVM), and geostatistical (GS) methods to predict long-term monthly temperature using data from 30 points in Iran; based on the results, the SVM and RF models outperformed the NF and NN models. Hanoon et al. (2021) implemented random forest (RF), gradient boosting tree (GBT), different artificial neural network (ANN) architectures (multi-layer perceptron, radial basis function), and linear regression (LR) models to forecast relative humidity and air temperature in Malaysia. The results indicated the high performance of the ANN-MLP in forecasting daily air temperature.
In the study by Lin et al. (2021), a state-of-the-art hybrid based on Multi-dimensional Complementary Ensemble Empirical Mode Decomposition (MCEEMD) outperformed the Radial Basis Function Neural Network (RBFNN) in forecasting the maximum air temperature over the next 7 days in Taipei. Very recently, Alomar et al. (2022) implemented ARIMA, Random Forest (RF), Regression Tree (RT), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), and Quantile Regression Tree (QRT) models to forecast air temperature in North America at daily and weekly horizons. Results showed that the SVR model outperformed the other implemented models. The study by Sari et al. (2022) indicated the high accuracy of a GRU-LSTM hybrid model in predicting daily air temperature in Indonesia. In the study by Hou et al. (2022) on forecasting air temperature in China, a CNN-LSTM hybrid model showed better performance than the CNN and LSTM models. Li and Yang (2022) showed that, in forecasting daily air temperature, a SARIMA-LSTM hybrid model had higher accuracy than the ARIMA, SARIMA, and LSTM models. The study of Khan and Maity (2022) indicated that a hybrid of a one-dimensional convolutional neural network (Conv1D) and a long short-term memory (LSTM) neural network (hereinafter the hybrid Conv1D-LSTM) has high ability in daily air temperature prediction. Singh et al. (2022) showed the superior performance of a Genetic Algorithm (GA) based hybrid machine learning-pedotransfer function (ML-PTF) model in forecasting the spatial pattern of saturated hydraulic conductivity. In the study by Zhang et al. (2022), the Adaptive Neuro-Fuzzy Inference System (ANFIS) model outperformed a bilayered NN model in forecasting the daily dew point temperature.
As it is impossible to obtain accurate air temperature forecasts with limited data in some circumstances, Artificial Intelligence (AI) models are a better choice for understanding and illuminating the underlying mechanisms regulating such variations in temperature patterns. As a result, data-driven simulation methods, in the form of both standalone and hybrid models, are rapidly evolving in a variety of scientific domains, including hydrology. Hybrid models can outperform standalone models. Since there is usually a trade-off between model performance and training time, a hybrid model that can deliver both high performance and quick training is required.

Research significance and novelty of the work:
In large-scale optimization problems, finding the best possible solution in the search space is a complicated matter. Furthermore, algorithm convergence is often not much influenced by changing algorithm variables. Hence, for highly complex problems and massive datasets, even if the correct initial parameters are determined, an algorithm may not accomplish adequate exploitation and exploration. Consequently, a powerful operator is needed to achieve the best exploitation and exploration in both local and global searches. In this work, we explore the abilities of the Honey Badger Algorithm (HBA), a recently developed metaheuristic, for atmospheric air temperature forecasting. To the best of our knowledge, no research has been done so far on hybridizing the HBA with the artificial neural network (HBA-ANN) to forecast air temperatures in specific climatic regions such as the hottest and coldest places on earth. Forecasting atmospheric air temperature changes in such places is very important given the water crises in many arid and semi-arid regions of the world. Besides, accurate air temperature forecasting supports effective energy-saving and climate change policies. Therefore, the aim of this study is to forecast air temperature using the HBA-ANN hybrid model at Furnace Creek (hottest region) and Vostok (coldest region) stations, and to compare its efficiency with the standalone ANN model and Gene Expression Programming (GEP).
The rest of the paper is organized as follows: Section 2 describes the methodology, outlining the study area and the data utilized. Section 3 presents and discusses the obtained results. Finally, conclusions are provided to show the efficacy of the models created for air temperature forecasting.

Artificial neural network
An artificial neural network (ANN) is a black-box model. In the ANN model, the simulation process is performed in a manner inspired by the neural networks of the human brain (Haykin, 1999; Nayak et al., 2006). Each ANN model has a set of connections between nodes, weights, and action functions. Moreover, there are an input layer, one or more hidden layers, and an output layer. The number of hidden layers and the choice of an appropriate training algorithm can have a significant impact on the efficiency of an ANN when modelling problems. ANN models can be trained with various algorithms, including Levenberg-Marquardt (LM), Bayesian regularization (BR), adaptive learning rate (GDX), scaled conjugate gradient (SCG), and gradient descent with momentum. Artificial neural networks are typically designed as feed-forward neural networks (FFNN), one of the most popular architectures, which is widely used for simulating and predicting hydrological problems. In the ANN, the relationship between the inputs (x_i) and the output (Y) is Y = f(Σ_i W_i x_i + b), where f is the action function, b is the bias, and W_i is the weight of link i. The structure of a common ANN model is shown in Figure 1.
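As an illustration (not the authors' implementation), the feed-forward relationship above for a single hidden layer can be sketched as follows; the layer sizes, the tanh action function, and the random weights are assumptions chosen for the example:

```python
import numpy as np

def ann_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer FFNN pass: hidden = f(W1 @ x + b1), Y = W2 @ hidden + b2."""
    hidden = np.tanh(W1 @ x + b1)   # tanh as the hidden-layer action function
    return W2 @ hidden + b2         # linear output node (forecasted temperature)

# Tiny example: 3 lagged temperatures -> 4 hidden nodes -> 1 forecast
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
y = ann_forward(np.array([12.0, 13.5, 11.8]), W1, b1, W2, b2)
```

In practice the weights W_i and biases b are not random but are fitted by a training algorithm such as Levenberg-Marquardt, or, as in the hybrid model of this study, by a metaheuristic optimizer.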

Honey badger algorithm (HBA)
Honey badgers are mammals that mostly live in Southwest Asia, Africa, and the Indian subcontinent.
Their prey includes more than 60 species of animals, including deadly snakes. In addition to being clever mammals, they enjoy honey and can use tools. Honey badgers exist in 12 different subspecies. They are strong animals and, when unable to escape a predator, are not afraid to attack it. Furthermore, they feed themselves by attacking bird nests and honey beehives, climbing trees to reach them (Begg et al., 2003; Begg et al., 2005). A mathematical model based on the honey badger's behaviour is presented in the following sections.

Inspiration
The HBA mimics the foraging behaviour of the honey badger. Food sources are found by the honey badger either by scenting and digging or by following the honeyguide bird. The first stage is referred to as digging mode, and the second as honey mode. In the first mode, the badger uses its sniffing skills to estimate its prey's location; in the second, it uses the honeyguide bird as a guide to locate beehives directly.

Model based on mathematics
The following subsections explain the mathematical models used in the HBA. In the HBA, the population of candidate solutions X is a set of n positions, where x_j denotes the position of the j-th honey badger. The first step of the process is initialization: the positions of the n honey badgers are determined by x_j = LB_j + r1 × (UB_j − LB_j), where r1 is a random number in [0, 1], and LB_j and UB_j represent the lower and upper bounds of the search space, respectively, while x_j, the j-th honey badger position, refers to a candidate solution in a population of size n.
The second step is to define the intensity: several factors affect the intensity (I), including the concentration strength of the prey and the distance between it and the j-th honey badger. I_j denotes the intensity of the smell of the prey; slow motion occurs when the scent is low, and fast motion when it is high. It is given by the inverse square law (ISL) (Kapner et al., 2007), as represented in Figure 2 and described by Eq. (5): I_j = r2 × S / (4π d_j²), where r2 is a random number in [0, 1], S is an indicator of the concentration or intensity of the source (prey position), and d_j is the distance between the prey and the j-th badger.
The third step is updating the density factor: the density factor (α) governs time-varying randomness so that the transition from exploration to exploitation takes place seamlessly. Using Eq. (8), α = C × exp(−t / Max_iter), the factor decreases with each iteration t, reducing the randomization over time; here C is a constant greater than 1 (2 by default), and Max_iter is the maximum number of iterations. The fourth step is fleeing local solutions: to escape local optima, the HBA uses a flag (F) that changes the search direction, providing favourable conditions for accurate scanning of the search field.
The fifth step is updating the agents' positions: the update of HBA positions (X_new) involves two modes, 'digging mode' and 'honey mode'.
Detailed descriptions are provided below. • Digging mode: while digging, a honey badger moves in a cardioid shape (Elseify, Kamel, Abdel-Mawgoud, & Elattar, 2022). Eq. (9) simulates the approximate cardioid motion as x_new = x_prey + F × β × I × x_prey + F × r3 × α × d_j × |cos(2π r4) × [1 − cos(2π r5)]|, where x_prey describes the best position the prey has achieved to date, in other words, the global optimal position. β ≥ 1 (default 6) indicates the honey badger's ability to get food. The r3, r4, and r5 are three random numbers in [0, 1]. F is a flag that alters the direction of the search, specified by Eq. (10): F = 1 if r6 ≤ 0.5, and F = −1 otherwise, with r6 a random number in [0, 1]. In the digging mode, three factors greatly influence the behaviour of the honey badger: the intensity of the scent (I) of the prey (x_prey), the distance between the prey and the badger (d_j), and the decreasing factor (α). In addition, while digging, a badger may experience a disturbance (F), letting it discover an even better prey location (Figure 3). • Honey mode: Eq. (11) simulates a honey badger following the honeyguide bird to a beehive: x_new = x_prey + F × r7 × α × d_j, where r7 is a random number in [0, 1].
Eqs. (8) and (10) calculate the values of α and F, respectively. X_new and x_prey describe the new position and the location of the prey, respectively. As can be seen from Eq. (11), the honey badger is assumed to search near x_prey, the optimal prey position so far, in accordance with the distance information d_j. Also in this step, the time-varying search behaviour (α) affects the search results, and honey badgers may again experience perturbations (F).
In theory, the HBA is regarded as a global optimization method comprising exploration and exploitation stages. The algorithm is designed to be simple to implement and comprehend by reducing the number of parameters that need to be tuned. The HBA is mainly determined by three variables: the problem dimension (d), the population size (n), and the maximum number of iterations (Max_iter). By using honey attraction to pull the solutions in the population towards the optimal solution, the HBA optimization technique ensures strong local search capability. In addition, the density factor prevents the algorithm from settling on local optimal solutions and provides it with global search capability. Figure 2 shows the overall flowchart of the hybrid algorithm (HBA-ANN). Figure 3 shows the pseudocode of the Honey Badger optimization algorithm. Table 1 shows the parameters used in the modelling stage of the ANN and HBA-ANN models.
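For concreteness, the five steps above can be sketched as a minimal HBA loop. This is an illustrative sketch following the equations described above, not the implementation used in the study; the test objective (a sphere function), the greedy replacement rule, and the default parameter values are assumptions:

```python
import numpy as np

def hba_minimize(f, lb, ub, n=20, max_iter=100, beta=6.0, C=2.0, seed=0):
    """Minimal Honey Badger Algorithm loop: random init, smell intensity,
    decaying density factor, digging (cardioid) and honey modes."""
    rng = np.random.default_rng(seed)
    X = lb + rng.random((n, len(lb))) * (ub - lb)      # step 1: initialization
    fit = np.array([f(x) for x in X])
    best = X[np.argmin(fit)].copy()                    # x_prey: best so far
    for t in range(1, max_iter + 1):
        alpha = C * np.exp(-t / max_iter)              # step 3: density factor
        for j in range(n):
            d = best - X[j]                            # distance vector to prey
            S = np.sum((X[j] - X[(j + 1) % n]) ** 2)   # source strength
            I = rng.random() * S / (4 * np.pi * np.sum(d**2) + 1e-12)  # step 2
            F = 1 if rng.random() <= 0.5 else -1       # step 4: direction flag
            if rng.random() < 0.5:                     # digging mode, Eq. (9)
                r3, r4, r5 = rng.random(3)
                new = (best + F * beta * I * best
                       + F * r3 * alpha * d
                       * abs(np.cos(2 * np.pi * r4) * (1 - np.cos(2 * np.pi * r5))))
            else:                                      # honey mode, Eq. (11)
                new = best + F * rng.random() * alpha * d
            new = np.clip(new, lb, ub)
            fnew = f(new)
            if fnew < fit[j]:                          # greedy replacement
                X[j], fit[j] = new, fnew
        best = X[np.argmin(fit)].copy()
    return best, float(f(best))

# Example: minimize the 2-D sphere function over [-5, 5]^2
x_best, f_best = hba_minimize(lambda x: np.sum(x**2),
                              np.full(2, -5.0), np.full(2, 5.0))
```

In the hybrid HBA-ANN, the objective f would be a training-error measure (e.g. RMSE) evaluated over the ANN's weight and bias vector, so that the badgers search the weight space instead of a geometric one.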
The main characteristics of the Honey Badger Algorithm are the following: randomized initialization, two phases that are randomly selected, and cardioid motion in the exploration phase. There are several advantages to using the HBA for optimization. Hybridization: it can be merged with more traditional optimization methods; the HBA can be combined with other algorithms and find the best solution to a problem by utilizing their operators and advantages. Broad applicability: it can be applied to any problem that can be formulated as function optimization. The proposed algorithm, denoted HBA-ANN, is an enhanced version of the newly developed Honey Badger Algorithm. The enhancement is achieved by coupling the HBA with the ANN's training and transfer functions. The HBA-ANN improves the diversity of the population and the ability to escape from local minima.

Gene expression programming (GEP)
Genetic Programming (GP) is truly a 'bottom-up' process and can identify a model for any given time series. GEP makes no assumption about the structure of the relation between the dependent and independent variables. Producing the structure of the relation becomes feasible through an efficient and detailed emulation of evolutionary procedures working hand in hand. A parse tree (built from a functional set of mathematical functions) and a terminal set (containing the function arguments and their parameters) are the two components of a GEP model. The relation identified in a specific GEP modelling run is never fixed and continually evolves. The assessment begins with an initial randomly chosen population of n models. The values of the dependent and independent variables are used to evaluate the fitness of each model. As the population passes from one generation to the next, old models are replaced by new models with verifiably better efficiency (Khatibi et al., 2011).
The method implemented in the present research is referred to as Gene Expression Programming (GEP) and is based on evolving computer programs of various shapes. These shapes are codified in linear chromosomes of fixed length. It has been stated that GEP is ten thousand times more effective than Genetic Programming (GP) systems (Ferreira, 2001a, 2001b), for several reasons: (i) the chromosomes are simple entities: compact, easy to manipulate genetically, linear, and relatively small; (ii) the parse tree is exclusively the expression of its corresponding chromosome.
By applying operators such as mutation and crossover to the winners, 'offspring' or 'children' are created; crossover is responsible for carrying traits from one generation to the next, while mutation causes a random variation in the parse tree (data mutation is also feasible). This improves on the performance of the initial generation, and the procedure is repeated until termination. To run the GEP model, the GenXpro software application (Ferreira, 2001a, 2001b) was used in this research.
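To make the chromosome-to-parse-tree mapping concrete, the following sketch decodes a GEP K-expression (Karva notation) breadth-first into a parse tree and evaluates it. The gene string, the function set, and the terminal values are illustrative assumptions, not the expressions evolved in this study:

```python
import operator

# Function set: symbol -> (callable, arity); anything else is a terminal
FUNCS = {"+": (operator.add, 2), "*": (operator.mul, 2),
         "-": (operator.sub, 2), "/": (operator.truediv, 2)}

def eval_k_expression(gene, terminals):
    """Decode a K-expression level by level (as GEP decodes chromosomes)
    and evaluate the resulting parse tree."""
    nodes = [{"sym": gene[0], "kids": []}]
    frontier, i = nodes[:], 1
    while frontier:                      # breadth-first tree construction
        nxt = []
        for node in frontier:
            arity = FUNCS[node["sym"]][1] if node["sym"] in FUNCS else 0
            for _ in range(arity):
                child = {"sym": gene[i], "kids": []}
                i += 1
                node["kids"].append(child)
                nxt.append(child)
        frontier = nxt

    def ev(node):                        # recursive evaluation of the tree
        if node["sym"] in FUNCS:
            fn = FUNCS[node["sym"]][0]
            return fn(ev(node["kids"][0]), ev(node["kids"][1]))
        return terminals[node["sym"]]

    return ev(nodes[0])

# "+*-abcd" decodes to the parse tree for (a*b) + (c-d)
val = eval_k_expression("+*-abcd", {"a": 2.0, "b": 3.0, "c": 7.0, "d": 1.0})
```

Mutation and crossover act on the linear gene string, while fitness is always computed on the decoded tree, which is why GEP chromosomes are so easy to manipulate genetically.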

Study area and datasets
Death Valley National Park in California, USA, with an area of 1300 km², is famous for its very hot and dry climate. High temperatures, high evaporation, low humidity, and low summer rainfall are the main features of this valley. Furnace Creek is a census-designated place (CDP) in Inyo County, California, United States. As reported by the US Census Bureau, Furnace Creek has a total area of 82 km². Daytime temperatures range from roughly 18 °C in December to 47 °C in July, while overnight lows typically range from 4 to 32 °C. July has the highest temperatures, with an average daily high of 46.9 °C, and June is the driest month, with an average monthly precipitation of 1.3 mm. Furnace Creek holds the record for the world's highest recorded temperature, reaching 134 °F (56.7 °C) on 10 July 1913, and it is accordingly known as the hottest spot on Earth. On Death Valley summer nights, the lowest temperatures can remain above 38 °C, which has not been recorded anywhere else in the USA. There are likely areas in the world with higher temperatures than Furnace Creek, but they are undocumented.
Antarctica is a remote continent with an area of 14 million km², of which only 0.18% of the surface is ice-free. Antarctica, as one of the less understood parts of the earth, has an undeniable intrinsic value, and the scientific information produced every year about its various biological and physical aspects shows its remarkable scientific value. Located in the highlands of the Antarctic Plateau, the Vostok region has a very cold, continental polar climate. There is a weak katabatic southerly to south-westerly wind regime at Vostok station, with mean annual speeds above 5.0 m/s. Wind speed variations are pronounced in March and September and are at a sharp minimum in January; September has the highest wind speed (5.5 m/s) of all the months, and January is the calmest month of the year. The mean wind speed over the multiyear period is 4.2 m/s. The average temperature in January and December is around −32 °C, the highest of the year, but as a result of the 'coreless winter' the temperature drops rapidly to about −65 °C by April. Vostok is the only deep-drilling station in Antarctica where the snow isotope composition through the ice thickness can be directly compared with a relatively long meteorological observational record (45 years). With a temperature of −89.2 °C recorded on 21 July 1983 at Vostok Station, this point is known as the coldest point on Earth.
The geographical locations of the two stations corresponding to the hottest and coldest points on Earth are shown in Figure 4. In this study, daily temperature (T) data for the Furnace Creek and Vostok stations over the period 1979-2020 were first collected. The statistical characteristics of T (°C) and the coordinates of the studied stations are given in Table 2. At both stations, the data were split into 70% (1979-2008) for the training phase and 30% (2009-2020) for the testing phase.
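The 70/30 partition is chronological rather than random, preserving the temporal order of the series. A minimal sketch, with a synthetic monthly series standing in for the station data (the series itself is an assumption for illustration):

```python
import numpy as np

def chronological_split(series, train_frac=0.7):
    """Split a time series chronologically (no shuffling), mirroring the
    1979-2008 training / 2009-2020 testing partition used in the study."""
    n_train = int(len(series) * train_frac)
    return series[:n_train], series[n_train:]

# Example: 504 monthly values (42 years x 12 months, 1979-2020)
monthly_T = np.arange(504, dtype=float)
train, test = chronological_split(monthly_T)
```

Keeping the split chronological avoids leaking future observations into the training phase, which would otherwise inflate the apparent forecasting skill.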

Error analysis
To evaluate the performance of the models, the metrics used in this study are defined in Table 3. In this Table, p(i) and o(i) are the forecasted and observed temperature values, respectively; p̄ and ō are the average forecasted and observed temperature values, respectively; and N is the number of data points (Shabani, Hayati, Pishbahar, Ghorbani, & Ghahremanzadeh, 2021).
A model has high accuracy if its R² is close to 1 and its RMSE and MAE are close to 0. The performance ranges of the NSE and RSR metrics are presented in Table 4 (Moriasi et al., 2015).
A variety of metrics have been used by researchers in different fields to analyze their models, and there is no single standard parameter. The NSE, RMSE, MAE, RSR, and R² criteria are used in this study to evaluate the performance of the models. The lower the values of the RMSE, MAE, and RSR, the more accurate the model's results; conversely, the higher the values of the R² and NSE, the better the model's performance. Taylor diagrams were used for visual analysis of the studied models. The Taylor diagram offers the advantage of combining two common statistics, namely the standard deviation and the correlation coefficient: the model whose forecasted values are closest to the observed ones in terms of standard deviation and correlation coefficient is the most accurate (Shabani et al., 2021).
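The criteria above can be sketched as follows. This is a minimal implementation of the usual textbook formulas; the exact Table 3 definitions (e.g. the denominator convention used for RSR) may differ slightly:

```python
import numpy as np

def forecast_metrics(obs, pred):
    """RMSE, MAE, NSE, RSR, and R^2 between observed and forecasted values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))                       # root mean square error
    mae = np.mean(np.abs(err))                              # mean absolute error
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)  # Nash-Sutcliffe
    rsr = rmse / np.std(obs)                                # RMSE / std of observations
    r = np.corrcoef(obs, pred)[0, 1]                        # Pearson correlation
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "RSR": rsr, "R2": r ** 2}

m = forecast_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

The standard deviation and correlation coefficient computed here are also exactly the two statistics plotted in the Taylor diagram.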

Results and discussion
This section is split into three subsections. The first subsection provides the findings of the average mutual information analysis for the data collected from the warmest and coldest places. The second subsection presents the performance of the suggested HBA-ANN model for the coldest and warmest places on earth. The results are analyzed and discussed in the subsequent section.

Average mutual information (AMI)
In this study, the monthly temperatures are forecasted at three time horizons, T + 1, T + 2, and T + 3, for both the warmest and coldest places. For this purpose, the standalone ANN and hybrid HBA-ANN models are used as modelling approaches. Running such models requires determining the optimal set of inputs. The average mutual information (AMI) technique is employed in this study to find the optimal time-delay values for the temperature input variable at the different time horizons at the selected warmest and coldest places. The results of the AMI technique are shown in Figure 5 for the coldest and warmest places. It is found that up to 4 (T, T-1, T-2, T-3, T-4) and 3 (T, T-1, T-2, T-3) time delays can be used as the optimal time delays for the monthly temperature variable in the coldest and warmest places, respectively (Table 3).
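A histogram-based AMI estimate over candidate monthly lags can be sketched as follows. This is an illustrative sketch on a synthetic seasonal series; the bin count and the series itself are assumptions, not the study's settings:

```python
import numpy as np

def average_mutual_info(x, lag, bins=16):
    """Histogram (plug-in) estimate of AMI between x_t and x_{t-lag}, in nats."""
    a, b = x[lag:], x[:len(x) - lag]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                      # joint probabilities
    p_a = p_ab.sum(axis=1, keepdims=True)           # marginal of x_t
    p_b = p_ab.sum(axis=0, keepdims=True)           # marginal of x_{t-lag}
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

# AMI profile over candidate monthly lags for a synthetic seasonal series
rng = np.random.default_rng(1)
t = np.arange(600)
temps = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 600)
ami = [average_mutual_info(temps, k) for k in range(1, 7)]
```

Lags with high AMI share substantial information with the current value and are therefore strong candidates for the model's input set, which is the basis on which the T, T-1, ... delays were chosen.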

Models' performance in the coldest region (Vostok station)
Five different input combinations were considered for developing the ANN, GEP, and HBA-ANN models for forecasting air temperature at three time horizons (T + 1, T + 2, and T + 3) (Table 4). The performance of the hybrid HBA-ANN models was assessed using the error metrics (R², RMSE, MAE, RSR, and NSE) and compared with the standalone ANN and GEP in Table 4.
According to this Table, in most cases the accuracy of the temperature forecasts increases with the number of input variables (time lags) for all models and all lead times. In other words, the last input combination, including lag times up to 4 (T, T-1, T-2, T-3, and T-4), is generally the best input combination for temperature forecasting 1 to 3 months ahead for all models. Furthermore, the hybrid HBA-ANN model outperforms the ANN and GEP models based on its low values of RMSE, MAE, and RSR and high values of R² and NSE; e.g. the R², RMSE, and NSE values of the standalone ANN model for the T + 1 horizon are 0.891, 4.519, and 0.889 in the training period and 0.909, 3.925, and 0.906 in the testing period, respectively, while these values improve to 0.937, 3.399, and 0.937 in the training period and 0.970, 2.204, and 0.970 in the testing period for the HBA-ANN hybrid model. Similar results can be seen for the other two lead times (T + 2 and T + 3) for both the ANN and HBA-ANN models. Furthermore, in both the training and testing phases, compared to the GEP model, the HBA-ANN model has higher values of R² and NSE and lower values of RMSE, MAE, and RSR at each time horizon. In general, the HBA-ANN model shows higher capability than the classical ANN and GEP models in air temperature forecasting based on all statistical metrics, especially in the testing phase. As a final finding from this Table, the accuracy of the forecasted air temperature values decreases with increasing lead time, even for the proposed model. The best results of the ANN, GEP, and HBA-ANN models for the T + 1, T + 2, and T + 3 horizons were considered for further graphical analysis.
Scatter plots of the observed and forecasted temperatures, along with R², RMSE, 95% confidence intervals, and error histograms, were provided for the best ANN, GEP, and HBA-ANN models in order to judge model performance in a more comprehensive manner for all lead times (Figure 6). The quantification of the uncertainty of the developed models is further ascertained by the results of the prediction intervals. These plots suggest superior forecasting capability of the HBA-ANN models, with R² values ranging from 0.939 to 0.970, higher than those of the ANN model (0.833 to 0.908), and RMSE values ranging from 2.204 to 3.212, lower than those of the ANN (3.925 to 5.126) for all three lead times. This is clearly visible in the scatter plots drawn between the observed and modelled temperature data, which show smaller dispersions around the perfect line (i.e. y = x) for all three lead times. The shapes of the error histograms are approximately similar to the normal distribution, especially for the HBA-ANN models, which is further evidence of the high accuracy of the HBA-ANN model in forecasting air temperature.
In addition to the scatter plots, the performances of the ANN, GEP, and HBA-ANN models were also evaluated using Taylor diagrams (Figure 7). This is an effective and popular way of ranking models, as it gives a better picture of the resemblance of model results to the observed data (Taylor, 2001). The similarity between the observed and forecasted results for the three lead times was quantified in terms of RMSD, standard deviation, and correlation coefficient in the Taylor diagrams. The performances were analyzed by observing the positions of the models' points on the respective plots. The HBA-ANN models show better performance than the ANN and GEP models for all lead times, based on the short distance of their points from the observed point.
For visual comparison of the temporal variation of the temperatures, a time series analysis was applied (Figure 8). According to Figure 8, it is worth mentioning that the forecasted values of the HBA-ANN models, especially the extreme values, fit the observed values better in the testing period than in the training period for all lead times. This may be due to the better generalization capability provided by the HBA algorithm, which is utilized to optimize the internal parameters of the ANN model. Moreover, with increasing lead time, the efficiency of all three models decreased.
Following the development of the ANN, GEP, and HBA-ANN models, the errors were calculated and stacked histograms of the errors were plotted (Figure 9). The error values are negative if the forecasted values are larger than the observed ones, and vice versa. The y-axis reflects the number of samples in each dataset, and the zero-error line corresponds to no error. For the T + 1 horizon, it is interesting to note that the errors are smaller in the testing period than in the training period for all three models; this is most evident for the hybrid HBA-ANN model. Moreover, the total error range for the hybrid model (about [−4, 10]) is smaller than those of the ANN and GEP models. In addition, all the error plots show an approximately symmetrical histogram with respect to zero error. Similar observations can be made for the other lead times as well. The results confirm that all three models are statistically significant; however, the HBA-ANN model shows better and superior performance for forecasting air temperature data at the coldest place, since its forecasted values are very close to the actual values with very low error. Hence, the HBA-ANN model can forecast future values of the air temperature with high accuracy, and so it can increase the efficiency of climatic policies.

Models' performance in the warmest region (Furnace Creek station)
Similar to the coldest region, different input combinations were considered for developing the ANN, GEP, and HBA-ANN models for forecasting air temperature at three time horizons (T + 1, T + 2, and T + 3) for the warmest region (Table 5). The performance of the hybrid HBA-ANN models was assessed using the same error metrics (R², RMSE, MAE, RSR, and NSE) and compared with the standalone ANN and GEP in Table 5. According to this Table, in most cases the accuracy of the temperature forecasts increases with the number of input variables (time lags) for all models and all lead times. In other words, the last input combination, including lag times up to 3 (T, T-1, T-2, and T-3), is the best input combination for temperature forecasting 1 to 3 months ahead for the models.
According to Table 6, the hybrid HBA-ANN model outperforms the ANN and GEP models based on its low values of RMSE, MAE, and RSR and high values of R² and NSE. It can also be seen that the accuracy of the temperature forecasts decreases with increasing lead time for all models, which may be mentioned as one of the limitations of the proposed models. The best results of the ANN and HBA-ANN models for the T + 1, T + 2, and T + 3 horizons were considered for further graphical analysis.
In order to evaluate the models' performance more comprehensively for all times ahead, Figure 10 presents scatter plots of the observed and forecasted temperature, together with R2, RMSE, the 95% confidence interval, and error histograms, for the best ANN, GEP, and HBA-ANN models. The prediction-interval results further quantify the uncertainty of the developed models. These plots suggest the superior forecasting capability of the HBA-ANN models, with R2 values ranging from 0.967 to 0.976, higher than those of the ANN model (0.869 to 0.939) and of the GEP model (0.927 to 0.944), and with RMSE values ranging from 1.496 to 1.725, lower than those of the ANN and GEP models for all three times ahead. This point is clearly visible in the scatter plots drawn between the observed and modelled temperature data, which show less dispersion around the perfect line (i.e. y = x) for all three times ahead. The shapes of the error histograms are approximately similar to the normal distribution, especially for the HBA-ANN models.
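A 95% interval of the kind shown on such plots can be approximated from the forecast errors under a Gaussian assumption (a simplified sketch only; the paper does not specify its exact interval construction):

```python
import numpy as np

def gaussian_interval(obs, pred, z=1.96):
    """Approximate 95% interval for forecast errors: mean +/- 1.96 * std."""
    err = np.asarray(obs, dtype=float) - np.asarray(pred, dtype=float)
    mu, sigma = err.mean(), err.std(ddof=1)
    return mu - z * sigma, mu + z * sigma

# Errors symmetric around zero give an interval symmetric around zero.
lo, hi = gaussian_interval([1.0, 3.0, 1.0, 3.0], [2.0, 2.0, 2.0, 2.0])
```

A narrower interval around the y = x line corresponds to the lower dispersion observed for the HBA-ANN scatter plots.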
We also utilized the Taylor diagram to evaluate the performance of the ANN, GEP, and HBA-ANN models (Figure 11). The Taylor diagram quantifies the similarity between the observed and forecasted results for the three times ahead in terms of their RMSD, standard deviations, and correlation coefficient. The performances were analyzed by observing the positions of the models' points on the respective plots; the model whose point lies closest to the observation point has the superior performance. It is observed that the HBA-ANN models perform better than the ANN and GEP models for all times ahead, based on the shorter distance of their points from the observed points. Furthermore, a time series analysis was applied for visual comparison of the temporal variation of the air temperatures at the Furnace Creek station (Figure 12). According to Figure 12, it is worth mentioning that the estimates of the HBA-ANN models, especially the extreme values, fit the observed values better in the testing period than in the training period for all times ahead. This indicates the better generalization capability of the HBA algorithm used to optimize the internal parameters of the ANN model. Moreover, with increasing times ahead, the efficiency of the models (even the hybrid HBA-ANN model) decreased. Therefore, although the hybrid model improves the performance of the neural network, its efficiency, like that of the standalone ANN and GEP models, decreases in long-term predictions.
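The three statistics a Taylor diagram encodes obey a law-of-cosines identity, which explains why "distance from the observation point" summarizes model skill; the following sketch verifies it (an illustration, not the paper's plotting code):

```python
import numpy as np

def taylor_stats(obs, pred):
    """Standard deviations, correlation, and centered RMSD of a forecast."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    s_obs, s_pred = obs.std(), pred.std()
    rho = np.corrcoef(obs, pred)[0, 1]
    # Centered RMS difference: anomalies about each series' own mean.
    rmsd = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return s_obs, s_pred, rho, rmsd

s_obs, s_pred, rho, rmsd = taylor_stats([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.5, 4.5])
# Law of cosines: rmsd^2 = s_obs^2 + s_pred^2 - 2 * s_obs * s_pred * rho,
# so a point near the observed point (rho -> 1, s_pred -> s_obs) has low RMSD.
```

On the diagram, the observed point sits at (s_obs, correlation 1), and each model's distance from it is exactly its centered RMSD.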
Following the development of the ANN, GEP, and HBA-ANN models, the errors were also calculated and stacked histograms of the errors were plotted (Figure 13). For the T + 1 time horizon, it is interesting to note that the errors are smaller in the testing period than in the training period for all three models. This is most evident for the hybrid HBA-ANN model. Moreover, the total error range of the hybrid model (about [−3, 6]) is smaller than those of the standalone ANN and GEP models. In addition, all the error plots show an approximately symmetrical histogram with respect to the zero error. Similar observations can be made for the other times ahead as well. Based on the results, the ANN, GEP, and HBA-ANN models are statistically significant; however, better and superior performance is seen for the HBA-ANN model for forecasting the air temperature time series at the Furnace Creek station. Accurate air temperature forecasting is critical for human life, agriculture, and climatic and energy-saving policies. In general, the results indicated more precise performance of the HBA-ANN model in air temperature forecasting in both the coldest and warmest regions compared to the ANN and GEP models. In other words, the HBA optimization algorithm is highly effective at improving the results of the standalone ANN model in both regions, despite their different climatic conditions. Our result is consistent with studies that reported the superior performance of hybrid models (with R2 > 0.95) compared to standalone models (e.g. Kazemi et al., 2021; Khan & Maity, 2022; Li & Yang, 2022; Lin et al., 2021; Sari et al., 2022). The computational cost of the HBA-ANN model is similar to that of the ANN and less than that of the GEP. Moreover, comparing the results obtained in both regions showed that the error values of the implemented models were lower in the warmest region than in the coldest region. It seems that in cold regions the efficiency of hydrological models may decrease when simulating such phenomena.

Conclusion
In this study, a hybrid intelligent model comprising an artificial neural network (ANN) hybridized with the powerful metaheuristic Honey Badger Algorithm (HBA-ANN) was proposed for monthly temperature forecasting one to three times ahead in two regions of the world with the hottest and coldest climate conditions. The average mutual information (AMI) technique was employed to find the optimal time delay values for the temperature variable for the three time horizons. The performance of the developed hybrid model was compared with the classical ANN and GEP models using several statistical criteria (e.g. Nash-Sutcliffe coefficient, root-mean-square error, and coefficient of determination) and plots (e.g. Taylor and scatter diagrams). The main results are as follows: i) The HBA-ANN model outperforms the ANN and GEP models in both the coldest and warmest regions of the world based on all the statistical criteria and plots. ii) The HBA optimization algorithm performed well in optimizing the ANN's parameters and thereby improving its accuracy in both regions.
iii) The efficiency of the ANN, GEP, and HBA-ANN models was better in the hottest region than in the coldest one. iv) With an increasing number of input variables (i.e. lag times of the temperature variable), the precision of the models generally increased. v) With increasing times ahead, the forecasting accuracy of the standalone ANN and GEP models and of the hybrid model decreased in both regions.
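The AMI-based lag selection mentioned above can be sketched with a simple histogram estimator of the mutual information between a series and its lagged copy (an illustration only; the bin count and selection rule are assumptions, not the authors' exact settings):

```python
import numpy as np

def average_mutual_information(x, lag, bins=16):
    """Histogram estimate of mutual information (in nats) between
    x(t) and x(t + lag); higher AMI means a more informative lag."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-lag], x[lag:]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()                               # joint probabilities
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)      # marginals
    nz = pxy > 0                                   # only nonzero cells contribute
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

# A strictly periodic series is maximally informative at its own period:
periodic = np.tile(np.arange(8.0), 50)
ami_at_period = average_mutual_information(periodic, lag=8)  # = log(8) here
```

Evaluating AMI over a range of lags and keeping the most informative ones (e.g. up to the first local minimum) yields candidate time delays such as the T−1 to T−3 lags used in this study.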
It is recommended to evaluate the capability of the HBA-ANN model in predicting temperature with other climatological input variables, such as wind speed, in the same or other regions of the world. The capability of the model could also be evaluated on hourly and daily data, where available. Moreover, it is recommended to hybridize the HBA with other standalone intelligent models (e.g. the SVM model) and to examine its performance for other hydrological and environmental parameters, such as air pollutants.

Figure 1 .
Figure 1. Common structure of the artificial neural network (ANN).

Figure 4 .
Figure 4. Geographical location of meteorological stations at the hottest (A) and coldest points (B) on Earth.

Figure 5 .
Figure 5. AMI results for selecting optimal time lags at the coldest and warmest places.

Figure 6 .
Figure 6. Scatter plots of the observed and forecasted temperature along with R2, RMSE, 95% confidence interval, and the error histograms for the ANN (a), GEP (b), and HBA-ANN (c) models for three times ahead at the coldest place.

Figure 7 .
Figure 7. Taylor diagram of the models for T + 1, T + 2, and T + 3 times horizons for the coldest region.

Figure 8 .
Figure 8. Time series plots of the observed and forecasted temperature of the ANN (a), GEP (b), and HBA-ANN (c) models for all times ahead in the training and testing periods at the coldest region.

Figure 9 .
Figure 9. Error histogram plots of the ANN (a), GEP (b), and HBA-ANN (c) models for all times ahead in the training and testing periods at the coldest region.

Figure 10 .
Figure 10. Scatter plots of the observed and forecasted temperature along with R2, RMSE, 95% confidence interval, and the error histograms for the ANN (a), GEP (b), and HBA-ANN (c) models for three times ahead at the warmest place.

Figure 11 .
Figure 11. Taylor diagram of the models for T + 1, T + 2, and T + 3 time horizons for the warmest region.

Figure 12 .
Figure 12. Time series plots of the observed and forecasted temperature of the ANN (a), GEP (b), and HBA-ANN (c) models for all times ahead in the training and testing periods at the warmest region.

Figure 13 .
Figure 13. Error histogram plots of the ANN (a), GEP (b), and HBA-ANN (c) models for all times ahead in the training and testing periods at the warmest region.

Table 1 .
Parameters used for ANN and HBA-ANN.

Table 2 .
Daily statistical characteristics of temperature variable and coordinates of the studied stations.

Table 3 .
The evaluation metrics of the models.

Table 4 .
RSR and NSE performance ranges.

Table 5 .
Error metrics of the ANN, GEP, and HBA-ANN models at the warmest place.
Note: Blue and grey colours represent the best model and input combinations based on the statistical measures, respectively, for each time ahead.

Table 6 .
Error metrics of the ANN, GEP and HBA-ANN models at the coldest place.
Note: Blue and grey colours represent the best model and input combinations based on the statistical measures, respectively for each time ahead.