Integrated metaheuristic algorithms with extreme learning machine models for river streamflow prediction

Accurate river streamflow prediction is pivotal for effective resource planning and flood risk management. Traditional river streamflow forecasting models encounter challenges such as nonlinearity, stochastic behavior, and convergence reliability. To overcome these, we introduce novel hybrid models that combine extreme learning machines (ELM) with cutting-edge mathematical inspired metaheuristic optimization algorithms, including Pareto-like sequential sampling (PSS), weighted mean of vectors (INFO), and the Runge–Kutta optimizer (RUN). Our comparative assessment includes 20 hybrid models across eight metaheuristic categories, using streamflow data from the Aswan High Dam on the Nile River. Our findings highlight the superior performance of mathematically based models, which demonstrate enhanced predictive accuracy, robust convergence, and sustained stability. Specifically, the PSS-ELM model achieves superior performance with a root mean square error of 2.0667, a Pearson’s correlation index (R) of 0.9374, and a Nash–Sutcliffe efficiency (NSE) of 0.8642. Additionally, INFO-ELM and RUN-ELM models exhibit robust convergence with mean absolute percentage errors of 15.21% and 15.28% respectively, a mean absolute errors of 1.2145 and 1.2105, and high Kling-Gupta efficiencies values of 0.9113 and 0.9124, respectively. These findings suggest that the adoption of our proposed models significantly enhances water management strategies and reduces any risks.

brings sensitivity to initialization parameters, which may affect the model's performance and generalization capabilities.Unlike iterative methods, ELMs randomly initialize weights and analytically determine output weights, which may limit their ability to reach optimal solutions, especially in complicated or nonlinear problems.To address these problems, in this paper, we introduce three state-of-the-art mathematical-inspired metaheuristic algorithms for training the ELM network: Pareto-like sequential sampling (PSS), weighted mean of vectors (INFO), and the Runge-Kutta optimizer (RUN).The rationale for adopting these three recommended algorithms derives from the theoretical idea of the No-Free Lunch theorem 34 , which states that no single algorithm outperforms all other algorithms on all problems.Thus, proposing three algorithms is expected to yield superior effectiveness and performance capabilities.
• PSS 35 is a recently proposed algorithm that has shown strong performance in a variety of engineering design problems 36 .PSS, which employs techniques such as Latin hypercube and Monte Carlo sampling, demonstrates significant exploration capabilities, making it a great choice for utilization in hydrology.• INFO, proposed in 2022 and referenced over 400 times 37 , has exhibited remarkable capability across diverse fields and applications 38 .With its ability to balance exploration and exploitation phases effectively, INFO is highly efficient, thus motivating us to apply it in domains with complex problems like river streamflow forecasting.• RUN 39 , another recently proposed algorithm, adheres to the metaphor-free population-based criterion and is inspired by the well-known Runge-Kutta technique in mathematics.This algorithm has been shown to be highly efficient in a variety of real-world issues and application [40][41][42] .As a result, implementing RUN in areas with complicated and difficult problems, such as hydrology, is considered crucial.
With all of the aforementioned reasons and the additional goal of this paper being to develop effective river streamflow forecasting models.The proposed models are extensively tested by applying it to historical river streamflow data collected from the main gauge station at Egypt's Aswan High Dam (AHD).This station is selected specifically for its extensive monthly record spanning 130 years on the Nile River, which is the largest and longest river basin globally.
The proposed algorithms, when combined with the ELM network, including PSS, INFO, and RUN, will be compared with 17 other algorithms from eight different types of metaheuristic optimization: swarm-inspired, evolutionary-inspired, physics-inspired, human-inspired, biology-inspired, system-inspired, music-inspired, and math-inspired.The fundamental reason for using numerous models in such comparisons is to objectively evaluate the capabilities of the proposed models.This is especially important given the No Free Lunch theory, which states that no algorithm outperforms all others in all problem categories.As a result, this comparison is critical to ensuring the study's comprehensiveness.Furthermore, the performance of these models will be evaluated using six performance metrics: mean absolute error (MAE), root mean square error (RMSE), Pearson's correlation coefficient (R), mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE).These are common metrics often employed in fields related to streamflow and hydrology.
The remainder of this article is structured as follows.Section "Materials and Methods" describes the data collection, the case studies, the theoretical background of the proposed optimizers (PSS, INFO, and RUN), the extreme learning machine, and our proposed hybrid models.Section "Experiments" describes details the comparison models, parameter settings, and performance metrics.Section "Results and Discussion" presents the achieved results, an analysis of the findings, and discussions about the model's performance.Section "Limitations and future directions" highlights potential limitations and provides recommendations for future research based on this study.Finally, Section "Conclusion" summarizes our study and outlines our future works.

Study area and data acquisition
The main objective of this study is to forecast the natural flow pattern of the Nile River at the AHD site, which is located in the southern region of Egypt, as depicted in Fig. 1.The AHD is a large reservoir constructed in southern Egypt by damming the river.Its construction aimed to control flooding, generate hydroelectricity, and provide irrigation water.This study uses natural flow data records spanning 130 years (from 1870 to 2000) at the AHD as the basis for the dataset 43 .This research utilized the Nile River inflow data from Aswan, as published by the Egyptian Ministry of Water Resources and Irrigation.This dataset is one of the most extensive and longlasting of its kind, providing valuable insight into the long-term flow trends in the region.The fluctuating flow pattern is clearly depicted in Fig. 2, which illustrates the yearly flow.

Model Descriptions
In this section, a full description of the proposed algorithms and models is provided.First, a clear representation of the foundation of the proposed algorithms, including the PSS, INFO, and RUN algorithms, is provided.Subsequently, comprehensive discussions of the ELM model for time-series modeling will be reported.Finally, the proposed hybrid model, which combines metaheuristic algorithms and an ELM network, is presented.
Before describing the details of the proposed algorithms, the initialization process, fitness function, and several symbols used throughout the three proposed algorithms are introduced below.All metaheuristic optimization algorithms involve an initialization process.This process starts by initializing N random solutions (population size or pop_size).Each solution is a D-dimensional vector (D variables or problem size) obtained using a uniform random distribution, which can be described by Eq. (1).
where i = 1,2… N; j = 1,2,….D; X 0 i,j is the initial position vector of the i th solution; X min j and X max j denote the minimum and maximum values, respectively, for the j th dimension of the i th solution; and R i,j is a uniform ran- dom value in the range of [0, 1].
The fitness function is a function that is used to evaluate the quality of a solution and can be composed of one or multiple objective functions or variations of objective functions.For minimization problems, the lower the fitness function is, the better the solution, and conversely, the higher the fitness function is, the worse the solution.Some of the standard notations used throughout this paper are presented in Table 1.

Pareto-like sequential sampling (PSS) optimization
The PSS algorithm is a recently proposed optimization algorithm 35 based on simple sampling and design-ofexperiment (DOE) methods, such as Monte Carlo and Latin hypercube sampling.Despite its simplicity, the PSS  www.nature.com/scientificreports/has been shown to outperform several recent algorithms (PFA, PSO, WOA, etc.) in benchmark functions and engineering problems 35 .The algorithm starts by initializing a set of N vectors with random coefficient vectors generated using DOE methods as modeled in Eq. ( 2) where X min j and X max j are the lower bound and upper bound for the jth dimension, respectively; R i,j is the random variable generated using the DOE sampling method; and ⊙ is the elementwise multiplication of vectors.After the initialization of the population, the PSS algorithm determines the best solution to rescale the lower and upper bounds for the prominent area (Eq.3), which is then used to update the next positions of the solutions.
where the acceptance probability α describes the resampling of the 80/20 analogy.After that, the PSS updates the position of the current search agent using either the prominent domain (with the new lower and upper bounds) or the overall domain (the default lower and upper bounds).This process can be modeled using Eqs.(5)  and 6 where µ i,j is a random variable generated using the DOE sampling method and ⊙ represents the elementwise multiplication of vectors.The flowchart of the PSS algorithm is shown in Fig. 3.

Weighted mean of vectors (INFO) optimization
The INFO algorithm is a recent math-inspired metaheuristic proposed by 37 .It consists of four key phases: initialization (described above), updating rule, vector combining, and local search.Due to its solid mathematical concepts, INFO has been demonstrated to be effective in several applications, such as engineering optimization 44 , image classification tasks 45 , and feature selection 46 .This section briefly overviews some of the main operators and equations used in INFO, with more detailed information available in 37 .
Updating rule phase.Generally, INFO generates a set of random vectors and then derives the weighted mean from these vectors instead of modifying the position of the current vector toward an optimal solution.The mean rule can be described as follows: (2) The common mathematical notation used in this paper.

X t
Position of solution at the t th iteration (generation/epoch) The value of the j th dimension of i th solution at the t th iteration

X r
The random solution (position)

T max
The maximum number of iterations

X g best
The current best solution at g th generation

X best
The global best solution (The final solution of algorithm) where g and g max represent the current generation (iteration/epoch) and maximum number of generations, respectively.r1 and r2 are random numbers generated from the ranges [0, 0.5] and [0, 1], respectively.randn is a random number generated from a normal distribution.ε is a very small number, f () is the fitness function, and t is the index of the WM.When t = 1 , then a1, a2 and a3 are different intergers chosen randomly from the [1, N] range, and w = max f (X a1 ), f (X a2 , f (X a3 )) .When t = 2 , X a1 = X best represents the best solution, X a2 = X better represents the better solution, X a3 = X worst represents the worst solution in the current iteration, and w = f (X a3 ) .After calculating MeanRule , INFO generates 2 new solutions z1 and z2 based on X best , X better , X a1 , and the current solution X i .In the case of a random number r3 < 0.5 , z1 and z2 are generated by Eq. ( 13).
(10) Vector-combining phase.In this phase, INFO combines two vectors, z1 and z2 , to improve diversity.If the random probability is less than 0.5, the new vector will be the same as the current vector.Otherwise, INFO will generate a new vector, as in Eq. ( 16).
where µ = 0.05 * randn , r5 and r6 are random numbers generated from the (0, 1) range, and is the new position of the current search agent.
Local search phase.To further enhance the search performance, INFO utilizes a local search strategy around the global best solution and the solution derived from the mean-based rule.This phase is activated for the current agent only when the probability of a randomly generated number is smaller than 0.5.The agent skips this phase if the random probability is greater than or equal to 0.5.The new position of the current solution is presented in Eq. ( 17).
where r7, r8 , r9 and ∅ are random numbers generated from the (0, 1) range.Finally, the flowchart of the INFO algorithm is presented in Fig. 4.

Runge Kutta optimization (RUN)
This section briefly overviews the Runge-Kutta optimization (RUN) algorithm 39 .The RUN algorithm is based on the Runge-Kutta method, which is commonly used in numerical methods to solve ordinary differential equations.The RUN algorithm consists of three phases: the first phase is initialization (described above), the second phase involves a search procedure based on the Runge-Kutta theory, and the third phase is enhanced solution quality (ESQ).RUN has the benefit of being simple to execute and effective; therefore, it has been successfully applied in several fields, such as photovoltaic systems 47 , optimizing hydropower reservoirs 40 , and feature selection 48 .The subsequent subsection will briefly describe the RUN algorithm's main equations and fundamental concepts; more details are available at 39 .Search mechanism-based Runge-Kutta theory.In this phase, the RUN algorithm uses a search mechanism (SM) to update the position of the current solution at each generation.It can be determined by Eq. ( 21). ( 14) where r4 is a random integer from the set {1,− 1} and r5 is a random number generated from the [0, 2] range.pr1 , r6 , r7 , and ϕ are random numbers generated from the (0, 1) range.a and b are two constant numbers.µ = 0.5 + 0.1 * randn is a random number.X g wr and X g br are the worst random and best random solutions determined based on three random solutions ( X r1 , X r2 , X r3 ) selected from the current population with condition r1 = r2 = r3 = i .The rule is defined as follows: If the fitness of X i is better than the fitness of the best solution X bi selected from ( X r1 , X r2 , X r3 ) , then we have where r8 and r9 are random numbers generated from the (0, 1) range.
Enhanced solution quality.This phase is implemented in the RUN algorithm to increase the quality of the solutions and avoid local optima in each generation.Additionally, this phase is executed only when the probability of a random number is smaller than 0.5 ( pr2 < 0.5) .The mathematical formulation can be described as Eq. 12 where β is a random number generated from the (0, 1) range.c = 5 * rand is a random number.r10 is a random integer chosen from the set {-1, 0, 1}.The w factor is used to control the exploitation and exploration process in this phase.For w ≥ 1 (at early generations), the solution X i2 tends to explore the search space.When w < 1 (at later generations), the solution X i2 tends to exploit around the X i1 position.To enhance the performance of the new solution, the RUN algorithm also utilizes the new operator when the newly generated solution X is not better than the current solution.It can be expressed using Eq. ( 36).
where r11 and r12 are random numbers in the (0, 1) range and v = 2 * rand is a random number.It is worth noting that the chance to generate X i3 occurs when pr3 < w , where pr3 is a random number in the (0, 1) range.Finally, the flowchart of the RUN algorithm is shown in Fig. 5.

Extreme learning machine (ELM)
The extreme learning machine (ELM) algorithm was developed to improve the efficiency and speed of singlehidden-layer feedforward networks (SLFNs) without the need for tuning the hidden weights 23 .Unlike the MLP (ANN or FFNN), which requires an iterative optimization process to determine the hidden weights, the ELM randomly assigns them.The weights between the hidden and output layers are then calculated using the Moore-Penrose inverse method (Fig. 6).ELM is an "extreme" approach because the hidden weights are adjusted only once during training and are randomly assigned 33 .This enables the ELM to be trained much faster than traditional neural networks and handle large amounts of data with high-dimensional input spaces.ELM has been successfully applied in a variety of fields, including environmental sciences 49 , time-series forecasting 50 , and agriculture 51 .
The most commonly used structure of the ELM is based on an MLP network 52 with an input layer, a single hidden layer and an output layer.Given a set of training pairs { x i , y i ∈ R N x R m }, where x i is the input vector with N dimensions and y i is the m-dimensional output vector, the output of neuron j th of the ELM with L hidden neurons can be expressed as follows: where G() is the activation function and W ji and b j are the weights and biases between the input and hidden layers, respectively.x i is the i th input node, and β kj is the between hidden and output layers.The training process of ELM is executed as follows: • Randomly generated the W ji and b j • Calculate the β kj by solving least squares problem Y = βH , where Y is the true output (observation val- ues), H is the output matrix of the hidden layer.β can be calculated using Moore-Penrose inverse which is

Proposed models
Training process of hybrid-based ELM models.Although the ELM employs Moore-Penrose to find the global optimal point for the least squares problem, the matrix H contains random components of the hidden weights.As these weights are generated only once during the training process and are not updated, the use of Moore-Penrose will only locate nonglobal optimal points if the initially generated weights are not globally optimal 33 .This is the main drawback of the ELM model.Therefore, in this paper, metaheuristic algorithms are proposed to update and find better sets for hidden weights of the ELM network, followed by using Moore-Penrose to calculate the output weights.Before applying the metaheuristic algorithms, it is essential to define two critical components: the representation of a solution for the optimization problem and the fitness function.These components are addressed for the proposed hybrid-based ELM model (where hybrid-based refers to the proposed algorithms such as PSS, INFO, RUN, and other algorithms that will be compared in the subsequent sections of this study) as follows: In the proposed hybrid-based ELM model, let s i , s h and s o be the sizes of the input, hidden and output layers, respectively.Vectors W and b are the weights and biases of the input and hidden layers, respectively, in the ELM network.A solution is a real vector S (solution), which includes two components: input weights and hidden biases, represented as follows.All values are generated within the range [− 1, 1]: All values are generated within the range [− 1, 1].This vector S encapsulates the necessary parameters for the ELM model, which will be optimized using the proposed metaheuristic algorithms.
In metaheuristic algorithms, the fitness function evaluates the quality of a given solution.In this study, the average root mean square error ( RMSE ) is selected as the fitness function, a common choice in hybrid-based ELM models.The RMSE is calculated by averaging the RMSE over the entire training dataset, as shown in Eq. (39).In which, Eq. ( 38) defines the RMSE based on the difference between the ground-truth value and the ELM output.
where s ts denotes the size of the training set, ŷi represents the output of the ELM network (as per Eq. ( 37)), and y i represents the corresponding ground truth value.
In this manner, the training process of the ELM network is formulated as an optimization problem aimed at minimizing the RMSE value.Similarly, the hybrid-ELM model can minimize the F( S) = RMSE function.Spe- cifically, the procedure of the proposed hybrid-based ELM models can be described as follows: The proposed www.nature.com/scientificreports/algorithm initializes a population of N solutions randomly.Each solution represents a candidate ELM network, as described above.Then, the fitness for each solution is calculated through the following steps: • Convert the solution back into input weights and hidden biases, then assign them to the ELM network.
• Compute the output weights of the ELM using the Pseudo-Inverse Method based on the training dataset.
• Evaluate the RMSE value based on the validation part of the dataset.
After calculating the RMSE value, it is returned to the fitness function of the metaheuristic algorithms.The algorithms then perform the operation steps described in the earlier sections to evolve the population.This process repeats iteratively until the stopping condition of the algorithm is met.At this point, the solution with the best performance (i.e., the lowest RMSE ) is returned to the ELM network.Consequently, we obtain the optimal and most effective ELM model.The Data Preparation component aims to collect and prepare data for the model.During the data collection phase, the Data Collection component stores data on servers and retrieves it in CSV format using SQL queries.These CSV files are then passed through the Data Preprocessing component for cleaning and preparation for the model.To use time-series data for machine learning models (i.e.neural network), the first step is to transform it to a supervised learning format using the sliding windows method.The sliding window's width represents the time series' lag values.Next, we implement the ANOVA F-test 53 to determine the model's appropriate input variables (best lagged features).This approach is widely employed in hydrology-related domains 29,54 .According to the findings of the ANOVA F-test, six lagged times are chosen as input features for one-step ahead river streamflow modeling as shown in Table 2.
where Q(t) is the target river streamflow step "one month ahead" and Q(t − 1),…, Q(t − 24) are the lag times feature up to 24 months previously.Here, six features with the highest ANOVA F-test value are selected as input variables for proposed models.And then, the data were split into two parts: training and testing.
The Modeling component is responsible for training the model, storing the best model, and evaluating the results.First, the model is built using the training set by encoding the weights and biases of the ELM network into solutions.Then, the proposed metaheuristic algorithms are used to optimize these solutions.After the training and parameter tuning processes are completed, the best model is saved for future use and evaluated on the test set for performance analysis.During the building phase, the root mean square error (RMSE) function is used as

Experiments Compared models and parameter settings
To assess the ability of the three proposed optimizers for modeling river streamflow data, 17 other well-known metaheuristic algorithms from different groups of nature-inspired algorithms were selected for comparison.All 20 metaheuristic algorithms are used to optimize the weights for the ELM network.Note that the implemented codes for all metaheuristic algorithms are available at 34 .To ensure objectivity in the comparison process, the parameters for the metaheuristic algorithms were taken from their original paper and then modified through the parameter tuning process to select the best set of parameter.Moreover, to ensure fairness, all algorithms will have a population size of 50 and a maximum number of function evaluations of 50,000.This study does not use a maximum number of iterations because many algorithms update positions and fitness multiple times within a single iteration.Therefore, the maximum number of function evaluations is necessary to ensure fairness.Table 3 shows all 20 algorithms used in the experiments and their best parameters (N/A is used to indicate that no additional parameter is required).In addition to the parameters for the above algorithms, other parameters for the model and data were set as follows: • The training and testing data split ratios are 80% and 20%, respectively, for all the models.
• The ELM network has three layers: 1 input layer, 1 hidden layer, and 1 output layer.
• The number of input nodes in the ELM network is s i (selected based on the ANOVA feature selection method mentioned above).The number of hidden nodes in the ELM network is 2*s i + 1, following the formula from previous papers.The number of output nodes is 1, and the activation function used is the ELU.
Table 2.The input features and output for tested models.

Input variables
Output variable Table 3.The tested models and hyperparameter settings.

Performance metrics
As stated above, the RMSE metric was utilized as the fitness function for the metaheuristic algorithms during the model training process.However, during the testing phase, various metrics are employed, as listed in Table 4.All of the source code for the performance metrics is available at 71 .

Results and discussion
This section presents a comprehensive set of experimental results concerning 20 metaheuristic algorithms used for optimizing the ELM network.These algorithms belong to various branches of nature-inspired algorithms.Notably, three recent math-based metaheuristic algorithms, namely, PSS, INFO, and RUN, which are included at the end of the figures and tables, were used.Moreover, the tables highlight these algorithms in bold for ease of identification.As metaheuristic algorithms are stochastic (randomly based), each model was executed 10 times, and the mean and standard deviation (std) were recorded.The dominance of the RUN-ELM in the training dataset highlights its ability to capture the intricate patterns and dynamics of streamflow, leading to more accurate and reliable predictions.The robust performance of this model across multiple evaluation metrics signifies its potential for practical application in streamflow forecasting systems, particularly in scenarios where precision and stability are paramount.

Statistical results
These findings offer valuable insights into the effectiveness of the proposed RUN-ELM model, emphasizing its role as a promising tool for optimizing water resource management strategies and mitigating potential risks associated with streamflow variations.In conclusion, the comprehensive evaluation of these models underscores the superiority of the RUN-ELM in streamflow forecasting applications, providing the way for further research and improvement of math-inspired metaheuristic approaches in hydrological modeling and water resource management environments.
Following an examine the model assessment on the test dataset, we can see that our proposed PSS-ELM model emerges as the top-performing model across two crucial performance metrics: the root mean square error (RMSE) and the Nash-Sutcliffe efficiency (NSE).The PSS-ELM model achieves an RMSE of 2.0667, indicating that it can forecast streamflow values with minimum errors.Additionally, the high NSE value of 0.8642 further underscores the model's ability to capture the variability of streamflow patterns with a notable degree of accuracy.
Intriguingly, the results reveal that the GA-ELM model exhibits the highest correlation coefficient (R) among all the models, reaching 0.9376.This signifies the robustness of the GA-ELM in establishing strong Table 4. Descriptions of the performance metrics.

Metrics Equation Characteristics
Mean absolute error MAE = Smaller is better (best = 0), range = [0, + inf) Nash-sutcliffe efficiency Greater is better (best = 1), range = (− inf, 1] Kling-Gupta efficiency Greater is better (best = 1), range = (− inf, 1] Vol.:(0123456789) linear relationships between the predicted and observed streamflow values.Notably, the PSS-ELM model closely follows, with the second-highest R value of 0.9374, reinforcing its efficacy in capturing the underlying patterns within the streamflow dataset.Furthermore, when considering the Kling-Gupta efficiency (KGE) metric, the differences in performance between the best-performing model, NRO-ELM, and PSS-ELM model are relatively small (0.9175 and 0.9148, respectively).This implies that the PSS-ELM model demonstrates remarkable efficiency in replicating the observed variability of the target variable within the test dataset.
It can be seen from Table 5 that ASO-ELM has the lowest performance across multiple metrics.It exhibits relatively higher MAE, RMSE, MAPE, and lower R, NSE, and KGE values compared to many other models listed.On the other hand, the PSS-ELM model performs competitively in terms of most metrics.It achieves relatively low MAE and RMSE values, high R value, and a significantly lower MAPE compared to many other models, indicating its superior accuracy in streamflow prediction.Additionally, its NSE and KGE values are also relatively high, indicating good agreement with observed data.
The superior performance of the PSS-ELM model across the RMSE and NSE metrics signifies its ability to minimize prediction errors and accurately capture overall streamflow dynamics.The high correlation coefficient (R) values attained by both the PSS-ELM and GA-ELM models reflect their capacity to establish robust linear relationships, which is crucial for precise streamflow predictions.Moreover, the small disparity in KGE values between NRO-ELM and PSS-ELM further highlights the commendable performance of the PSS-ELM model in replicating the observed variability and overall characteristics of streamflow in the test dataset.This suggests that the PSS-ELM model exhibits a strong ability to adapt to varying streamflow conditions, demonstrating its reliability and effectiveness in streamflow forecasting tasks.
In summary, the results underscore the superior performance of the PSS-ELM model in capturing the intricate dynamics of streamflow, as evidenced by its top rankings in terms of the RMSE and NSE metrics on the test dataset.The model's ability to closely follow the correlation coefficient (R) with the GA-ELM model and its competitive KGE value further solidify its position as a robust and accurate tool for streamflow forecasting.These findings illuminate the potential of the PSS-ELM model as a valuable asset in water resource management, offering reliable insights into streamflow behavior and aiding decision-making processes for sustainable water allocation and mitigation strategies.

Model stability
Model stability, as indicated by low standard deviation (std) values in forecast performance indicators, is critical in real-world river flow prediction applications.Low std values suggest that the model's predictions are consistently near to actual values throughout various scenarios, which is important for stakeholders who rely on these predictions for decision-making because it decreases the likelihood of unexpected variances and errors.In realworld applications, particularly hydrology, accurate predictions are critical for managing water resources, and guaranteeing the effective operation of hydropower facilities.A model with low standard deviations is more reliable, giving confidence in operational decisions based on its forecasts.Furthermore, a low standard deviation in www.nature.com/scientificreports/model performance indicates that the model retains its accuracy in a variety of hydrological situations, including changing rainfall patterns, seasonal variations, and extreme events.This resilience is especially crucial for managing and planning for climate variability and change.Moreover, accurate and steady estimates help optimize the allocation and use of water resources, minimizing waste and ensuring that water supplies meet demand efficiently.
To further investigate the stability of the models, an examination of their standard deviation (std) values following 10 independent runs offers valuable insights into the consistency of their performance during training and prediction phases.Table 6 provides a detailed overview of the std values across six key performance metrics, shedding light on the variability in model outcomes.The std values serve as a measure of dispersion, indicating how much the performance metrics deviate from the mean across multiple runs.Lower std values suggest more stable and consistent performance, whereas higher values indicate greater variability in the results.Upon analyzing the std values, notable observations emerge regarding the stability of the models on both the training and testing datasets.The SMA-ELM model, for instance, demonstrates remarkable stability on the training dataset, boasting the four best std values for metrics such as the root mean square error (RMSE), Pearson's correlation coefficient (R), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE).This suggests that the SMA-ELM model consistently produces reliable results across different training runs.
In contrast, the DMOA model emerges as the most stable on the test dataset, showing the three best std values for the metrics of mean absolute error (MAE), RMSE, and NSE.This indicates that the DMOA model maintains a consistent level of performance when applied to unseen data, reinforcing its reliability in real-world predictive scenarios.However, it is essential to note that stability alone does not guarantee effective predictive performance.Despite their stability, both the SMA-ELM and DMOA models exhibit suboptimal predictive capabilities, as highlighted in Table 6.This emphasizes the need for a balanced assessment that considers both stability and predictive accuracy in model selection.Our proposed RUN-ELM model emerges as a promising candidate, demonstrating the best std value for the mean absolute percentage error (MAPE) metric on both the training and testing datasets.This indicates that the RUN-ELM model maintains consistent and reliable performance across multiple runs, particularly in terms of percentage error.
Additionally, the proposed INFO-ELM model shows commendable std values on the test dataset, positioning it favorably among the compared models.For instance, while the DMOA-ELM model exhibits the best std value of 0.0264 for the MAE, the INFO-ELM model closely follows, with a value of 0.0285, ranking second.Similarly, for the KGE metric, the best model records a std value of 0.0057 (GBO-ELM), with INFO-ELM securing a value of 0.007, ranking second.In summary, the proposed INFO-ELM and RUN-ELM models demonstrate notable stability across multiple performance metrics compared to the other models under consideration.These results highlight the significance of assessing mean performance as well as model stability through standard values, offering both scholars and practitioners important information in selecting solid and consistent models for particular applications.
To assess the stability and performance of the models, box plot visualization was used for the evaluation metrics, including the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage  Outliers, represented by hollow circles outside the main body of the box, are individual data points that fall significantly away from the majority of the dataset.These outliers, while not typical of the general trend, can provide valuable insights into unusual or extreme cases within the model performance.By presenting these box plots for each evaluation metric, a more comprehensive understanding of not only the central tendencies but also the variability and distribution of model performance across the trials has been introduced.This graphical representation allows for a nuanced comparison of the models, highlighting their strengths and weaknesses across a range of performance measures.
Overall, the use of box plots in this context provides a robust and visually intuitive method for assessing the stability, consistency, and overall performance of the compared models.These figures serve as a valuable resource for researchers and practitioners in making informed decisions regarding the selection and implementation of models for the studied application.

Model convergence
This study also compared the model convergence during the training process.Figure 14 shows the error results after 50,000 NFE for the compared algorithms in several experimental runs.It should be noted that some algorithms in each iteration call the fitness function more than once for each search agent.Thus, some algorithms, such as the GSKA-ELM and NRO-ELM, execute fewer than 1000 iterations to reach the 50,000 NFE limit.In contrast, other algorithms that only call the fitness function once for each search agent in each iteration will perform 1000 iterations (50,000/50 = 10,000, population size = 50).
In general, our study revealed that the proposed INFO-ELM and RUN-ELM algorithms achieved more remarkable convergence than did the other algorithms.These algorithms achieved a very fast convergence speed after approximately 30 iterations.Although the convergence speed subsequently decreased, it still had the potential to continue decreasing rather than plateauing, similar to other algorithms such as ASO-ELM, HGSO-ELM, and CRO-ELM.

Results comparison with previous study
Finally, we compare the performance of the proposed PSS-ELM model against established models, particularly the MLP model integrated with various physics-inspired metaheuristic algorithms as explored in 43 .Among these algorithms, Nuclear Reaction Optimization (NRO) exhibited promising results with an MLP model in forecasting streamflow, yielding RMSE = 2.35, MAE = 1.356, and MAPE = 16.747.However, our proposed PSS-ELM model surpasses the performance of NRO-MLP, achieving notably higher accuracy levels with RMSE = 2.0667, MAE = 1.2127, and MAPE = 0.1660.It can be seen that the improvement in MAPE from the NRO-MLP model to the proposed PSS-ELM model is substantial, reducing from 16.747 to 0.1660%.
In general, our proposed models (PSS-ELM, INFO-ELM, and RUN-ELM) offer valuable insights for decisionmakers involved in water resource planning by the performance it demonstrates in terms of superior accuracy, stability, and convergence.The findings also highlight how the enhanced predictive capabilities of the proposed model can lead to more informed and effective management strategies, thereby mitigating the impacts of extreme events.

Limitations and future directions
While our proposed PSS-ELM, INFO-ELM, and RUN-ELM models demonstrate promising performance in river streamflow prediction, it's essential to acknowledge potential limitations and biases that could impact their predictions and practical applications.
• Data limitations: One potential obstacle is the quality and quantity of input data.Despite efforts to use comprehensive streamflow data from the Aswan High Dam, historical data at monthly intervals spanning 130 years may miss short-term variations or localized phenomena.This constraint could affect the models' capacity to reliably capture quick changes or intense events, especially in places with limited data availability.• Model assumptions: Our model, like other modeling methods, uses metaheuristic algorithms, thus some parameters must be calibrated.Assumptions about these features may be good for this case study but not for others, especially for data types with complex relationships and nonlinear dynamics found in hydrological systems.Biases towards parameters that are beneficial to the model might cause disparities between model predictions and observed results.Furthermore, ELM-based models are shallow neural networks with a simplistic structure that may fail to capture complex correlations in hydrological data with many constraints.• Generalization to different conditions: Another potential limitation is the generalizability of our models to other meteorological, geographical, and hydrological situations.While we validated our models using data from the Aswan High Dam, their performance in other areas with different hydrological regimes and These limitations and biases may have significant implications for model predictions and their practical applications in water resource management and flood risk mitigation, such as:  • Transferability problems: Generalizing model predictions to different scenarios or places may be difficult due to biases caused by changes in environmental variables.Careful validation and adaption of the models to unique circumstances are required to assure their dependability and usability in practical applications beyond the study domain.
Considering the limitations mentioned above, future directions could focus on various areas to increase the robustness and practical use of the models: • Parameter Optimization: Develop methods for optimizing model parameters so that they are more robust and adaptive to changing conditions, taking into account the complex interactions and nonlinear dynamics found in hydrological systems.• Model Generalisation: Research methods aimed at improving the models' generalization capability across a wide range of climatic, geographic, and hydrological scenarios.This could require incorporating more diverse datasets and accounting for fluctuations in environmental variables.• Regional Validation: Thoroughly validate the models in locations with different hydrological regimes and environmental factors to assess their performance and reliability beyond the original study area.• Enhanced Model Complexity: Investigate the possible advantages of increasing model complexity, such as utilizing more advanced neural network designs or incorporating extra environmental variables, in order to better depict the complexity of hydrological systems.

Conclusion
In this study, novel models combining three advanced math-inspired metaheuristics: Pareto-like sequential sampling (PSS), weighted mean of vectors (INFO), and Runge-Kutta optimization (RUN) with an extreme learning machine (ELM) for streamflow forecasting tasks are introduced as effective approaches.These approaches address the limitations of random weights in the matrix H of the ELM network, which are set once during training and remain unchanged.Consequently, relying solely on the Moore-Penrose inverse operation may lead to suboptimal solutions if the initial weights do not represent the global optimum.To mitigate this challenge, new algorithms such as PSS, INFO, and RUN, known for their gradient-free nature, capacity to escape local optima, and global optimization capabilities, have recently been developed.
To assess the performance of these hybrid methods, 20 models have been evaluated across eight primary metaheuristic categories: evolutionary-inspired, swarm-inspired, human-inspired, physics-inspired, musicinspired, biology-inspired, system-inspired, and math-inspired.Utilizing a dataset of streamflow data from the Aswan High Dam (AHD) spanning 130 years at monthly intervals-a crucial resource for water management.Our findings indicate that the prediction accuracy, model convergence, and stability of the ELM combined with math-inspired methods surpassed those of other methods.Specifically, our proposed PSS-ELM model demonstrated superior accuracy in river streamflow prediction with a Pearson's correlation index (R) of 0.9374, root mean square error (RMSE) of 2.0667, and a Nash-Sutcliffe efficiency (NSE) of 0.8642.Furthermore, the INFO-ELM and RUN-ELM models exhibited enhanced stability across ten runs, showing lower mean absolute percentage errors (MAPE) of 15.21% and 15.28%, respectively, as well as better values of mean absolute errors (MAE) with 1.2145 and 1.2105, respectively, and higher values of Kling-Gupta efficiencies (KGE) at 0.9113 and 0.9124, respectively.These results underscore the potential effectiveness of our proposed PSS-ELM, INFO-ELM, and RUN-ELM models in natural streamflow forecasting systems.
While our models demonstrated robust performance in river streamflow forecasting, it is crucial to acknowledge that this study represents a single case under specific climatic conditions.Further investigations across diverse climatic scenarios and regions are essential to comprehensively evaluate the models' effectiveness.Thus, our future research will aim to apply these models to various case studies characterized by distinct climatic conditions, broadening the scope of our analysis in water resource-related applications.Besides, future work can be carried out by the deployment of the developed model in real-time forecasting scenarios.This will further demonstrate the practical utility and effectiveness of our approach in addressing real-world challenges in water resource management.

Figure 1 .
Figure 1.The location of the case study Aswan High Dam (AHD).

Figure 2 .
Figure 2. Natural monthly streamflow of the Nile River in the BCM.
Proposed hybrid-based ELM.This section provides a detailed description of our proposed model, illustrated in Fig.7.Our model consists of two main components: Data Preparation and Modeling (Training and Testing).

NNi=1 (ŷ−y) 2 N
Smaller is better (best = 0), range = (− inf, + inf) Root mean square error RMSE =Smaller is better (best = 0), range = (− inf, + inf)Pearson's correlation coefficient Greater is better (best = 1), range = (− inf, error (MAPE), Pearson's correlation coefficient (R), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE).These figures, namely, Fig.8, Fig.9, Fig.10, Fig.11, Fig.12, and Fig.13, offer a visual representation of the results obtained from 10 independent trials of the model comparisons based on RMSE, MAE, MAPE, R, NSE, and KGE metric, corresponding.Box plots are a valuable tool in statistical analysis, providing a comprehensive view of the distribution and variability of the data.In each box plot, the box itself spans from the first quartile (Q1) to the third quartile (Q3) of the data, representing the middle 50% of the dataset.The line within the box denotes the median, which is the midpoint of the data series.Moreover, a green triangle symbol has been included within the box plots to indicate the mean value of each metric across the 10 trials.This offers a quick reference point for the central tendency of the results.It is important to note that the mean can be influenced by extreme values, so the box plot also highlights any outliers in the data.

Figure 8 .
Figure 8.The RMSE results of all compared models after 10 trials (box plot).

Figure 11 .
Figure 11.R results of all compared models after 10 trials (box plot).

Figure 12 .
Figure 12.NSE results of all compared models after 10 trials (box plot).

Figure 13 .
Figure 13.KGE results of all compared models after 10 trials (box plot).

Figure 14 .
Figure 14.The convergence lines of the compared models in different trials.

Table 5
encapsulates the mean performance values derived from 20 models following 10 iterative trials conducted on both the training and testing datasets.The evaluation of these models relied on six essential performance metrics: mean absolute error (MAE), root mean square error (RMSE), correlation coefficient (R), mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE).By analyzing the training dataset, it becomes evident that our proposed RUN-ELM model shows superior performance across all the considered metrics.This model exhibits the lowest values for the MAE, RMSE, and MAPE, underscoring its precision in capturing the discrepancies between the predicted and observed streamflow values.Moreover, the RUN-ELM yields the highest correlation coefficient (R), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE) values.These results illuminate its capacity not for accurate prediction but also for maintaining consistency and robustness across varying streamflow scenarios.Conversely, several models, namely, HGSO-ELM, ASO-ELM, GSKA-ELM, LCO-ELM, SMA-ELM, SOA-ELM, TSA-ELM, and AEO-ELM, demonstrate comparatively inferior performance metrics on the training dataset.These models are characterized by elevated values of MAE, RMSE, and MAPE, indicating a larger magnitude of prediction errors.Furthermore, they exhibit lower correlation coefficient (R), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE) values, suggesting diminished accuracy, consistency, and model efficiency.

Table 5 .
The mean results of all compared models after 10 trials on various performance metrics.Significant values are in bold.

Table 6 .
The standard deviation (std) results of all compared models after 10 trials on various performance metrics.Significant values are in bold.