Improving PM2.5 prediction in New Delhi using a hybrid extreme learning machine coupled with snake optimization algorithm

Fine particulate matter (PM2.5) is a significant air pollutant and a leading driver of chronic health problems and premature mortality in large metropolitan areas such as Delhi. In this context, accurate prediction of PM2.5 concentrations is critical for raising public awareness, allowing sensitive populations to plan ahead, and providing governments with information for public health alerts. This study applies a novel hybridization of the extreme learning machine (ELM) with the snake optimization algorithm, called the ELM-SO model, to forecast PM2.5 concentrations. The model was developed on air quality inputs and meteorological parameters. The ELM-SO hybrid model is compared with individual machine learning models, namely Support Vector Regression (SVR), Random Forest (RF), Extreme Learning Machines (ELM), Gradient Boosting Regressor (GBR), and XGBoost, and with a deep learning model, Long Short-Term Memory networks (LSTM). The results show that ELM-SO achieved the highest predictive performance among the compared models, with a testing squared correlation coefficient (R²) of 0.928 and a root mean square error of 30.325 µg/m³. These findings suggest that the ELM-SO technique is a valuable tool for accurately forecasting PM2.5 concentrations and could help advance the field of air quality forecasting. By developing state-of-the-art air pollution prediction models that incorporate ELM-SO, it may be possible to better understand and anticipate the effects of air pollution on human health and the environment.

In recent years, developing accurate and reliable models for predicting PM2.5 concentration has become an essential area of research that can help mitigate the health risks associated with air pollution and improve public health [12]. Various modeling approaches, including chemical transport, statistical, machine learning, and deep learning models [13][14][15], have been applied to predict PM2.5 concentration. Chemical transport models simulate the physical and chemical processes governing the transport and transformation of atmospheric pollutants [16]. They predict PM2.5 concentration from information on emissions, weather, and other environmental conditions. However, these models are computationally intensive and require considerable expertise to develop and run. By comparison, statistical models are simpler and more computationally efficient [17]. They use statistical techniques to relate PM2.5 concentration to environmental variables such as temperature, humidity, and wind speed. However, they are limited by their assumptions about the relationships between variables and may perform poorly under changing environmental conditions. In contrast to these techniques, machine learning and deep learning models use algorithms to identify patterns in large datasets. These models can be trained on historical data to predict PM2.5 concentration from environmental variables [18]. Machine learning models can identify complex, nonlinear relationships between variables and can adapt to changing environmental conditions. However, a substantial amount of high-quality training data is needed to build and improve them.
Machine Learning (ML) and Deep Learning (DL) models, such as Random Forest (RF), Long Short-Term Memory networks (LSTM), Support Vector Regression (SVR), Extreme Learning Machines (ELM), Extreme Gradient Boosting (XGBoost), and Gradient Boosting Regression (GBR), have been widely used to forecast PM2.5 concentrations. These models use historical data on air quality, meteorology, and traffic conditions to predict future PM2.5 concentrations. In a study conducted in Delhi, India, the RF model was applied in a multi-stage modeling exercise using satellite data, land use factors, reanalysis-based meteorological variables, and population density to estimate PM2.5 concentrations [14]. The RF model exhibited high prediction accuracy, with yearly average concentrations ranging from 87 to 138 µg/m³. Other researchers have reported comparable findings when applying ML and DL models to forecast PM concentrations in China, India, and the US [19]. In one study, a DL model was developed for Delhi to forecast PM2.5 concentrations from weather information and satellite images. The study found that the LSTM model statistically outperformed the other machine learning models [12]. In another study, an SVR model was applied to the city of Nottingham, UK, to forecast PM2.5 concentrations using weather information and satellite data. The study found that the Multiple Linear Regression (MLR) and SVR models had good predictive performance compared to DL models [15]. In a similar study, an ELM model was used to forecast PM2.5 concentrations in London. It was concluded that the ELM outperformed conventional models, such as Artificial Neural Networks (ANN), in forecasting accuracy and performance [18]. Taken together, these studies indicate that machine learning and deep learning models show promise in accurately forecasting PM2.5 concentrations.
Hybrid machine-learning algorithms have been extensively developed in the last decade for predicting PM2.5 concentrations [20][21][22]. These models have several advantages over standalone machine learning models. They have been shown to enhance several critical aspects of performance, including accuracy, robustness, generalizability, and interpretability [23]. However, the choice of model is inherently influenced by the requirements and constraints specific to each application. Compared to standalone models, hybrid machine learning models have demonstrated high flexibility, increased interpretability, and improved performance in forecasting PM2.5 concentrations [24,25]. This is because they combine the strengths of multiple algorithms and can capture complex relationships between variables that a single algorithm may miss [26]. Because hybrid models can adapt to changes in the data by adjusting the weights of their constituent algorithms, they are more robust to changes in input data than standalone models. For example, if meteorological conditions change suddenly, a hybrid model can adjust these weights to better capture the impact of the change on PM2.5 concentrations. Hybrid models are also more generalizable than standalone models, possibly because they can be trained on data from multiple sources, which allows them to capture the variability in PM2.5 concentrations across different regions. Finally, hybrid models can be more interpretable than standalone models because they combine more interpretable algorithms, such as regression models, with less interpretable ones, such as neural networks.
Some researchers introduced the VAR-XGBoost model, which combines Vector Autoregression (VAR), Kriging, and XGBoost for precise, continuous O3 concentration prediction [27]. The model was evaluated with ten-fold cross-validation and outperformed other models, including XGBoost, CatBoost, ExtraTrees, AdaBoost, RF, Decision Tree, and Light Gradient Boosting Machine. Using China as a case study, the study highlighted ozone's strong correlation with PM2.5 and weak correlation with SO2. Other studies found that meta-algorithms significantly improved the performance of forecasting models, yielding higher forecasting accuracy [28]. Furthermore, an innovative model combining Wavelet Transform (WT), Stacked Autoencoder (SAE), and Long Short-Term Memory (LSTM) has been introduced for precise PM2.5 prediction [29]. The results indicated that the predictive capability of SAE-LSTM surpasses that of the other models used for comparison, such as Back Propagation (BP), showing that the hybrid model holds considerable promise for enhancing the accuracy of PM2.5 forecasts. In another study, the CNN-LSTM model was developed by merging the strengths of the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) neural network to forecast PM2.5 concentration in Beijing over the next 24 h [30]. This approach used the CNN for effective extraction of air quality features, while the LSTM captured the extended historical context of the input time series. For comparative analysis, four models were constructed for PM2.5 prediction: univariate and multivariate LSTM, and univariate and multivariate CNN-LSTM. The multivariate CNN-LSTM model achieved both low error rates and shorter training times, making it the most effective choice for PM2.5 concentration forecasting. In yet another study aimed at enhancing the accuracy of PM2.5 predictions, a hybrid model, CEEMDAN-COOT-VMD-JAYA-LSSVM, was introduced [31]. The research incorporated complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), variational mode decomposition optimized using the COOT algorithm (COOT-VMD), and a least square support vector machine optimized by the JAYA algorithm (JAYA-LSSVM). In comparison with both single and hybrid models, the results of DM tests provided robust evidence of the superiority of the proposed model at the 99% confidence level. Such investigations have allowed researchers to understand the relationships between the variables contributing to PM2.5 concentrations [32][33][34].
Researchers have traditionally focused on using either DL or hybrid models for predicting PM2.5 pollution levels, as both approaches demonstrate strong performance in simulating air pollution dynamics. However, previous studies of air pollution prediction do not directly compare these two types of predictive model. Moreover, developing accurate and reliable models is crucial for mitigating the health risks associated with PM2.5 concentrations in Delhi, where elevated PM2.5 levels cause significant annual fatalities and contribute to a complex interplay of factors that influence public health outcomes. This research proposes a novel hybrid machine-learning model called ELM-SO (Snake Optimization-Extreme Learning Machine) for forecasting PM2.5 concentrations. The ELM-SO model combines two powerful techniques, ELM and the Snake Optimization algorithm, to enhance the accuracy of PM2.5 predictions. ELM is chosen for its advantages over traditional Artificial Neural Networks (ANNs): it trains faster, avoids local minima, and exhibits exceptional performance [35]. The Snake Optimization algorithm is employed to train the ELM, as it has proven effective in solving various engineering problems and outperforms other established and newly developed algorithms (e.g., Moth-flame Optimization, Harris Hawks Optimizer, Whale Optimization Algorithm, and others) [36]. The proposed model incorporates air quality inputs and meteorological parameters to capture the complex relationships between PM2.5 concentrations and environmental factors. The prediction performance of the ELM-SO model is validated by comparing it against standalone machine learning models, namely Support Vector Regression (SVR), Random Forest (RF), Extreme Learning Machines (ELM), Extreme Gradient Boosting (XGBoost), and Gradient Boosting Regression (GBR).
This study makes a significant and novel contribution to environmental engineering and climate change research. It diverges from previous work, which mainly focused on either deep learning or hybrid models for PM2.5 pollution prediction, by directly comparing these two approaches. The findings of the present investigation have several practical implications. First, they contribute to a focused understanding of air quality dynamics, allowing for better comprehension of the complex relationships between environmental factors and PM2.5 pollution. This knowledge is essential for environmental engineers, policymakers, and public health officials in designing effective strategies for mitigating air pollution and its health effects. Second, the proposed ELM-SO model, which enhances the accuracy of PM2.5 predictions, can be applied in real-time air quality forecasting systems. Residents, especially those in highly polluted areas like Delhi, can then receive timely and accurate information about air quality. This information enables individuals to take protective measures, such as wearing masks or staying indoors during high pollution episodes, thereby safeguarding their health. Furthermore, decision-makers can use these findings to formulate evidence-based policies and regulations for reducing air pollution. By understanding the factors that influence PM2.5 concentrations, cities and governments can implement targeted measures to limit pollution sources, such as industrial emissions or vehicle traffic. This, in turn, will benefit both individuals and communities in their efforts to alleviate the detrimental effects of poor air quality.

Study area and available data
The metropolitan city of Delhi, located in northern India, is one of the most rapidly expanding urban regions globally. The city spans an area of 1483 km² and is situated within the geographic coordinates of north latitudes 28° 0.21′ to 28° 0.53′ and east longitudes 76° 0.20′ to 76° 0.37′, with an elevation of 216 m above mean sea level. With a population of 16.75 million, the city experiences severe environmental stress in the form of poor air quality, putting its inhabitants' lives at risk [12]. The region has sub-tropical semi-arid (steppe) climatic conditions, characterized by an average annual precipitation of 611.8 mm and a mean annual temperature of 31.5 °C. Due to a combination of factors, including distinct geography, meteorological conditions, and rapid urban expansion, Delhi experiences some of the poorest air quality in the northern hemisphere. The city usually encounters an average of 15 pollution episodes each year, with over 200 days in a year exceeding the National Ambient Air Quality Standards (NAAQS) limits for PM2.5. This high frequency of pollution episodes and PM2.5 exceedances underscores the severity of the air pollution problem in the city. A total of 731 observations of daily averaged air quality and meteorological data from 2015 to 2018 (covering a 4-year period) were obtained from the RK Puram air quality monitoring station and the Safdarjung airport meteorological station (Fig. 1). In addition, the meteorological parameters for the studied location, such as wind speed, evaporation, air temperature, and rainfall, were obtained from NASA's open-source Data Access Viewer (DAV), available at https://power.larc.nasa.gov/data-access-viewer/. The air quality data often contain missing and corrupted values resulting from factors such as natural disasters, sensor shutdowns, or system crashes. To address this issue, we used the multidirectional imputation technique presented in [38] to estimate pollutant concentrations for the missing values. Following the imputation process, we performed Z-score normalization to eliminate outliers from the data. This normalization procedure transforms the data into a standardized distribution with a mean of 0 and a standard deviation of 1. The Z-score is computed as follows:

$$Z = \frac{X - \bar{X}}{\sigma}$$

where $Z$ is the normalized value, $X$ is the input, $\bar{X}$ is the mean value of the dataset, and $\sigma$ denotes the standard deviation. The characteristics of both the training and testing data sets are presented in Table 1.
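The Z-score normalization described above can be illustrated with a minimal sketch. The function name `z_score`, the sample values, and the use of the population standard deviation are assumptions made for illustration; the study does not state which variant of the standard deviation was used.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize a sequence to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(x - mu) / sigma for x in values]

# Hypothetical daily PM2.5 readings in µg/m³, for illustration only.
pm25 = [80.0, 120.0, 100.0, 140.0, 60.0]
z = z_score(pm25)
```

After the transformation, the values in `z` have a mean of 0 and a standard deviation of 1, as stated in the text.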

ML techniques
SVR
Support Vector Regression (SVR) is a machine learning algorithm that can be used to predict PM2.5 levels from a set of input parameters. Using a kernel function, the algorithm first transforms the input parameters into a higher-dimensional space [39,40]. In this new space, the algorithm then searches for a hyperplane that maximizes the margin between the predicted and the actual PM2.5 values in the training data (Fig. 2). To make predictions on new data, SVR maps the input parameters into the same higher-dimensional space as the training data and predicts the PM2.5 level based on its position relative to the hyperplane. The optimal hyperplane is determined by minimizing a cost function that penalizes errors in the predicted PM2.5 levels. The input parameters used in the SVR model for PM2.5 prediction are temperature, wind speed, rainfall, evaporation, humidity, PM10, NO, NO2, NH3, benzene, and toluene. The SVR can be formulated as follows:

$$Y = \sum_{i} a_i K(x_i, x) + b$$

where $Y$ is the predicted value of PM2.5, $b$ is the bias term, $a_i$ is the coefficient of each input data point in the model, $x_i$ is the input data point, and $K(x_i, x)$ is the kernel function used to compute the similarity between the input data point $x_i$ and the new data point $x$.
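As a minimal sketch of how the SVR prediction is evaluated for a new data point, the snippet below sums the kernel similarities between the new point and the stored input points, weighted by their coefficients, and adds the bias. The Gaussian (RBF) kernel, the value of `gamma`, and all numeric values are assumptions for illustration; they are not taken from the fitted model in this study.

```python
import math

def rbf_kernel(xi, x, gamma=0.5):
    """Gaussian (RBF) kernel K(xi, x) = exp(-gamma * ||xi - x||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, coeffs, bias, kernel=rbf_kernel):
    """Evaluate Y = sum_i a_i * K(x_i, x) + b for one new point x."""
    return sum(a * kernel(xi, x) for a, xi in zip(coeffs, support_vectors)) + bias

# Hypothetical values standing in for a fitted model.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
coeffs = [2.0, -1.0]
bias = 0.5
y_at_sv0 = svr_predict([0.0, 0.0], support_vectors, coeffs, bias)
```

Only the coefficients, the stored input points, and the bias are needed at prediction time, which is why the kernel choice dominates the behavior of the model.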

RF
Random Forest (RF) is a popular ensemble learning technique used for both regression and classification tasks. It creates numerous decision trees during the training phase and determines the final prediction from the mode of the classes (for classification) or the average prediction (for regression) of the individual trees [41,42]. In the RF model, each tree is constructed using a random subset of the training data and a random subset of the input features. The model then aggregates the predictions of all the individual trees to produce a final outcome [43].

LSTM
The LSTM model uses a Long Short-Term Memory (LSTM) neural network, which is a type of recurrent neural network (RNN), to predict PM2.5 concentrations from input variables including temperature, wind speed, rainfall, evaporation, humidity, PM10, NO, NO2, NH3, benzene, and toluene. LSTM networks can handle temporal dependencies in the input data, making them suitable for prediction tasks such as air pollutant concentration forecasting [44]. The configuration of the LSTM model is determined by its number of LSTM cells, hidden layers, and input/output dimensions. The LSTM cells are connected to both the input layer and each other through weighted connections, which are adjusted during the training phase. During this stage, the model is trained on a dataset using mean squared error (MSE) as the loss function, with the backpropagation algorithm updating the LSTM cell weights to minimize the loss. The model is then evaluated on a validation set using mean absolute error (MAE), mean squared error (MSE), and the coefficient of determination (R²) as evaluation metrics. The trained LSTM model can then be used to predict PM2.5 concentrations for new input data.

XGBoost
Extreme gradient boosting, or XGBoost, is a powerful machine learning technique that uses decision trees as its base learners. The technique generates a sequence of decision trees, each built on the forecasting errors of the preceding trees. Compared with other ML models, XGBoost is more complex because of its numerous tunable parameters. While it shares some parameters with other tree-based models, XGBoost also requires additional hyperparameters. These hyperparameters are designed to mitigate the risk of overfitting, reduce prediction variability, and thereby enhance overall prediction accuracy. XGBoost has the merit of effectively managing overfitting through the use of regularization, which speeds up the model development process.

GBR
Gradient boosting regression tree is a machine learning approach that uses an ensemble of weak learners, usually decision trees, for forecasting. The algorithm iteratively enhances its predictions by adding new decision trees to the ensemble, with each tree trained to rectify the errors made by the preceding trees, resulting in increasingly accurate forecasts. Gradient boosting regression employs shallow decision trees with minimal splits as weak learners to prevent overfitting on the training data. The technique can depict non-linear relationships, particularly those observed in various environmental systems, and deploys a variety of differentiable loss functions to learn throughout the iterations involving input features.
Figure 2. Support vector machine having an optimal hyperplane for classification.

ELM
Extreme Learning Machines (ELM) is a machine learning approach used for predicting PM2.5 values in this study. It is a feedforward neural network with a simple structure and fast processing [35]. During training, ELM uses many input neurons randomly selected from the available input features [45]. These input neurons are then connected to the hidden layer through randomly generated weights (Fig. 3). The output layer of the network is trained using the Moore-Penrose generalized inverse, which allows for fast and efficient training [46]. ELM is highly effective in predicting PM2.5 values, with superior performance compared to other machine learning methods. Additionally, ELM is highly scalable and capable of handling large amounts of data, making it well suited to PM2.5 prediction tasks.
Given a set of training data $\{(x_i, y_i)\}$ with $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, ELM seeks to learn a mapping function

$$f(x) = \beta\,\varphi(x w^{T})$$

where $w$ denotes the randomly generated weight and bias values, $\varphi(\cdot)$ is a nonlinear mapping function, and $\beta$ denotes the ELM output weights. The mapping function $\varphi$ transforms the input features $x$ into a high-dimensional feature space, where the nonlinear problem can be handled by a linear model with the weight vector $\beta$ (see Fig. 3). In this work, the sigmoid transfer function is used. The output weights are then calculated using the Moore-Penrose generalized inverse [47]:

$$\beta = H^{\dagger} y$$

where $H$ is the hidden layer output matrix, $H^{\dagger}$ is its Moore-Penrose generalized inverse, and $y$ is the target output matrix. The trained ELM model can then be used to predict PM2.5 values for new input data.
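The ELM training procedure described above (random hidden-layer weights, sigmoid transfer function, output weights from the Moore-Penrose generalized inverse) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the function names, the uniform range of the random weights, the number of hidden neurons, and the synthetic data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, y, n_hidden=20, seed=0):
    """Fit an ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = sigmoid(X @ W + b)                                   # hidden layer output matrix
    beta = np.linalg.pinv(H) @ y                             # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Synthetic data standing in for the air quality inputs (illustration only).
rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(200, 3))
y_train = np.sin(2.0 * X_train[:, 0]) + X_train[:, 1] * X_train[:, 2]
W, b, beta = elm_fit(X_train, y_train, n_hidden=30)
train_rmse = float(np.sqrt(np.mean((elm_predict(X_train, W, b, beta) - y_train) ** 2)))
```

Because only `beta` is solved for, training reduces to a single pseudo-inverse, which is the source of the fast training noted in the text.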

ELM-SO
The ELM-SO hybrid model combines the Extreme Learning Machines (ELM) algorithm with the Snake Optimization (SO) algorithm to improve the accuracy of PM2.5 prediction. The ELM-SO algorithm first uses the ELM algorithm to create a high-dimensional feature space and then applies the SO algorithm to select the optimal parameters of ELM for PM2.5 prediction. The SO algorithm is a meta-heuristic optimization algorithm that mimics the behavior of snakes searching for prey. The algorithm searches for the optimal subset of features by iteratively adjusting the snake's position in the search space. The SO algorithm involves the following steps:

(1) Initialization
The initialization phase generates a random population distributed uniformly within the search space [36]. This step allows the algorithm to commence the optimization process. The initial population is obtained using the following equation:

$$P_j = P_{min} + R \times (P_{max} - P_{min})$$

where $P_j$ is the position of the jth individual, $R$ is a random number between 0 and 1, and $P_{min}$ and $P_{max}$ are the lower and upper limits of the problem, defining its boundaries.

(2) Splitting of the population into two equal fractions: males and females
The population is assumed to consist of 50% males and 50% females and is split into two groups, a male group and a female group, using the following equations:

$$I_m \approx \frac{I}{2}$$

$$I_f = I - I_m$$

where $I$ represents the total number of individuals, $I_m$ is the number of male individuals, and $I_f$ is the number of female individuals.

(3) Assessing the two groups and establishing the temperature and food quantity
• Identify the top individual in each group and determine the best male, the best female, and their respective positions.
• The temperature ($T$) is given by

$$T = \exp\left(-\frac{i_c}{i_T}\right)$$

where $i_c$ represents the current iteration and $i_T$ is the maximum number of iterations.
• The food quantity ($F_q$) is given by

$$F_q = C_1 \times \exp\left(\frac{i_c - i_T}{i_T}\right)$$

where $C_1$ is a constant with a value of 0.5.

(4) Stage of exploration (lack of food)
If $F_q$ < Threshold, the snakes search randomly for food and update their positions using the following equation:

$$P_{j,m}(i_c + 1) = P_{rand,m}(i_c) \pm C_2 \times M_{ab} \times \left((P_{max} - P_{min}) \times rand + P_{min}\right)$$

Here, $P_{j,m}$ denotes the position of the jth male, $P_{rand,m}$ denotes the position of a randomly selected male, $rand$ is a random number ranging from 0 to 1, and $M_{ab}$ denotes the male's ability to locate food. Similarly, the position of each female is updated as follows:

$$P_{j,f}(i_c + 1) = P_{rand,f}(i_c) \pm C_2 \times F_{ab} \times \left((P_{max} - P_{min}) \times rand + P_{min}\right)$$

Here, $P_{j,f}$ represents the position of the jth female, $P_{rand,f}$ is the position of a randomly selected female, $rand$ is a random number ranging from 0 to 1, and $F_{ab}$ denotes the female's ability to locate food. The ability of male or female individuals to look for food is given by

$$M_{ab} = \exp\left(-\frac{f_{rand,m}}{f_{j,m}}\right), \qquad F_{ab} = \exp\left(-\frac{f_{rand,f}}{f_{j,f}}\right)$$

where $f_{rand,m}$ and $f_{rand,f}$ indicate the fitness of $P_{rand,m}$ and $P_{rand,f}$, $f_{j,m}$ and $f_{j,f}$ represent the fitness of the jth individual in the male or female group, and $C_2$ is a constant with a fixed value of 0.05.

(5) Stage of exploitation (food is present)
If $F_q$ and $T$ both exceed their thresholds, the snakes only move toward the food and update their positions using the following equation:

$$P_{j,k}(i_c + 1) = P_{food} \pm C_3 \times T \times rand \times \left(P_{food} - P_{j,k}(i_c)\right)$$

where $P_{j,k}$ indicates the location of an individual (male or female), $P_{food}$ represents the position of the best individual, and $C_3$ is a constant with a fixed value of 2.
$T$ < Threshold (0.6) triggers the fighting or mating mode of the snake.

• Fight mode

$$P_{j,m}(i_c + 1) = P_{j,m}(i_c) + C_3 \times MF_{ab} \times rand \times \left(F_q \times P_{best,f} - P_{j,m}(i_c)\right)$$

Here, $P_{j,m}$ represents the position of the jth male, $P_{best,f}$ is the position of the best female, $rand$ is a random number ranging from 0 to 1, and $MF_{ab}$ denotes the male's ability to fight. Similarly,

$$P_{j,f}(i_c + 1) = P_{j,f}(i_c) + C_3 \times FF_{ab} \times rand \times \left(F_q \times P_{best,m} - P_{j,f}(i_c)\right)$$

Here, $P_{j,f}$ represents the position of the jth female, $P_{best,m}$ is the position of the best male, $rand$ is a random number ranging from 0 to 1, and $FF_{ab}$ denotes the female's ability to fight.

• Mating mode

$$P_{j,m}(i_c + 1) = P_{j,m}(i_c) + C_3 \times MM_{ab} \times rand \times \left(F_q \times P_{j,f}(i_c) - P_{j,m}(i_c)\right)$$

$$P_{j,f}(i_c + 1) = P_{j,f}(i_c) + C_3 \times FM_{ab} \times rand \times \left(F_q \times P_{j,m}(i_c) - P_{j,f}(i_c)\right)$$

Here, $P_{j,m}$ and $P_{j,f}$ represent the positions of the jth male and female, while $MM_{ab}$ and $FM_{ab}$ denote the male's and female's ability to mate.
When an egg hatches, the worst-performing male and female are selected and replaced:

$$P_{worst,m} = P_{min} + rand \times (P_{max} - P_{min})$$

$$P_{worst,f} = P_{min} + rand \times (P_{max} - P_{min})$$

Here, $P_{worst,m}$ and $P_{worst,f}$ denote the worst-performing male and female of a population group.
The ELM-SO hybrid model offers several advantages over other prediction models, such as improved accuracy, reduced computational complexity, and increased robustness to noisy and missing data. The combination of the ELM algorithm and the SO algorithm enables the model to efficiently identify the most informative features for PM2.5 prediction, resulting in more accurate and reliable predictions.
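To make the sequence of SO phases concrete, the following is a simplified, self-contained sketch of the search loop on a toy objective (the sphere function). It follows the commonly published form of the Snake Optimizer and should be read as an illustration under stated assumptions rather than the implementation used in this study. In particular, the food-quantity threshold of 0.25, the exponential form of the ability terms, the 50/50 choice between fight and mating modes, the egg-hatching probability, the clipping of positions to the bounds, and all function and parameter names are assumptions.

```python
import math
import random

def snake_optimize(fitness, dim, lo, hi, n_snakes=20, n_iter=200, seed=0,
                   food_threshold=0.25, temp_threshold=0.6,
                   c1=0.5, c2=0.05, c3=2.0):
    """Minimize `fitness` over the box [lo, hi]^dim with a simplified SO loop."""
    rng = random.Random(seed)

    def random_position():
        return [lo + rng.random() * (hi - lo) for _ in range(dim)]

    def ability(f_other, f_self):
        # exp(-f_other / f_self); the ratio is clamped to keep exp() finite
        ratio = f_other / (f_self + 1e-12)
        return math.exp(-max(-50.0, min(50.0, ratio)))

    n_male = n_snakes // 2
    groups = [[random_position() for _ in range(n_male)],             # males
              [random_position() for _ in range(n_snakes - n_male)]]  # females
    fits = [[fitness(p) for p in g] for g in groups]
    best_pos, best_fit = None, float("inf")

    for it in range(1, n_iter + 1):
        # Best individual per group, plus the best position seen so far ("food").
        best_idx = [min(range(len(f)), key=f.__getitem__) for f in fits]
        for g, f, k in zip(groups, fits, best_idx):
            if f[k] < best_fit:
                best_fit, best_pos = f[k], list(g[k])
        temp = math.exp(-it / n_iter)
        food_q = c1 * math.exp((it - n_iter) / n_iter)
        mating = rng.random() < 0.5  # fight/mate choice; the 50/50 rule is an assumption

        new_groups = []
        for s in (0, 1):
            g, f = groups[s], fits[s]
            og, of = groups[1 - s], fits[1 - s]  # opposite-sex group
            updated = []
            for j, p in enumerate(g):
                sign = rng.choice((-1.0, 1.0))
                if food_q < food_threshold:      # exploration around a random snake
                    r = rng.randrange(len(g))
                    a = ability(f[r], f[j])
                    q = [g[r][d] + sign * c2 * a * ((hi - lo) * rng.random() + lo)
                         for d in range(dim)]
                elif temp > temp_threshold:      # exploitation: move toward the food
                    q = [best_pos[d] + sign * c3 * temp * rng.random() * (best_pos[d] - p[d])
                         for d in range(dim)]
                else:                            # fight (best opposite) or mate (jth opposite)
                    k = j % len(og) if mating else best_idx[1 - s]
                    a = ability(of[k], f[j])
                    q = [p[d] + c3 * a * rng.random() * (food_q * og[k][d] - p[d])
                         for d in range(dim)]
                updated.append([min(max(v, lo), hi) for v in q])  # keep inside bounds
            new_groups.append(updated)
        groups = new_groups
        fits = [[fitness(p) for p in g] for g in groups]

        cold = food_q >= food_threshold and temp <= temp_threshold
        if cold and mating and rng.random() < 0.5:  # egg hatches: replace the worst
            for g, f in zip(groups, fits):
                w = max(range(len(f)), key=f.__getitem__)
                g[w] = random_position()
                f[w] = fitness(g[w])

    # Final check so that the last population is also considered.
    for g, f in zip(groups, fits):
        k = min(range(len(f)), key=f.__getitem__)
        if f[k] < best_fit:
            best_fit, best_pos = f[k], list(g[k])
    return best_pos, best_fit

def sphere(p):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in p)

best_pos, best_fit = snake_optimize(sphere, dim=3, lo=-5.0, hi=5.0)
```

In the hybrid model, `fitness` would be replaced by a function that evaluates the prediction error of an ELM built from a candidate position.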

Model development
This study developed a hybrid model called ELM-SO to predict PM2.5 levels at a critical station in India. This was achieved by combining the ELM and SO techniques, with SO used to optimize the weights and biases of the ELM's hidden layer. In the proposed hybrid model, the SO algorithm optimizes the parameters of the ELM with the root mean square error (RMSE) as the objective function to be minimized. Initially, SO initializes the ELM parameters with random numbers. The algorithm then iteratively corrects and refines these parameters through the mating behavior-inspired exploration and exploitation phases. Through dynamic adjustments of quantities such as food quantity, temperature, and mating abilities, SO aims to reduce the objective function and improve the accuracy of the ELM model. This iterative optimization process enhances the robustness and adaptability of the ELM-SO hybrid model in predicting PM2.5 levels at the specified station.
The hybrid model (ELM-SO) was validated against several other methods, including LSTM, RF, SVR, XGBoost, GBR, and classic ELM. The data were divided into two stages: training, which used 75% of the total data to train the models and compute the best model parameters, and testing, which used the remaining data to check the model accuracy and select the best model. After dividing the data, the predictors and their corresponding PM2.5 values were normalized between zero and one to improve the learning process of the ML models. In the case of the hybrid model, SO is used to optimize the model's parameters by minimizing the error measured by the fitness function, which in this study is the root mean square error. The entire process of training the models is illustrated in Fig. 4, while the pre-set parameters for each model are provided in Table 2.
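The data preparation and the fitness function described above can be sketched as follows. This is a hedged illustration: the chronological (unshuffled) split, scaling with the training minimum and maximum only, solving the output weights with the pseudo-inverse inside the fitness function, and the synthetic data are all assumptions, and the function names are hypothetical. The random candidate vectors at the end stand in for positions that the SO algorithm would propose.

```python
import numpy as np

def split_75_25(X, y):
    """Use the first 75% of the rows for training (no shuffling is an assumption)."""
    n_train = int(0.75 * len(X))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

def min_max_scale(train, test):
    """Scale to [0, 1] using the training minimum and maximum only (an assumption)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (train - lo) / span, (test - lo) / span

def elm_rmse(params, X, y, n_hidden):
    """Fitness: training RMSE of an ELM whose hidden weights and biases come from `params`."""
    n_in = X.shape[1]
    W = params[: n_in * n_hidden].reshape(n_in, n_hidden)
    b = params[n_in * n_hidden:]
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden layer
    beta = np.linalg.pinv(H) @ y            # output weights via pseudo-inverse
    return float(np.sqrt(np.mean((H @ beta - y) ** 2)))

# Synthetic stand-in data with 731 rows (illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 50.0, size=(731, 4))
y = 2.0 * X[:, 0] + np.sqrt(X[:, 1]) + rng.normal(0.0, 1.0, size=731)

X_tr, X_te, y_tr, y_te = split_75_25(X, y)
X_tr_s, X_te_s = min_max_scale(X_tr, X_te)
y_tr_s, y_te_s = min_max_scale(y_tr, y_te)

n_hidden = 10
n_params = X_tr_s.shape[1] * n_hidden + n_hidden
# Random candidates stand in for positions proposed by the optimizer.
candidates = rng.uniform(-1.0, 1.0, size=(5, n_params))
scores = [elm_rmse(c, X_tr_s, y_tr_s, n_hidden) for c in candidates]
best_params = candidates[int(np.argmin(scores))]
```

Each candidate position is a flat vector of hidden-layer weights and biases, so the dimension of the search space grows with the number of inputs and hidden neurons.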

Model performance metrics
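The performance of the PM2.5 forecasting models developed in this study was evaluated using six metrics: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE), Willmott index (WI, d), and the a20-index. These metrics are mathematically defined as follows [48][49][50][51]:

$$RMSE = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(y_j^{obs} - y_j^{pred}\right)^2}$$

$$MAE = \frac{1}{N}\sum_{j=1}^{N}\left|y_j^{obs} - y_j^{pred}\right|$$

$$MAPE = \frac{100\%}{N}\sum_{j=1}^{N}\left|\frac{y_j^{obs} - y_j^{pred}}{y_j^{obs}}\right|$$

$$NSE = 1 - \frac{\sum_{j=1}^{N}\left(y_j^{obs} - y_j^{pred}\right)^2}{\sum_{j=1}^{N}\left(y_j^{obs} - \bar{y}^{obs}\right)^2}$$

$$d = 1 - \frac{\sum_{j=1}^{N}\left(y_j^{obs} - y_j^{pred}\right)^2}{\sum_{j=1}^{N}\left(\left|y_j^{pred} - \bar{y}^{obs}\right| + \left|y_j^{obs} - \bar{y}^{obs}\right|\right)^2}$$

$$a20 = \frac{m20}{N}$$

where $y_j^{obs}$ and $y_j^{pred}$ are the observed and predicted PM2.5 values, $\bar{y}^{obs}$ is the mean of the observed values, $N$ is the number of samples, and $m20$ is the number of samples for which the ratio of the observed to the predicted value lies between 0.80 and 1.20.
For the reliability analysis, the relative absolute error (RAE) of each prediction is first computed as

$$RAE_j = \left|\frac{y_j^{obs} - y_j^{pred}}{y_j^{obs}}\right| \times 100\%$$

Next, if $RAE_j \le \Delta$, then $O_j = 1$; otherwise $O_j = 0$, where $\Delta$ denotes the threshold value. The sum of $O_j$ is therefore the number of times the RAE is less than or equal to $\Delta$, and the reliability is

$$Reliability = \frac{1}{N}\sum_{j=1}^{N} O_j \times 100\%$$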
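The six performance metrics listed above can be computed with a short routine. The sketch below uses their standard textbook definitions, which is an assumption about the exact forms used in this study; the function name and the sample values are hypothetical.

```python
import math

def evaluate(obs, pred):
    """Return RMSE, MAE, MAPE (%), NSE, Willmott index and a20-index."""
    n = len(obs)
    mean_obs = sum(obs) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, pred))
    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    mape = 100.0 / n * sum(abs((o - p) / o) for o, p in zip(obs, pred))
    nse = 1.0 - sq_err / sum((o - mean_obs) ** 2 for o in obs)
    wi = 1.0 - sq_err / sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                            for o, p in zip(obs, pred))
    # a20: share of samples whose observed/predicted ratio lies in [0.8, 1.2]
    a20 = sum(1 for o, p in zip(obs, pred) if 0.8 <= o / p <= 1.2) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "NSE": nse, "WI": wi, "a20": a20}

# Hypothetical observed and predicted PM2.5 values (µg/m³), illustration only.
obs = [100.0, 200.0, 150.0, 50.0]
pred = [110.0, 180.0, 150.0, 80.0]
scores = evaluate(obs, pred)
```

Lower RMSE, MAE, and MAPE and higher NSE, WI, and a20 values indicate better agreement between predictions and observations.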
Additionally, the resilience analysis investigates the models' ability to recover from inaccurate and unsatisfactory predictions. By analyzing a technique's performance under such conditions, the resilience analysis provides insight into its reliability and the extent to which it can adapt and rectify errors. The resilience of the model results depends on the reliability of the model predictions. If the reliability of the model predictions meets a specific level, the resilience reaches 100%. If the reliability falls below that level, the resilience of the model results is calculated using the following equation:

$$Resilience = \frac{T_i}{N - \sum_{j=1}^{N} O_j} \times 100\%$$

where $T_i$ represents the total number of cases in which the simulation transitions from an inaccurate prediction to an accurate forecast, indicating the model's ability to recover from initial errors, and $N - \sum_{j} O_j$ is the number of inaccurate predictions.
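As a rough sketch of how reliability and resilience could be computed from a series of predictions, the snippet below flags each prediction as accurate when its relative absolute error is within a threshold, and then counts the transitions from an inaccurate to an accurate prediction. The threshold of 20%, the treatment of the no-failure case as 100% resilience, the normalization by the number of inaccurate predictions, and the function name are assumptions; the study's exact formulation may differ.

```python
def reliability_resilience(obs, pred, delta=20.0):
    """Return (reliability %, resilience %) for paired observations and predictions."""
    # Relative absolute error of each prediction, in percent.
    rae = [abs(o - p) / abs(o) * 100.0 for o, p in zip(obs, pred)]
    ok = [1 if r <= delta else 0 for r in rae]  # 1 = accurate, 0 = inaccurate
    n = len(ok)
    reliability = 100.0 * sum(ok) / n
    failures = n - sum(ok)
    # Count transitions from an inaccurate prediction to an accurate one.
    recoveries = sum(1 for a, b in zip(ok, ok[1:]) if a == 0 and b == 1)
    resilience = 100.0 if failures == 0 else 100.0 * recoveries / failures
    return reliability, resilience

# Hypothetical observed and predicted PM2.5 values (µg/m³), illustration only.
obs = [100.0, 200.0, 150.0, 50.0, 120.0]
pred = [110.0, 180.0, 100.0, 80.0, 125.0]
reliability, resilience = reliability_resilience(obs, pred)
```

In this example three of the five predictions are within the threshold, and one of the two inaccurate predictions is followed by an accurate one.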

Model prediction results and comparison
Table 3 presents the performance of the deep learning, hybrid, and other ML models in forecasting PM2.5 concentrations. Among these, the hybrid model, ELM-SO, shows the best forecasting capability, as evidenced by its lowest MAE, RMSE, and MAPE and its highest a20, NSE, and WI values (MAE = 20.652 µg/m³, RMSE = 30.325 µg/m³, MAPE = 18.732%, a20 = 0.688, NSE = 0.706, and WI = 0.972). This may be attributed to the ELM-SO model's ability to integrate the strengths of both SO and ELM. By doing so, the model can effectively tackle the issue of overfitting and avoid becoming stuck in local optima. The XGBoost model, despite providing notable results, is the worst-performing of the compared models, with the lowest a20 and NSE values (0.518 and 0.595) and the highest MAE and MAPE values (28.439 µg/m³ and 30.792%).
Figure 5a-g shows scatter plots with isoline regression lines and the coefficient of determination (R²) to assess the correlation between observed and predicted PM2.5 for the testing phase. To ensure a high degree of model accuracy, the predicted and observed values need to be evenly distributed on both sides of the isoline regression line, indicating a Gaussian error distribution. The figure demonstrates that the hybrid model (ELM-SO) has a more favorable distribution of predicted values around the best-fit line and presents the highest R² value (R² = 0.928) compared to the other models. These observations from the scatter plot for the ELM-SO model are consistent with the trends in the statistical findings of our models (Table 3). This consistency reinforces the reliability and validity of the relationships and trends identified in our analysis, affirming the robustness of the model performance.
It is worth noting that, despite performing satisfactorily, the other individual models, i.e., ELM, RF, SVR, LSTM, GBR, and XGBoost, fail to match the generalization capability of ELM-SO, and their accuracy and reliability are compromised on the testing dataset. The main factor contributing to this variability in the predictions is the non-linear nature of the predicted variable, i.e., PM2.5 concentration.
Figure 6 presents violin plots, which give a clearer view of the data distribution and a more effective means of analysis. These plots evaluate the fidelity of the predicted PM2.5 with respect to the observed values and compare the 25%, 50%, and 75% quantile values of both the observed and predicted PM2.5, along with their corresponding interquartile ranges (IQR). From Fig. 6, it is clear that the LSTM (IQR = 117.05 µg/m3), SVR (IQR = 117.79 µg/m3), and ELM-SO (IQR = 110.37 µg/m3) predictions correspond very closely to the observed data (IQR = 116.15 µg/m3). Furthermore, for the median, the ELM-SO and GBR models performed satisfactorily, with values of 112.17 µg/m3 and 114.53 µg/m3 compared to the observed median of 113.37 µg/m3.
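The quartile comparison behind such violin plots can be sketched as follows. The helper name is hypothetical, and NumPy's default linear interpolation for percentiles is an assumption, since the quantile method used in the study is not stated. The IQR values in the usage example are the ones quoted above.

```python
import numpy as np

def quartile_summary(values):
    """Return the 25%, 50%, and 75% quantiles and the IQR of a sample."""
    q25, q50, q75 = np.percentile(np.asarray(values, dtype=float),
                                  [25, 50, 75])
    return {"q25": float(q25), "median": float(q50),
            "q75": float(q75), "iqr": float(q75 - q25)}

# Toy sample, for illustration only.
summary = quartile_summary([1, 2, 3, 4, 5])

# Absolute gap between each model's reported IQR and the observed IQR
# (values in µg/m3, taken from the text).
observed_iqr = 116.15
reported_iqr = {"LSTM": 117.05, "SVR": 117.79, "ELM-SO": 110.37}
iqr_gap = {name: round(abs(value - observed_iqr), 2)
           for name, value in reported_iqr.items()}
```

Comparing these gaps gives a single number per model for how well the spread of the predictions matches the spread of the observations.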

Results of uncertainty, reliability, and resilience analysis
To assess the robustness of model results for the testing dataset, the uncertainty, reliability, and resilience indices are computed for the ELM, RF, SVR, LSTM, and ELM-SO models (Table 3 and Figs. For the estimation of PM2.5 concentrations, the ELM-SO model showed more reliable forecasting performance than the other models, with a reliability of 69.04%. Compared with ELM (reliability = 64.66%), RF (reliability = 64.66%), SVR (reliability = 62.19%), LSTM (reliability = 60.27%), GBR (reliability = 66.30%), and XGBoost (reliability = 55.34%), ELM-SO consistently showed better reliability for the testing dataset (Fig. 7a). The higher reliability of ELM-SO confirms the model's accuracy and consistency, not just over time but also across different datasets, with an insignificant margin of error.
Lastly, the resilience analysis also validated the superior performance of the ELM-SO model. According to the results for resilience and other parameters summarized in Fig. 8, the PM2.5 predictions generated by the hybrid ELM-SO model are more resilient (64.60%) than those of the individual ELM (46.51%), RF (50.38%), SVR (44.92%), LSTM (42.759%), GBR (59.35%), and XGBoost (44.17%) models. Overall, the findings indicate that the hybrid model (ELM-SO) is a highly effective and accurate technique for estimating PM2.5 concentrations across a broad range of environmental factors.

Validation of ELM-SO with previous works
In our study, a novel hybrid ELM-SO model was developed to simulate PM2.5 concentrations. The results in terms of statistical metrics (R2, RMSE, MAE, MAPE, a20, NSE, and WI) revealed a substantial enhancement in performance compared to the ELM, RF, LSTM, SVR, GBR, and XGBoost models. Some researchers developed a novel technique for PM2.5 forecasting by combining a genetic algorithm and a support vector machine [52]. The models were constructed using land use, meteorological, elevation, traffic, socioeconomic, vegetation, landscape pattern, AOD, and other data as inputs. The results showed that the proposed hybrid modeling approach achieved favorable accuracy (R2 = 0.840) compared to the individual models. Other scholars presented a novel method combining spatial-temporal analysis with a deep learning approach to estimate PM2.5 concentrations [53]. The model was constructed by incorporating air quality and meteorological data as inputs to enhance its accuracy and reliability. The proposed hybrid model was evaluated against other modeling techniques such as XGBoost, BPNN, and RF. The results indicated unsatisfactory performance of the comparison models, whereas the proposed model achieved the highest accuracy with an R2 value of 0.700.
Furthermore, a published study reported the development of a novel hybrid system based on deep learning and gradient boosting approaches for PM2.5 prediction [54]. The study used an outdoor imagery dataset as input for model generation. It concluded that the proposed technique showed good accuracy (R2 = 0.85), enhancing the accessibility of PM2.5 concentration estimation for unobserved locations and time periods. Other investigations used a hybrid technique based on Random Forest and Kriging to investigate PM2.5 prediction and its distribution [55]. The approach was developed using air quality, meteorological, land use, and AOD data. The accuracy of the proposed model was evaluated by comparing its PM2.5 predictions with those generated by the individual RF model. The results demonstrated that the proposed hybrid model, with an R2 of 0.881, outperforms the individual RF model in terms of accuracy and performance. In another work, researchers carried out a PM2.5 forecasting study using a novel prediction system based on a convolutional block attention module, convolutional neural networks, and bi-directional long short-term memory networks [56]. The study found that the proposed model was able to accurately predict PM2.5 concentrations with an R2 value of 0.8162. Table 4 summarizes the results of previous studies that developed different prediction models for simulating PM2.5 concentrations. The results show that the suggested model, ELM-SO, outperformed all other comparable models.

Discussion
The research findings indicated that the combination of SO and ELM in a hybrid model performed significantly better than a single ELM for predicting PM2.5 concentration. These results align with other studies in the literature, which have also shown that hybrid models provide improved prediction performance for simulating air pollution parameters [60][61][62]. Additionally, the SO algorithm enhances the prediction accuracy of the standard ELM by determining optimal weights and bias values. The statistical evaluation criteria and visual assessments demonstrate that the hybrid model outperforms not only the classical ELM but also other well-known models frequently used in the air quality sector, such as RF, LSTM, XGBoost, GBR, and SVR. The SO algorithm effectively discovers global solutions by utilizing nature-inspired behavior, balancing exploration and exploitation, and demonstrating efficient convergence [36]. By mimicking the mating behavior of snakes, SO explores new regions of the solution space to find effective solutions. It efficiently converges to satisfactory solutions within a reasonable number of iterations, making it particularly valuable in time-sensitive applications. These characteristics make the suggested model more effective at predicting PM2.5 than the six other models developed in this study and the advanced models developed in previous research.
In the prediction of PM2.5, the integration of the SO algorithm with ELM offers several technical advantages. First, SO replaces random assignment of the weight and bias parameters in ELM with guided optimization, resulting in better parameter values. Second, the algorithm mimics the behavior of snakes in its exploitation and exploration phases, enabling both local fine-tuning and global search. In addition, by dynamically adapting parameters such as food quantity, temperature, and mating abilities, the algorithm can navigate and explore complex optimization spaces effectively. This adaptability allows the algorithm to search efficiently for optimal solutions within intricate problem domains, improving its overall performance and robustness in challenging optimization tasks. Overall, the technical aspects and advantages of the SO algorithm demonstrate its potential to optimize ELM's parameters effectively in real-world optimization problems.
Accurate prediction of fine particulate matter (PM2.5) concentrations is crucial in metropolitan areas like Delhi because of its association with chronic health issues and premature mortality. Precise forecasting of PM2.5 levels serves multiple purposes, including raising public awareness, helping vulnerable groups avoid exposure during periods of high PM2.5 levels, and providing valuable information for public health alerts. The adopted model (ELM-SO) has demonstrated improved reliability, as evidenced by lower values of RMSE (30.325 µg/m3), MAE (20.652 µg/m3), MAPE (18.7%), and U95 (9.893). These results highlight the significance of more precise forecasting in mitigating the negative impacts of PM2.5 on the population in India. Additionally, the suggested system provides early warning information for environmental management and facilitates the creation of efficient strategies to decrease air pollutant emissions. Furthermore, it contributes to advancing research and application in diverse domains, including the study of health concerns associated with PM2.5 pollution.
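The coupling described above can be sketched as an optimizer searching over ELM input weights and hidden biases, with each candidate scored by the validation error of the resulting network. The example below is a minimal illustration under several assumptions: a sigmoid hidden layer, output weights solved with the pseudo-inverse as in a standard ELM, RMSE as the fitness, and synthetic data. A plain random search stands in for the SO algorithm, whose update rules are not reproduced here. All function names are hypothetical.

```python
import numpy as np

def hidden_layer(X, W, b):
    """Sigmoid activations of the single hidden layer."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def fit_output_weights(X, y, W, b):
    """Standard ELM step: least-squares output weights via pseudo-inverse."""
    return np.linalg.pinv(hidden_layer(X, W, b)) @ y

def unpack(candidate, n_features, n_hidden):
    """Split a flat candidate vector into input weights and hidden biases."""
    W = candidate[: n_features * n_hidden].reshape(n_features, n_hidden)
    b = candidate[n_features * n_hidden:]
    return W, b

def fitness(candidate, X_tr, y_tr, X_val, y_val, n_hidden):
    """Validation RMSE of the ELM defined by one candidate (lower is better)."""
    W, b = unpack(candidate, X_tr.shape[1], n_hidden)
    beta = fit_output_weights(X_tr, y_tr, W, b)
    pred = hidden_layer(X_val, W, b) @ beta
    return float(np.sqrt(np.mean((y_val - pred) ** 2)))

def random_search(score_fn, dim, n_candidates=50, seed=0):
    """STAND-IN for SO: sample candidates uniformly and keep the best one."""
    rng = np.random.default_rng(seed)
    best, best_score = None, np.inf
    for _ in range(n_candidates):
        candidate = rng.uniform(-1.0, 1.0, size=dim)
        score = score_fn(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score

# Synthetic regression data, for illustration only.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

n_hidden = 10
dim = X.shape[1] * n_hidden + n_hidden  # input weights plus hidden biases

def score_fn(candidate):
    return fitness(candidate, X_tr, y_tr, X_val, y_val, n_hidden)

best, best_score = random_search(score_fn, dim)
```

A metaheuristic such as SO would replace `random_search` while keeping the same fitness function, so the quality of the search strategy can be compared on equal terms.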

Conclusion
This study assessed the predictive capabilities of a novel hybrid model known as ELM-SO, which couples ELM with the SO algorithm, for forecasting PM2.5 concentrations. To gauge its performance comprehensively, ELM-SO was compared with standalone machine learning models (ELM, SVR, RF, LSTM, GBR, and XGBoost) and with forecasting models developed in previous works. The results demonstrated that the ELM-SO model outperformed the other models in predicting PM2.5 with high accuracy. Furthermore, the ELM-SO model, developed using a novel metaheuristic algorithm (SO) and ELM, performed significantly better than LSTM and the other models, achieving a minimum forecasting error of RMSE = 30.325 µg/m3, which was 15.6% to 20.9% more accurate than the other models. The findings highlighted the effectiveness of the SO algorithm in improving the performance of ELM in predicting PM2.5 concentrations, with an enhancement of 15.6% in forecasting accuracy compared to the standard ELM.
Additionally, the uncertainty analysis indicated that the ELM-SO model exhibited the lowest level of uncertainty among the models. This was evidenced by its lowest U95 value of 9.89 and its higher estimated forecasting reliability of approximately 70%. Overall, this study suggests that the ELM-SO hybrid model is a promising approach for accurately predicting PM2.5 concentrations, which is crucial for mitigating the harmful effects of air pollution on public health and the environment.

Figure 1. Air quality monitoring and meteorological stations.

Figure 6. Violin plot for the observed and forecasted PM2.5 concentrations for the testing phase.

Figure 7. Radial plots depicting the (a) reliability and (b) uncertainty of all the developed models during the testing phase.

Figure 8. Bar chart depicting the resilience of all the developed models during the testing phase.

Table 1. Characteristics of the dataset used in the study (2015 to 2018).

Table 3. Performance metrics for PM2.5 forecasting models during the testing stage.

Table 4. A comparison of PM2.5 prediction models.