Uncertainty analysis of developed ANN and ANFIS models in prediction of carbon monoxide daily concentration

doi:10.1016/j.atmosenv.2009.11.005

Atmospheric Environment

Volume 44, Issue 4, February 2010, Pages 476-482

https://doi.org/10.1016/j.atmosenv.2009.11.005

Abstract

This study aims to predict the daily carbon monoxide (CO) concentration in the atmosphere of Tehran by means of developed artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) models. Forward selection (FS) and Gamma test (GT) methods are used to select input variables and to develop hybrid models with ANN and ANFIS. From 12 input candidates, 7 and 9 variables are selected using FS and GT, respectively. Evaluating the hybrid models and comparing them with ANN and ANFIS models fed with all input variables shows that both FS and GT reduce not only the output error but also the computational cost, because fewer inputs are used. FS–ANN and FS–ANFIS are selected as the best models on the basis of R², mean absolute error and the developed discrepancy ratio statistic. These two models are also shown to be superior in predicting pollution episodes. Finally, uncertainty analysis based on Monte Carlo simulation is carried out for the FS–ANN and FS–ANFIS models; it shows that the FS–ANN model has less uncertainty, i.e. it is the best model and satisfactorily forecasts the trends in daily CO concentration levels.

Introduction

In recent years, artificial intelligence (AI) based methods have been proposed as alternatives to traditional statistical ones in many scientific disciplines. The literature demonstrates that AI models such as ANN and neuro-fuzzy techniques have been used successfully for air pollution modeling (Nunnari et al., 2004; Perez-Roaa et al., 2006) and forecasting (Perez et al., 2000; Gautama et al., 2008). Moseholm et al. (1996) used an ANN model to investigate the relationship between traffic and carbon monoxide (CO) concentrations measured near an intersection that was sheltered from the wind by multi-story buildings. They compared ANN and multiple linear regression (MLR) models and reported ANN as the superior model. Viotti et al. (2002) used an ANN model with a hidden layer to predict short-term and medium-term air pollutant concentrations (CO, ozone and benzene) in an urban area of Perugia. Martin et al. (2008) used ANN and k-nearest neighbors classifiers as predictive tools for forecasting future peaks of CO. Noori et al. (2008) compared ANN and PCA-MLR models for forecasting the daily CO concentration in the atmosphere of Tehran and reported ANN as the superior model. Tanaka et al. (1995) used a neuro-fuzzy technique to model and control CO concentration in a large city in Japan; their prediction results show that the fuzzy model performs much better than the linear model. Yildirim and Bayramoglu (2006) proposed an adaptive neuro-fuzzy inference system (ANFIS) to estimate the impact of meteorological factors on SO₂ and total suspended particulate matter pollution levels over an urban area in Turkey. Carnevale et al. (2009) applied neural network and neuro-fuzzy models to estimate nonlinear source–receptor relationships between precursor emissions and pollutant concentrations (ozone and PM₁₀) in Northern Italy. Their results show that the selected source–receptor models accurately reproduce the simulations of the 3D modeling system while offering a large advantage in computational cost.

Input selection is a crucial step in ANN and ANFIS implementation, because these techniques are not engineered to eliminate superfluous inputs. When the number of input variables is high, irrelevant, redundant, and noisy variables might be included in the data set; at the same time, meaningful variables could be hidden (Seasholtz and Kowalski, 1993; Noori et al., 2009a). Therefore, reducing the number of input variables is recommended. Different methods exist for this purpose, such as forward selection (FS) (Chen et al., 1989; Wang et al., 2006) and Gamma test (GT) techniques (Corcoran et al., 2003; Moghaddamnia et al., 2009). Another important subject, rarely addressed for ANN and ANFIS in comparison with other statistical models, is uncertainty analysis of the results. Predictions are never certain, so uncertainty analysis helps in applying the results. The literature shows that only a few methods have been proposed for determining uncertainty in ANN and ANFIS. Among them are the bootstrap and sandwich estimator (Tibshirani, 1994), maximum likelihood and Bayesian inference (Dybowski, 1997) and the Monte Carlo method proposed by Marce et al. (2004). In this research, Monte Carlo simulation, which is based on placing the models in a Monte Carlo random sampling process, is selected because it offers both better performance and greater novelty. Aqil et al. (2007) applied this uncertainty analysis method to evaluate the outputs of ANFIS in predicting weekly stream flow in a river and reported that it is appropriate for the ANFIS model. Noori et al. (2009b) used the Monte Carlo method for uncertainty analysis of solid waste generation forecasting by means of wavelet transform-ANFIS and wavelet transform-ANN.
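As a rough illustration of placing a model inside a Monte Carlo random sampling process, the Python sketch below refits a model many times on random subsets of the training data and takes percentiles of the resulting predictions as an uncertainty band. Everything specific in it is an assumption made for illustration and is not taken from the paper: the least-squares linear model stands in for an ANN or ANFIS, and the number of runs (`n_runs`), the sampling fraction (`frac`), sampling without replacement, the 95% band and the synthetic data are all hypothetical choices.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with intercept; a stand-in for training an ANN or ANFIS."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predictions of the fitted stand-in model."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def monte_carlo_band(X_train, y_train, X_test, n_runs=500, frac=0.7, seed=0):
    """Refit on random training subsets and return a 95% prediction band.

    n_runs, frac and the 2.5/97.5 percentiles are assumed settings.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    k = int(frac * n)
    preds = np.empty((n_runs, len(X_test)))
    for i in range(n_runs):
        # Draw a random subset of the training data without replacement.
        idx = rng.choice(n, size=k, replace=False)
        coef = fit_linear(X_train[idx], y_train[idx])
        preds[i] = predict_linear(coef, X_test)
    # Percentiles across runs give the lower and upper band per test point.
    lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)
    return lower, upper


# Synthetic stand-in data (not the Tehran data set).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 3))
y_train = X_train @ np.array([1.0, -0.5, 0.2]) + 0.3 * rng.normal(size=200)
X_test = rng.normal(size=(50, 3))

lower, upper = monte_carlo_band(X_train, y_train, X_test)
mean_width = float(np.mean(upper - lower))
print(f"mean 95% band width: {mean_width:.3f}")
```

A narrower band indicates that the model's predictions are less sensitive to which training samples were drawn. Note that a band built this way reflects only that sampling sensitivity, not every source of predictive uncertainty.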

In this study, two input selection techniques (FS and GT) are applied to build hybrid models with ANN and ANFIS (FS–ANN, FS–ANFIS, GT–ANN, and GT–ANFIS), which are then compared with ANN and ANFIS models fed with all input data. Finally, uncertainty analysis is carried out for the two best models and the superior model is reported.

Section snippets

Case study and data

Tehran, the capital and largest city of Iran, is located between 35° 34′–35° 50′N and 51° 02′–51° 36′E and has an area of about 570 km². It is surrounded by mountains to the north, west and east and has a current population of about 8,000,000 (Bayat, 2005). There are 11 air quality measurement stations in Tehran. Previous studies of air pollution in Tehran show that 90% by weight of total air pollutants are generated by traffic and only 10% by other sources (

Forward selection

In this study, the FS method is used as a linear input selection technique to select the best subset of the 12 input candidates. In other words, a linear model is developed using the best-correlated subset of inputs. First, the correlation between each input variable and the desired output is evaluated. Second, the variable with the highest correlation, i.e. Temp with R² = 0.26, is selected as the first and most important input. Then, the remaining candidates are added to the model one by one
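The selection steps described above can be illustrated with a minimal Python sketch. It assumes NumPy and ordinary least squares as the linear model. The stopping rule (a minimum R² gain, `min_gain`), the function names and the synthetic data are hypothetical, since this excerpt breaks off before the stopping criterion is stated.

```python
import numpy as np


def r_squared(X, y):
    """R² of an ordinary least-squares fit of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / ss_tot


def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection of input columns.

    The first pick is the single input with the highest R² against y
    (for one input this equals the squared correlation). Each later step
    adds the candidate that raises R² the most. Stopping when the gain
    falls below min_gain is an assumed rule, not the paper's.
    """
    remaining = list(range(X.shape[1]))
    selected, best_r2 = [], 0.0
    while remaining:
        # Score every remaining candidate together with the inputs chosen so far.
        scores = [(r_squared(X[:, selected + [j]], y), j) for j in remaining]
        r2, j = max(scores)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return selected, best_r2


# Synthetic stand-in for 12 input candidates; only columns 3 and 7 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + 0.1 * rng.normal(size=300)

selected, r2 = forward_select(X, y)
print("selected columns:", selected, "R²:", round(float(r2), 3))
```

Because the procedure scores candidates with a linear fit, it can miss inputs whose influence on the output is purely nonlinear.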

Conclusion

Considering the importance of daily CO concentration in the atmosphere of Tehran, this research aims to develop suitable prediction models using ANN and ANFIS. Since input selection is a significant step in modeling, the FS and GT methods are used and six models are developed. The goodness of each model is evaluated using the R², d, MAE and DDR statistics. Finally, uncertainty analysis of FS–ANN and FS–ANFIS, the superior models, is carried out. The following conclusions could be drawn
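The evaluation statistics named above can be sketched in Python as follows. The excerpt does not define them, so the definitions here are assumptions: R² is taken as the squared Pearson correlation between observed and predicted values, d as Willmott's index of agreement, and DDR as the ratio of predicted to observed values minus one. The observed and predicted values are made-up numbers used only to exercise the functions.

```python
import numpy as np


def r2_score(obs, pred):
    """Squared Pearson correlation between observed and predicted values (assumed definition)."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)


def mae(obs, pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(obs - pred)))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement, assumed here to be the statistic called d."""
    obs_mean = obs.mean()
    denom = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return float(1.0 - np.sum((obs - pred) ** 2) / denom)


def ddr(obs, pred):
    """Per-sample discrepancy, assumed here to be predicted / observed - 1."""
    return pred / obs - 1.0


# Made-up observed and predicted concentrations, for illustration only.
obs = np.array([2.0, 4.0, 6.0, 8.0])
pred = np.array([2.2, 3.8, 6.3, 7.6])

print("R²:", round(r2_score(obs, pred), 4))
print("MAE:", round(mae(obs, pred), 4))
print("d:", round(index_of_agreement(obs, pred), 4))
print("DDR:", np.round(ddr(obs, pred), 3))
```

Under these assumed definitions, a perfect model gives R² = 1, MAE = 0, d = 1 and a DDR of zero for every sample; positive DDR values indicate over-prediction and negative values under-prediction.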

References (53)

K.C. Abbaspour et al. Modeling hydrology and water quality in the pre-alpine/alpine Thur watershed using SWAT. Journal of Hydrology (2007)
M. Aqil et al. Analysis and prediction of flow from local source in a river basin using a neuro-fuzzy modeling tool. Journal of Environmental Management (2007)
C. Carnevale et al. Neuro-fuzzy and neural network systems for air quality control. Atmospheric Environment (2009)
J. Corcoran et al. Predicting the geo-temporal variation of crime and disorder. International Journal of Forecasting (2003)
P. Coulibaly et al. Daily reservoir inflow forecasting using artificial neural networks with stopped training approach. Journal of Hydrology (2000)
B. Eksioglu et al. Subset selection in multiple linear regression: a new mathematical programming approach. Computers & Industrial Engineering (2005)
J.A. Khan et al. Building a robust linear model with forward selection and stepwise procedures. Computational Statistics & Data Analysis (2007)
M.L. Martin et al. Prediction of CO maximum ground level concentrations in the Bay of Algeciras, Spain using artificial neural networks. Chemosphere (2008)
A. Moghaddamnia et al. Evaporation estimation using artificial neural networks and adaptive neuro-fuzzy inference system techniques. Advances in Water Resources (2009)
L. Moseholm et al. Forecasting carbon monoxide concentrations near a sheltered intersection using video traffic surveillance and neural networks. Transport Research (1996)
S.M.S. Nagendra et al. Artificial neural network approach for modeling nitrogen dioxide dispersion from vehicular exhaust emissions. Ecological Modelling (2006)
R. Noori et al. Results uncertainty of solid waste generation forecasting by hybrid of wavelet transform-ANFIS and wavelet transform-neural network. Expert Systems with Applications (2009)
G. Nunnari et al. Modelling SO₂ concentration at a point with statistical approaches. Environmental Modelling & Software (2004)
P. Perez et al. Prediction of PM2.5 concentrations several hours in advance using neural networks in Santiago, Chile. Atmospheric Environment (2000)
M.B. Seasholtz et al. The parsimony principle applied to multivariate calibration. Analytica Chimica Acta (1993)
P. Viotti et al. Atmospheric urban pollution: applications of an artificial neural network (ANN) to the city of Perugia. Ecological Modelling (2002)
X.X. Wang et al. Sparse support vector regression based on orthogonal forward selection for the generalised kernel model. Neurocomputing (2006)
Y. Yildirim et al. Adaptive neuro-fuzzy based modelling for prediction of air pollution daily levels in city of Zonguldak. Chemosphere (2006)
S. Agalbjörn et al. A note on the gamma test. Neural Computing Applied (1997)
Bayat, R., 2005. Source Apportionment of Tehran's Air Pollution. M.Sc thesis. Department of Civil and Environmental...
S. Chen et al. Orthogonal least squares methods and their application to nonlinear system identification. International Journal of Control (1989)
S. Chen et al. Sparse modeling using orthogonal forward regression with PRESS statistic and regularization. IEEE Transactions on Systems, Man, and Cybernetics – Part B (2004)
S.L. Chiu. Fuzzy model identification based on cluster estimation. Journal of Intelligent Information Systems (1994)
G. Cybenko. Approximation by superposition of a sigmoidal function. Mathematics of Control, Signals, and Systems (1989)
Z.Q. Deng et al. Longitudinal dispersion coefficient in straight rivers. Journal of Hydraulic Engineering (ASCE) (2001)
Durrant, P.J., 2001. winGamma: a Non-linear Data Analysis and Modeling Tool with Applications to Flood Prediction. PhD...
