Application of cellular neural network (CNN) to the prediction of missing air pollutant data
Research Highlights
► We applied a Cellular Neural Network (CNN) approach to air pollution. ► The missing concentrations of the PM10 and SO2 pollutants were modeled using this method. ► This paper is the first practical example of applying a CNN to air pollution. ► These results show that the CNN modeling technique can be considered a promising approach for air pollutant prediction.
Introduction
The main sources of air pollution in Istanbul are the combustion of poor quality coal, increased traffic load and industrial activities. In the last two decades, many scientists have focused on the air pollution problems of Istanbul, Turkey (Erturk, 1986, Tayanç, 2000, Saral and Ertürk, 2003, Sahin, 2005, Im et al., 2008, Hanedar et al., 2011). During the winter, sulfur dioxide (SO2) and particulate matter (PM) are the major air pollutants affecting regional air quality. Missing data, which may result from insufficient sampling, measurement errors or problems with data acquisition, are a problem frequently encountered in environmental research. Regardless of the reasons for missing data, discontinuities in data pose a significant obstacle to time-series prediction schemes, which generally require continuous data as a condition for their implementation.
The substitution of mean values for missing data is commonly suggested, and is still used in many statistical software packages (Junninen et al., 2004). A slightly better approach is to impute the missing elements from an ANOVA model or a similar statistical method. Another approach is to use a simple interpolation method, such as assuming the season's average concentration at the time of day for which data are missing, or linearly interpolating between the preceding and following values to obtain a continuous data set. None of these methods is ideal, because the meteorology on the missing day may have been significantly different from that on the days on which the interpolation is based, leading to unrealistic predictions (Dirks et al., 2002). Clearly, a complementary method is required.
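The two simple gap-filling strategies described above, mean substitution and linear interpolation between neighbouring values, can be illustrated with a minimal sketch. The function names and the PM10 values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from statistics import mean


def fill_with_mean(series):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    m = mean(observed)
    return [m if v is None else v for v in series]


def fill_linear(series):
    """Linearly interpolate interior gaps between the nearest observed values.

    Leading and trailing gaps are left as None because they have an
    observed neighbour on one side only.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            filled[i] = series[left] * (1 - w) + series[right] * w
    return filled


# Hypothetical daily PM10 values; None marks a missing day.
pm10 = [40.0, None, None, 70.0, 60.0, None, 80.0]
mean_filled = fill_with_mean(pm10)
linear_filled = fill_linear(pm10)
print(mean_filled)
print(linear_filled)
```

Neither function uses any meteorological information, which is the weakness the paragraph above points out: both fill a gap purely from the pollutant series itself.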
There are many deterministic and stochastic approaches to modeling the concentrations of air pollutants. A well-known machine-learning approach is the Artificial Neural Network (ANN). Machine learning is concerned with the design and development of algorithms that allow computers to learn the behavior of data sets empirically. Machine learning approaches have been applied to bias correction in various environmental problems and to weather prediction since 1990. Neural networks are suitable for these areas because of their ability to model non-linear mechanisms. Recent papers by Manzato (2007) and Fernandez-Ferrero et al. (2009) studied different statistical downscaling methods applied to numerical weather forecasting. Their results showed that ANNs are a powerful statistical method, but that special care must be taken to prevent overfitting.
In many studies, ANNs have been applied to predict SO2 and PM10 concentrations (Boznar et al., 1993, Mok and Tam, 1998, Saral and Ertürk, 2003, Chelani et al., 2002, Onat et al., 2004, Sahin et al., 2005, Yildirim and Bayramoğlu, 2006). Gardner and Dorling (1998) published a comprehensive review of studies using an ANN approach for environmental air pollution modeling. Kukkonen et al. (2003) studied five neural network (NN) models, a linear statistical model and a deterministic modeling system for the prediction of urban NO2 and PM10 concentrations. Sahin et al. (2004) used a multi-layer neural network model to predict daily CO concentrations from meteorological variables on the European side of Istanbul, Turkey. Kurt et al. (2008) developed an online air pollution forecasting system for Istanbul using NNs. Another NN model, developed by Saral and Ertürk (2003), was used to predict regional SO2 concentrations. Junninen et al. (2004) applied regression-based imputation, nearest neighbor interpolation, a self-organizing map, a multi-layer perceptron model and hybrid methods to simulate missing air quality data. Nagendra and Khare (2006) studied the usefulness of NNs in understanding the relationship between traffic parameters and NO2 concentrations. Recently, several researchers have used NN techniques to predict airborne PM concentrations (e.g. Ordieres et al., 2005, Hooyberghs et al., 2005, Perez and Reyes, 2006, Slini et al., 2006). More recently, some scientists have used machine learning approaches to model satellite data (Lary et al., 2009, Gupta and Christopher, 2009). All of these studies reported that ANNs could be used to develop efficient air-quality analysis and forward-looking prediction models. In ANNs, however, the training process becomes increasingly complex and time-consuming as the number of weighting coefficients rises into the millions due to the complexity of the environmental study.
To reduce the number of weighting coefficients, Chua and Yang (1988) introduced another machine learning approach, the Cellular Neural Network (CNN). Because each cell of the CNN is represented by a separate analog processor, and because each cell is locally interconnected to its neighbors by matrix A and receives feedback from them by matrix B, this configuration results in a very high-speed tool for parallel dynamic processing of 2-D structures (Cimagalli, 1993, Guzelis and Karamahmut, 1994, Ucan et al., 2001, Grassi and Grieco, 2002). CNN approaches have been applied to air pollution modeling by a number of researchers, with excellent results (Sahin, 2005, Ozcan et al., 2007, Thai and Cat, 2008).
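To make the local coupling concrete, the following is a minimal sketch of a cellular neural network in the standard Chua and Yang form with normalized constants, dx/dt = -x + A*y + B*u + bias, where y = 0.5(|x + 1| - |x - 1|) is the cell output. In that standard form, the 3x3 template A weights the outputs of neighboring cells and the 3x3 template B weights their inputs. The template values, grid, step size and function names below are assumptions made for illustration; they are not the templates used in this study.

```python
def output(x):
    """Piecewise-linear cell output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def template_sum(template, grid, i, j):
    """Weighted sum over the 3x3 neighbourhood of cell (i, j).

    Cells outside the grid are treated as 0 (a zero boundary, assumed here).
    """
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                total += template[di + 1][dj + 1] * grid[r][c]
    return total


def simulate(u, A, B, bias, x0, dt=0.05, steps=400):
    """Forward-Euler integration of dx/dt = -x + A*y + B*u + bias."""
    rows, cols = len(u), len(u[0])
    x = [row[:] for row in x0]
    for _ in range(steps):
        y = [[output(v) for v in row] for row in x]
        x = [[x[i][j] + dt * (-x[i][j]
                              + template_sum(A, y, i, j)
                              + template_sum(B, u, i, j)
                              + bias)
              for j in range(cols)] for i in range(rows)]
    return [[output(v) for v in row] for row in x]


# Hypothetical templates: self-feedback only in A, weak 4-neighbour coupling in B.
A = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
B = [[0.0, 0.1, 0.0], [0.1, 1.0, 0.1], [0.0, 0.1, 0.0]]
u = [[1.0, 1.0, -1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, -1.0]]
x0 = [[0.0] * 3 for _ in range(3)]

result = simulate(u, A, B, bias=0.0, x0=x0)
for row in result:
    print(row)
```

Each cell needs only the two 3x3 templates and a bias, shared across the whole grid, which is why the number of coefficients stays small regardless of grid size. With these particular values, every cell settles at a saturated output of about +1 or -1 matching the sign of its input.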
In this study, we have applied a CNN approach to the problem of predicting the missing daily mean concentrations of PM10 and SO2 pollutants in the Yenibosna and Umraniye regions of Istanbul, Turkey. This paper is organized as follows. In Sections 2.1 (Architecture of CNN) and 2.2 (Multiple linear regression model), the Cellular Neural Network (CNN) and Multiple Linear Regression (LR) modeling techniques are defined. The statistical performance indices used to evaluate model prediction are explained in Section 2.3. The study area and database are explained in Section 2.4. Model construction is described in Section 2.5. In Section 3.1, PM10 and SO2 pollution in Istanbul is explained, and in Section 3.2, the CNN is tested on real data and the results are presented and compared with those of the LR technique. In Section 4, the results of the study are evaluated.
Section snippets
Architecture of CNN
Most neural networks fall into two main classes: (1) memoryless neural networks and (2) dynamical neural networks. Dynamical neural networks, such as Hopfield networks and CNNs, are usually designed as dynamic systems in which the inputs are set to constant values and each trajectory approaches a stable equilibrium point that depends upon the initial state. A CNN is composed of large-scale nonlinear analog circuits which process signals in real time (Chua and Yang, 1988). The basic unit of a CNN is called a
PM10 and SO2 pollution in Istanbul
Summary statistics of daily PM10 and SO2 data between 1999 and 2003 at the Yenibosna and Umraniye stations are given in Table 4. The daily PM10 and SO2 concentrations for each station are given in Fig. 6. The PM10 and SO2 concentrations recorded at the Yenibosna station were higher than those at the Umraniye station. In Yenibosna, traffic, industry and residential populations are quite dense. The five-year average SO2 concentration measured at the Yenibosna station was one and a half times
Conclusion
In this study, the major air pollutants of concern for the city of Istanbul, particulate matter (PM) and sulfur dioxide (SO2), were estimated using a CNN approach. There are many computational methods available for air pollutant modeling. One of the frequently used methods is the use of an Artificial Neural Network (ANN). In ANN modeling, the training process time increases as the problem becomes increasingly complex. To reduce the complexity of the calculations used by the ANN, Chua and Yang
Acknowledgments
We are grateful to the Istanbul Municipality, Environmental Protection Directorate and the Department of Meteorology in Istanbul for their help in obtaining actual data. This work was supported by the Research Fund of the University of Istanbul. Project Number: T-486/25062004.
References (43)
- et al.
Regression and multilayer perceptron-based models to forecast hourly O3 and NO2 levels in the Bilbao area
Environmental Modeling & Software
(2006)
- et al.
Separation of Bouguer anomaly map using cellular neural network
Journal of Applied Geophysics
(2001)
- et al.
A neural network based method for short-term predictions of ambient SO2 concentrations in highly polluted industrial areas of complex terrain
Atmospheric Environment
(1993)
- et al.
A simple semi-empirical model for predicting missing carbon monoxide concentrations
Atmospheric Environment
(2002)
Investigation of strategies for the control of air pollution in the Golden Horn Region, Istanbul, using a simple dispersion model
Environmental Pollution B
(1986)
- et al.
Evaluation of statistical downscaling in short range precipitation forecasting
Atmospheric Research
(2009)
- et al.
Artificial neural networks (the multilayer perceptron) — a review of applications in the atmospheric sciences
Atmospheric Environment
(1998)
- et al.
Artificial neural network models for prediction of PM10 hourly concentrations, in the Greater Area of Athens, Greece
Atmospheric Environment
(2006)
- et al.
Application of cellular neural network (CNN) method to the nuclear reactor dynamics equations
Annals of Nuclear Energy
(2007)
- et al.
Concentrations and sources of PAHs at three stations in İstanbul, Turkey
Atmospheric Research
(2011)
A neural network forecast for daily average PM10 concentrations in Belgium
Atmospheric Environment
Interaction patterns of major photochemical pollutants in Istanbul, Turkey
Atmospheric Research
Methods for imputation of missing values in air quality data sets
Atmospheric Environment
Long-range potential source contributions of episodic aerosol events to PM10 profile of a megacity
Atmospheric Environment
Extensive evaluation of neural network models for the prediction of NO2 and PM10 concentrations, compared with a deterministic modeling system and measurements in central Helsinki
Atmospheric Environment
An online air pollution forecasting system using neural networks
Environment International
Short-term prediction of SO2 concentration in Macau with artificial neural networks
Energy and Buildings
Artificial neural network approaches for modelling nitrogen dioxide dispersion from vehicular exhaust emissions
Ecological Modeling
Modelling SO2 concentration at a point with statistical approaches
Environmental Modeling & Software
Neural network prediction model for fine particulate matter (PM2.5) on the US–Mexico border in El Paso (Texas) and Ciudad Juarez (Chihuahua)
Environmental Modeling & Software
An integrated neural network model for PM10 forecasting
Atmospheric Environment
Cited by (41)
Deep learning for air pollutant concentration prediction: A review
2022, Atmospheric Environment
Citation Excerpt: The units in the convolution layer are organized in the feature map, and each unit is connected to the local weights in the feature map of the previous layer through filters. The sum of the local weights is passed through an activation function that can take various forms, such as a Rectified Linear Units (ReLU) (Şahin et al., 2011). Some researchers integrated spatial data between different regions to a one-dimensional or two-dimensional tensor, thus the CNN was facilitated to extract the spatial correlation hidden in the tensor (Yan et al., 2021; Mengara et al., 2020).
A transfer Learning-Based LSTM strategy for imputing Large-Scale consecutive missing data and its application in a water quality prediction system
2021, Journal of Hydrology
Citation Excerpt: The main advantages of R-ELM are twofold: first, the time complexity is very low, and second, there is no need to tune the related parameters in advance. However, standard machine learning and deep learning methods require large amounts of labelled data for training (Şahin et al., 2011), so they are not well suited for imputing missing water quality data; notably, in such datasets, gaps in labelled data are generally consecutive and can be extensive. The proposed algorithm encompasses the advantages of the transfer learning technique and the LSTM model, and it respectively aims to fill large-scale consecutive gaps in missing data while capturing excellent long-dependent trends from observations of similar time series.
Transfer learning for long-interval consecutive missing values imputation without external features in air pollution time series
2020, Advanced Engineering Informatics
Citation Excerpt: Then they analyzed value distribution patterns in sub-groups and re-filled missing data within sub-groups. Şahin et al. [23] developed a neural network named Cellular Neural Network (CNN) to predict missing air pollution data. Unfortunately, although various approaches have been proposed for missing data imputation, most of them were applied in the domain of clinical disease, computer science and economics [19–22,24].
A data enhancement-based quadratic imputation framework for consecutive missing values considering spatiotemporal characteristics of dam deformation
2024, Journal of Civil Structural Health Monitoring
Use of long short-term memory network (LSTM) in the reconstruction of missing water level data in the River Seine
2023, Hydrological Sciences Journal