Research article
Assessing the biochemical oxygen demand using neural networks and ensemble tree approaches in South Korea
Introduction
Water quality in rivers, lakes, reservoirs, and oceans can be expressed through the biological, physical, and chemical features of water (Khalil et al., 2011; Ahmed and Shah, 2017). The evaluation of water quality variables (e.g., dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD), nitrogen (N), phosphorus (P), total nitrogen (T-N), total phosphorus (T-P), and total organic carbon (TOC)) is critical for the operation and management of water resources systems (Najah et al., 2009; Khaled et al., 2017).
BOD was chosen as an indicator of river pollution by the U.K. Royal Commission on River Pollution in 1908. The five-day period at a specified temperature (usually 20 °C) for estimating BOD5 was also established (Great Britain, 1908). In 1936, the American Public Health Association Standard Methods Committee (APHASMC) identified BOD as a reference for assessing the organic pollution of water. BOD can be expressed as the quantity of DO required by aerobic biological organisms to break down the organic matter in water at a specific temperature (Raheli et al., 2017; Solgi et al., 2017; Ahmadi et al., 2018; Tao et al., 2019). Most water quality variables, such as DO, COD, N, P, T-N, T-P, and TOC, can be measured directly at field scale with water quality equipment or kits. BOD, however, is estimated indirectly, beyond field scale, from the amount of oxygen (in milligrams) consumed per liter of sample during 5 days of incubation at 20 °C. It is therefore a difficult but major parameter for evaluating water quality in rivers, and is one of the measures used for the maintenance of river ecosystems (Ay and Kisi, 2011). Accurate estimation of water quality variables provides fundamental data for operating and managing water quality systems (Xu and Liu, 2013). Traditional methods of predicting water quality are iterative and require a lot of time and effort (Zou et al., 2007).
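The indirect estimate described above reduces to a short calculation. The sketch below assumes the common unseeded dilution form, BOD5 = (DO_initial - DO_final) / P, where P is the volumetric fraction of sample in the incubation bottle. The function name and the example readings are hypothetical and are not taken from this study.

```python
def bod5_mg_per_l(do_initial, do_final, sample_fraction=1.0):
    """Estimate BOD5 (mg/L) from dissolved oxygen readings (mg/L).

    do_initial: DO of the (diluted) sample before incubation.
    do_final: DO after 5 days of incubation at 20 degrees C.
    sample_fraction: volume of sample / total bottle volume (1.0 = undiluted).
    Assumption: unseeded dilution water, so no seed correction is applied.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return (do_initial - do_final) / sample_fraction


# Hypothetical readings: a sample diluted to 25% of the bottle volume.
estimate = bod5_mg_per_l(do_initial=8.5, do_final=4.0, sample_fraction=0.25)
print(f"BOD5 estimate: {estimate:.1f} mg/L")
```

The division by the sample fraction scales the oxygen depletion observed in the diluted bottle back to the undiluted sample.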
To overcome the shortcomings of traditional methods for predicting water quality variables, machine learning approaches have been explored and reported in various articles during the past two decades (Maier and Dandy, 1996; Zhang and Stanley, 1997; Zhang et al., 2002; Diamantopoulou et al., 2005). Diamantopoulou et al. (2007) proposed the cascade correlation neural networks (CCNN) model to estimate missing water quality variables. Zhao et al. (2007) applied the backpropagation neural networks (BPNN) model to predict the water quality indicators in a reservoir. Dogan et al. (2009) explored the feedforward neural networks (FFNN) model for predicting BOD in the Melen River, Turkey. Utilizing the artificial neural networks (ANN) model, Najah et al. (2009) predicted the electrical conductivity (EC), total dissolved solids (TDS), and turbidity in the Johor River, Malaysia.
Han et al. (2011) utilized the radial basis function neural networks (RBFNN) model to predict a water quality variable in wastewater treatment. By utilizing the FFNN model, Gazzaz et al. (2012) predicted the water quality indices in the Kinta River, Malaysia. Li et al. (2019b) developed the support vector regression (SVR) model combined with the firefly algorithm (FFA) for predicting a water quality indicator in the Euphrates River, Iraq. Zounemat-Kermani et al. (2019) applied multilayer perceptron (MLP) and CCNN models to predict DO in the St. Johns River, Florida.
Among different machine learning approaches, some specific methods have been utilized to predict BOD (Dogan et al., 2009; Fathima et al., 2014; Noori et al., 2015; Khaled et al., 2017; Raheli et al., 2017). Emamgholizadeh et al. (2014) employed the MLP and adaptive neuro-fuzzy inference system (ANFIS) models to predict BOD, COD, and DO in the Karoon River, Iran. Ahmed and Shah (2017) applied the ANFIS model for predicting BOD in the Surma River, Bangladesh. Solgi et al. (2017) utilized the SVR and ANFIS models combined with wavelet transform (WT) and principal component analysis (PCA) to predict BOD in the Karun River, Iran. Also, Tao et al. (2019) explored the hybrid response surface method (HRSM) and SVR model for predicting BOD and DO in the Euphrates River, Iraq. Granata et al. (2017) suggested the SVR and regression trees (RT) models to predict BOD, COD, TDS, and total suspended solids (TSS) in the USA.
Although several machine learning approaches have been used to predict BOD, each has its own advantages, disadvantages, and assumptions, and new concepts are still needed to improve prediction accuracy. A new modeling strategy, the deep echo state network (Deep ESN) model, which combines the capabilities of deep learning and an echo state network (ESN), is able to model highly complex and nonlinear relationships (Gallicchio and Micheli, 2017). These features of the Deep ESN model provide an impetus to investigate its ability to predict BOD. To the best of our knowledge, this model has not previously been applied to this problem.
This article explores the ability of the Deep ESN model to predict BOD in the Han River, South Korea. The performance of the suggested model is assessed and compared with that of the extreme learning machine (ELM), gradient boosted regression trees (GBRT), and random forest (RF) models using four statistical criteria (i.e., RMSE, NSE, R², and R) and graphical comparisons (i.e., scatter diagram, error histogram, and Taylor diagram). This article is organized as follows: Section 2 describes the applied approaches, including the Deep ESN, ELM, GBRT, and RF models. Section 3 presents the study area and the data, and Section 4 describes the application and results. Conclusions are summarized in Section 5.
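The four statistical criteria named above can be computed as follows. This is a minimal sketch using the standard textbook definitions. It assumes that R is Pearson's correlation coefficient and that R2 is its square, which the text here does not state, and the sample values are hypothetical rather than data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the data (mg/L for BOD)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_obs * var_pred)


def r_squared(observed, predicted):
    """Assumption: R2 is taken as the square of Pearson's R."""
    return pearson_r(observed, predicted) ** 2


# Hypothetical observed and predicted BOD values (mg/L).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
for name, metric in [("RMSE", rmse), ("NSE", nse), ("R2", r_squared), ("R", pearson_r)]:
    print(f"{name}: {metric(obs, pred):.4f}")
```

RMSE is an error measure (lower is better), while NSE, R2, and R are agreement measures (closer to 1 is better), which is why they are usually reported together.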
Section snippets
Materials and methods
The machine learning (ML) approaches applied in this research fall into two main categories: (i) neural network approaches, including the Deep ESN and ELM models, and (ii) ensemble tree approaches, including the GBRT and RF models. The following section describes the applied models.
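To make the neural network category more concrete, the sketch below shows how a stacked (deep) echo state network can turn an input series into reservoir states. It is not the authors' implementation. It assumes a leaky-integrator state update, fixed random weights, and a simple weight scaling that keeps each absolute row sum of the recurrent matrix below 1 in place of an explicit spectral-radius computation. All function names, layer sizes, and parameter values are hypothetical. Training the linear readout on the collected states (commonly done by ridge regression) is not shown.

```python
import math
import random


def make_layer(n_in, n_units, rho=0.9, in_scale=0.5, seed=0):
    """Create fixed random input and recurrent weights for one reservoir layer."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-in_scale, in_scale) for _ in range(n_in)]
            for _ in range(n_units)]
    # Each entry is bounded by rho / n_units, so every absolute row sum is <= rho < 1.
    w_rec = [[rng.uniform(-1.0, 1.0) * rho / n_units for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_rec


def step_layer(layer, state, inputs, leak=0.5):
    """Leaky-integrator update: x <- (1 - leak) * x + leak * tanh(W_in u + W x)."""
    w_in, w_rec = layer
    new_state = []
    for i, old in enumerate(state):
        pre = sum(w * u for w, u in zip(w_in[i], inputs))
        pre += sum(w * s for w, s in zip(w_rec[i], state))
        new_state.append((1.0 - leak) * old + leak * math.tanh(pre))
    return new_state


def run_deep_esn(series, n_inputs, layer_sizes, leak=0.5):
    """Drive stacked reservoirs with a series and return the states per time step."""
    layers, n_in = [], n_inputs
    for k, size in enumerate(layer_sizes):
        layers.append(make_layer(n_in, size, seed=k))
        n_in = size  # the next layer is driven by this layer's state
    states = [[0.0] * size for size in layer_sizes]
    features = []
    for u in series:
        layer_input = u
        for k, layer in enumerate(layers):
            states[k] = step_layer(layer, states[k], layer_input, leak)
            layer_input = states[k]
        # Concatenate all layer states; a linear readout would be fit on these.
        features.append([s for layer_state in states for s in layer_state])
    return features


# Hypothetical two-variable input series with three time steps.
demo = run_deep_esn([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], n_inputs=2, layer_sizes=[4, 3])
print(len(demo), len(demo[0]))
```

Only the readout is trained in an echo state network, so the reservoir weights stay fixed after random initialization. Stacking layers lets later reservoirs respond to the dynamics of earlier ones rather than to the raw inputs.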
Study area and data
In the current research, two stations (i.e., Gongreung and Gyeongan) were selected to predict BOD in the Han River basin, South Korea. Gongreung station is located on Gongreung Stream at latitude 37°67′N and longitude 126°89′E, and Gyeongan station is located on Gyeongan Stream at latitude 37°44′N and longitude 127°31′E. The changes in BOD values were investigated using different water quality variables, including the potential of hydrogen (pH), electrical conductivity (EC), dissolved oxygen
Application and results
This study used different water quality variables (i.e., BOD, pH, EC, DO, WT, COD, SS, T-N, and T-P) to predict BOD at the Gongreung and Gyeongan stations in South Korea. As explained previously, evaluating the performance of the Deep ESN, ELM, GBRT, and RF models in predicting BOD is the key point of this article.
The water quality variables (i.e., pH, EC, DO, WT, COD, SS, T-N, and T-P) can be directly measured using small equipment at field scale. BOD, however, cannot be observed directly but
Conclusions
This research investigated the accuracy and reliability of a novel machine learning model, Deep ESN, for predicting biochemical oxygen demand. The outcomes of this approach were compared with those of the ELM, GBRT, and RF models using data on BOD and eight water quality variables (i.e., pH, EC, DO, WT, COD, SS, T-N, and T-P) collected from the Gongreung and Gyeongan stations, South Korea. For the training and testing phases of these models, the data were separated into 80% (training) and 20% (testing),
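An 80%/20% partition of this kind can be sketched in a few lines. The text does not say whether the records were split in time order or at random, so the version below simply assumes that the first 80% of the records form the training set. The function name and the sample records are hypothetical.

```python
def split_train_test(records, train_fraction=0.8):
    """Split records into training and testing parts, keeping their order.

    Assumption: the leading part of the record is used for training and the
    remainder for testing; a random split would need shuffling first.
    """
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Hypothetical record identifiers standing in for water quality samples.
samples = list(range(10))
train, test = split_train_test(samples)
print(len(train), len(test))
```

Keeping the order avoids testing on samples that precede the training period, which matters when the records form a time series.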
CRediT authorship contribution statement
Sungwon Kim: Conceptualization, Data curation, Writing - original draft, Investigation, Validation. Meysam Alizamir: Methodology, Software, Visualization, Formal analysis. Mohammad Zounemat-Kermani: Methodology, Software, Resources. Ozgur Kisi: Writing - review & editing, Investigation. Vijay P. Singh: Writing - review & editing, Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (59)
- Application of adaptive neuro-fuzzy inference system (ANFIS) to estimate the biochemical oxygen demand (BOD) of Surma River. J. King Saud Univ., Eng. Sci. (2017)
- A new hybrid artificial neural networks for rainfall–runoff process modeling. Neurocomputing (2013)
- HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environ. Model. Software (2007)
- Universally deployable extreme learning machines integrated with remotely sensed MODIS satellite predictors over Australia to forecast global solar radiation: a new approach. Renew. Sustain. Energy Rev. (2019)
- Modeling biological oxygen demand of the Melen River in Turkey using an artificial neural network technique. J. Environ. Manag. (2009)
- Architectural and markovian factors of echo state networks. Neural Network. (2011)
- A criterion of efficiency for rainfall-runoff models. J. Hydrol. (1978)
- Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors. Mar. Pollut. Bull. (2012)
- Random forests for land cover classification. Pattern Recogn. Lett. (2006)
- An efficient self-organizing RBF neural network for water quality prediction. Neural Network. (2011)
- Extreme learning machine: theory and applications. Neurocomputing
- Methods for assessing biochemical oxygen demand (BOD): a review. Water Res.
- Estimation of water quality characteristics at ungauged sites using artificial neural networks and canonical correlation analysis. J. Hydrol.
- Data-Mining for processes in chemistry, materials, and engineering. Processes
- Uncertainty in the spatial prediction of soil texture: comparison of regression tree and random forest models. Geoderma
- River flow forecasting through conceptual models, Part 1 – a discussion of principles. J. Hydrol.
- Uncertainty analysis of support vector machine for online prediction of five-day biochemical oxygen demand. J. Hydrol.
- Multi-site solar power forecasting using gradient boosted regression trees. Sol. Energy
- Daily river flow forecasting using ensemble empirical mode decomposition based heuristic regression models: application on the Perennial rivers in Iran and South Korea. J. Hydrol.
- Daily water level forecasting using wavelet decomposition and artificial intelligence techniques. J. Hydrol.
- Improving SVR and ANFIS performance using wavelet transform and PCA algorithm for modeling and predicting biochemical oxygen demand (BOD). Ecohydrol. Hydrobiol.
- Development of an accurate and reliable hourly flood forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J. Hydrol.
- Estimating 2-year flood flows using the generalized structure of the Group Method of Data Handling. J. Hydrol.
- Study of short-term water quality prediction model based on wavelet neural network. Math. Comput. Model.
- Predicting compressive strength of lightweight foamed concrete using extreme learning machine model. Adv. Eng. Software
- Forecasting raw-water quality parameters for the North Saskatchewan River by neural network modeling. Water Res.
- Application of an empirical neural network to surface water quality estimation in the Gulf of Finland using combined optical data and microwave data. Remote Sens. Environ.
- Assessment of input data selection methods for BOD simulation using data-driven models: a case study. Environ. Monit. Assess.
- Modeling of dissolved oxygen concentration using different neural network techniques in Foundation Creek, El Paso County, Colorado. J. Environ. Eng.
Cited by (37)
Machine learning approach for microbial growth kinetics analysis of acetic acid-producing bacteria isolated from organic waste
2024, Biochemical Engineering Journal
Quantification of river total phosphorus using integrative artificial intelligence models
2023, Ecological Indicators
Internet of Things in aquaculture: A review of the challenges and potential solutions based on current and future trends
2023, Smart Agricultural Technology
Predicting quality parameters of wastewater treatment plants using artificial intelligence techniques
2023, Journal of Cleaner Production
Applications of deep learning in water quality management: A state-of-the-art review
2022, Journal of Hydrology
Citation excerpt: The ESN model is an RNN with a hidden layer defining the state dynamics of the network (Jaeger, 2007). Kim et al. (2020b) designed a deep ESN model that is comprised of a hierarchy of stacked recurrent layers for which the output of each layer will be fed into the next layer. The proposed deep ESN model has demonstrated lower prediction errors than the extreme learning machine (ELM), random forest (RF), and gradient boosted regression trees (GBRT) in the prediction of BOD.