Research article
Assessing the biochemical oxygen demand using neural networks and ensemble tree approaches in South Korea

https://doi.org/10.1016/j.jenvman.2020.110834

Highlights

  • Machine learning models are developed for predicting the biochemical oxygen demand.

  • Water quality variables are used for model development based on input categories 1–5.

  • Four statistical indices are utilized for evaluating and assessing the developed models.

  • The Deep Echo State Network 5 (Deep ESN5) model provides the most accurate BOD predictions.

Abstract

The biochemical oxygen demand (BOD), one of the widely utilized variables for water quality assessment, is a metric for the ecological division in rivers. Since the traditional approach to predicting BOD is time-consuming and inaccurate due to inconstancies in microbial multiplicity, alternative methods have been recommended for more accurate prediction of BOD. This study investigated the capability of a novel deep learning-based model, Deep Echo State Network (Deep ESN), for predicting BOD, based on various water quality variables, at Gongreung and Gyeongan stations, South Korea. The model was compared with the Extreme Learning Machine (ELM) and two ensemble tree models comprising the Gradient Boosting Regression Tree (GBRT) and Random Forests (RF).

Diverse water quality variables (i.e., BOD, potential of Hydrogen (pH), electrical conductivity (EC), dissolved oxygen (DO), water temperature (WT), chemical oxygen demand (COD), suspended solids (SS), total nitrogen (T-N), and total phosphorus (T-P)) were utilized for developing the Deep ESN, ELM, GBRT, and RF with five input combinations (i.e., Categories 1–5). These models were evaluated by root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), coefficient of determination (R2), and correlation coefficient (R). Overall evaluations suggested that the Deep ESN5 model provided the most reliable predictions of BOD among all the models at both stations.

Introduction

Water quality in rivers, lakes, reservoirs, and oceans can be expressed through biological, physical, and chemical features of water (Khalil et al., 2011; Ahmed and Shah, 2017). The evaluation of water quality variables (e.g., dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD), nitrogen (N), phosphorus (P), total nitrogen (T-N), total phosphorus (T-P), and total organic carbon (TOC)) is critical for the operation and management of water resources systems (Najah et al., 2009; Khaled et al., 2017).

BOD was chosen as an indicator of river pollution by the U.K. Royal Commission on River Pollution in 1908. A five-day incubation period at a specified temperature (usually 20 °C) was set for estimating BOD5 (Great Britain, 1908). The American Public Health Association Standard Methods Committee (APHASMC) identified BOD as a reference for assessing the organic pollution of water in 1936. BOD can be expressed as the quantity of DO required by aerobic biological organisms to break down organic matter in water at a specific temperature (Raheli et al., 2017; Solgi et al., 2017; Ahmadi et al., 2018; Tao et al., 2019). Most water quality variables, such as DO, COD, N, P, T-N, T-P, and TOC, can be measured directly at field scale using water quality equipment or kits. However, BOD can only be estimated indirectly, away from the field, from the amount of oxygen (in milligrams) consumed per liter of sample during 5 days of incubation at 20 °C. Therefore, it is a difficult yet important parameter for evaluating water quality in rivers, and is one of the measures used for the maintenance of river ecosystems (Ay and Kisi, 2011). Accurate estimation of water quality variables can provide fundamental data for operating and managing water quality systems (Xu and Liu, 2013). Traditional methods of predicting water quality are iterative and require a lot of time and effort (Zou et al., 2007).
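As a concrete illustration of the five-day test described above, the following minimal sketch computes BOD5 from the initial and final dissolved-oxygen readings of an unseeded, diluted sample. The function name `bod5_mg_per_l`, its parameters, the 300 mL bottle volume, and the example readings are illustrative assumptions and are not taken from this study. The relation used is the standard unseeded dilution formula, BOD5 = (DO_initial − DO_final) / P, where P is the volumetric fraction of sample in the bottle.

```python
def bod5_mg_per_l(do_initial, do_final, sample_ml, bottle_ml=300.0):
    """Unseeded five-day BOD in mg/L.

    do_initial, do_final: dissolved oxygen (mg/L) before and after
        5 days of incubation at 20 degrees C.
    sample_ml: volume of river sample placed in the bottle (mL).
    bottle_ml: total bottle volume (mL); 300 mL is an assumed default.
    """
    if not 0 < sample_ml <= bottle_ml:
        raise ValueError("sample_ml must be in (0, bottle_ml]")
    if do_final > do_initial:
        raise ValueError("final DO cannot exceed initial DO")
    fraction = sample_ml / bottle_ml  # P: share of the bottle that is sample
    return (do_initial - do_final) / fraction


# Hypothetical readings: 30 mL of sample diluted to 300 mL.
example = bod5_mg_per_l(do_initial=8.5, do_final=4.0, sample_ml=30.0)
print(round(example, 1))  # 45.0 mg/L for these illustrative readings
```

The five-day wait built into this calculation is what motivates predicting BOD from variables that can be measured immediately.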

To overcome the shortcomings of traditional methods for predicting water quality variables, machine learning approaches have been explored and reported in various articles during the past two decades (Maier and Dandy, 1996; Zhang and Stanley, 1997; Zhang et al., 2002; Diamantopoulou et al., 2005). Diamantopoulou et al. (2007) proposed the cascade correlation neural networks (CCNN) model to estimate missing water quality variables. Zhao et al. (2007) applied the backpropagation neural networks (BPNN) model to predict water quality indicators in a reservoir. Dogan et al. (2009) explored the feedforward neural networks (FFNN) model for predicting BOD in the Melen River, Turkey. Utilizing the artificial neural networks (ANN) model, Najah et al. (2009) predicted the electrical conductivity (EC), total dissolved solids (TDS), and turbidity in the Johor River, Malaysia.

Han et al. (2011) utilized the radial basis function neural networks (RBFNN) model to predict a water quality variable in wastewater treatment. By utilizing the FFNN model, Gazzaz et al. (2012) predicted the water quality indices in the Kinta River, Malaysia. Li et al. (2019b) developed the support vector regression (SVR) model combined with the firefly algorithm (FFA) for predicting a water quality indicator in the Euphrates River, Iraq. Zounemat-Kermani et al. (2019) applied multilayer perceptron (MLP) and CCNN models to predict DO in the St. Johns River, Florida.

Among different machine learning approaches, some specific methods have been utilized to predict BOD (Dogan et al., 2009; Fathima et al., 2014; Noori et al., 2015; Khaled et al., 2017; Raheli et al., 2017). Emamgholizadeh et al. (2014) employed the MLP and adaptive neuro-fuzzy inference system (ANFIS) models to predict BOD, COD, and DO in the Karoon River, Iran. Ahmed and Shah (2017) applied the ANFIS model for predicting BOD in the Surma River, Bangladesh. Solgi et al. (2017) utilized the SVR and ANFIS models combined with wavelet transform (WT) and principal component analysis (PCA) to predict BOD in the Karun River, Iran. Also, Tao et al. (2019) explored the hybrid response surface method (HRSM) and SVR model for predicting BOD and DO in the Euphrates River, Iraq. Granata et al. (2017) suggested the SVR and regression trees (RT) models to predict BOD, COD, TDS, and total suspended solids (TSS) in the USA.

Although several machine learning approaches have been used to predict BOD, each has its own advantages, disadvantages, and assumptions, and new concepts are still needed to improve prediction accuracy. A new modeling strategy, the deep echo state network (Deep ESN) model, which combines the capabilities of deep learning and an echo state network (ESN), is able to model highly complex and nonlinear relationships (Gallicchio and Micheli, 2017). These features of the Deep ESN model provide an impetus to investigate its ability to predict BOD. To the best of our knowledge, this model has not previously been applied to this problem.

This article explores the ability of the Deep ESN model to predict BOD in the Han River, South Korea. The performance of the suggested model is assessed and compared with that of the ELM, GBRT, and RF models utilizing four statistical criteria (i.e., RMSE, NSE, R2, and R) and graphical comparisons (i.e., scatter diagram, error histogram, and Taylor diagram). This article is organized as follows: The 2nd section presents the applied approaches, namely the Deep ESN, ELM, GBRT, and RF models. The study area and information about the data are provided in the 3rd section, and the 4th section describes the application and results. Conclusions are summarized in the 5th section.
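The four statistical criteria named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, R2 is assumed here to be the square of the Pearson correlation coefficient R, and the observed and predicted values are invented solely to show the calls.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus (squared error / variance of obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def pearson_r(obs, pred):
    """Pearson correlation coefficient R."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(var_obs * var_pred)


def r_squared(obs, pred):
    """Coefficient of determination, assumed here to equal R squared."""
    return pearson_r(obs, pred) ** 2


# Hypothetical BOD values (mg/L), for illustration only.
observed = [2.1, 3.4, 1.8, 4.0, 2.9]
predicted = [2.3, 3.1, 2.0, 3.7, 3.0]
for name, fn in [("RMSE", rmse), ("NSE", nse), ("R2", r_squared), ("R", pearson_r)]:
    print(name, round(fn(observed, predicted), 3))
```

RMSE is in the units of BOD and is best when close to zero, whereas NSE, R2, and R are dimensionless and best when close to one. NSE penalizes bias, while R and R2 do not, which is one reason to report them together.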

Materials and methods

The machine learning (ML) approaches applied in this research fall into two main categories: (i) neural network approaches, including the Deep ESN and ELM models, and (ii) ensemble tree approaches, including the GBRT and RF models. The following section describes the applied models.
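To make the neural network category more concrete, the sketch below shows the general idea behind a deep echo state network: several fixed, randomly initialized recurrent layers (reservoirs) are stacked, the first layer receives the input, and each subsequent layer receives the state of the layer beneath it. This is a simplified illustration and not the authors' implementation. The function names, layer sizes, leak rate, and weight scales are all assumptions, and the recurrent weights are bounded by a crude row-sum rule rather than an explicit spectral-radius rescaling.

```python
import math
import random


def make_layer(n_in, n_units, rng, input_scale=0.5, recurrent_scale=0.9):
    """Random, untrained weights for one reservoir layer."""
    w_in = [[rng.uniform(-input_scale, input_scale) for _ in range(n_in)]
            for _ in range(n_units)]
    # Each entry is at most recurrent_scale / n_units in magnitude, so every
    # row's absolute sum is at most recurrent_scale (< 1), which bounds the
    # spectral radius of the recurrent matrix below 1.
    w_rec = [[rng.uniform(-1.0, 1.0) * recurrent_scale / n_units
              for _ in range(n_units)] for _ in range(n_units)]
    return w_in, w_rec


def step_layer(layer, state, inputs, leak=0.3):
    """One leaky-integrator update of a single reservoir layer."""
    w_in, w_rec = layer
    new_state = []
    for i, old in enumerate(state):
        pre = sum(w * u for w, u in zip(w_in[i], inputs))
        pre += sum(w * s for w, s in zip(w_rec[i], state))
        new_state.append((1.0 - leak) * old + leak * math.tanh(pre))
    return new_state


def deep_esn_states(series, n_layers=3, n_units=10, seed=0):
    """Return, for each time step, the concatenated states of all layers."""
    rng = random.Random(seed)
    n_in = len(series[0])
    layers = [make_layer(n_in if k == 0 else n_units, n_units, rng)
              for k in range(n_layers)]
    states = [[0.0] * n_units for _ in range(n_layers)]
    collected = []
    for inputs in series:
        layer_input = inputs
        for k in range(n_layers):
            states[k] = step_layer(layers[k], states[k], layer_input)
            layer_input = states[k]  # feed this layer's state to the next one
        collected.append([s for layer_state in states for s in layer_state])
    return collected


# Hypothetical, already-rescaled inputs (two variables, three time steps).
features = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]]
states = deep_esn_states(features, n_layers=2, n_units=4, seed=1)
print(len(states), len(states[0]))  # 3 time steps, 2 layers x 4 units = 8
```

In an echo state network only a linear readout on these collected states is trained, commonly by ridge regression, while the reservoir weights stay fixed. The readout is omitted here to keep the sketch short.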

Study area and data

In the current research, two stations (i.e., Gongreung and Gyeongan) were selected to predict BOD in the Han River basin, South Korea. Gongreung station is located on Gongreung Stream at latitude 37°67′N and longitude 126°89′E, and Gyeongan station is located on Gyeongan Stream at latitude 37°44′N and longitude 127°31′E. The changes in BOD values were investigated using different water quality variables, including the potential of Hydrogen (pH), electrical conductivity (EC), dissolved oxygen

Application and results

This study used different water quality variables (i.e., BOD, pH, EC, DO, WT, COD, SS, T-N, and T-P) to predict BOD at Gongreung and Gyeongan stations in South Korea. As explained previously, evaluating the performance of the Deep ESN, ELM, GBRT, and RF models in predicting BOD is the key point of this article.

The water quality variables (i.e., pH, EC, DO, WT, COD, SS, T-N, and T-P) can be directly measured using small equipment at field scale. BOD, however, cannot be observed directly but

Conclusions

This research investigated the accuracy and reliability of a novel machine learning model, Deep ESN, for predicting the biochemical oxygen demand. The outcomes of this approach were compared with those of the ELM, GBRT, and RF models using BOD data and eight water quality variables (i.e., pH, EC, DO, WT, COD, SS, T-N, and T-P) collected from Gongreung and Gyeongan stations, South Korea. For the training and testing phases of these models, the data were separated into 80% (training) and 20% (testing),

CRediT authorship contribution statement

Sungwon Kim: Conceptualization, Data curation, Writing - original draft, Investigation, Validation. Meysam Alizamir: Methodology, Software, Visualization, Formal analysis. Mohammad Zounemat-Kermani: Methodology, Software, Resources. Ozgur Kisi: Writing - review & editing, Investigation. Vijay P. Singh: Writing - review & editing, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (59)

  • G.B. Huang et al. Extreme learning machine: theory and applications. Neurocomputing (2006)
  • S. Jouanneau et al. Methods for assessing biochemical oxygen demand (BOD): a review. Water Res. (2014)
  • B. Khalil et al. Estimation of water quality characteristics at ungauged sites using artificial neural networks and canonical correlation analysis. J. Hydrol. (2011)
  • H. Li et al. Data-Mining for processes in chemistry, materials, and engineering. Processes (2019)
  • M. Ließ et al. Uncertainty in the spatial prediction of soil texture: comparison of regression tree and random forest models. Geoderma (2012)
  • J.E. Nash et al. River flow forecasting through conceptual models, Part 1 – a discussion of principles. J. Hydrol. (1970)
  • R. Noori et al. Uncertainty analysis of support vector machine for online prediction of five-day biochemical oxygen demand. J. Hydrol. (2015)
  • C. Persson et al. Multi-site solar power forecasting using gradient boosted regression trees. Sol. Energy (2017)
  • M. Rezaie-Balf et al. Daily river flow forecasting using ensemble empirical mode decomposition based heuristic regression models: application on the Perennial rivers in Iran and South Korea. J. Hydrol. (2019)
  • Y. Seo et al. Daily water level forecasting using wavelet decomposition and artificial intelligence techniques. J. Hydrol. (2015)
  • A. Solgi et al. Improving SVR and ANFIS performance using wavelet transform and PCA algorithm for modeling and predicting biochemical oxygen demand (BOD). Ecohydrol. Hydrobiol. (2017)
  • M.K. Tiwari et al. Development of an accurate and reliable hourly flood forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J. Hydrol. (2010)
  • R. Walton et al. Estimating 2-year flood flows using the generalized structure of the Group Method of Data Handling. J. Hydrol. (2019)
  • L. Xu et al. Study of short-term water quality prediction model based on wavelet neural network. Math. Comput. Model. (2013)
  • Z.M. Yaseen et al. Predicting compressive strength of lightweight foamed concrete using extreme learning machine model. Adv. Eng. Software (2018)
  • Q. Zhang et al. Forecasting raw-water quality parameters for the North Saskatchewan River by neural network modeling. Water Res. (1997)
  • Y. Zhang et al. Application of an empirical neural network to surface water quality estimation in the Gulf of Finland using combined optical data and microwave data. Remote Sens. Environ. (2002)
  • A. Ahmadi et al. Assessment of input data selection methods for BOD simulation using data-driven models: a case study. Environ. Monit. Assess. (2018)
  • M. Ay et al. Modeling of dissolved oxygen concentration using different neural network techniques in Foundation Creek, El Paso County, Colorado. J. Environ. Eng. (2011)