Environmental odour management by artificial neural network – A review

Unwanted odour emissions are considered air pollutants that may have detrimental impacts on the environment and that signal unhealthy air to the affected individuals, resulting in annoyance and health-related issues. These pollutants are challenging to handle because they are invisible to the naked eye and can only be perceived through the human olfactory system. One strategy to address this issue is to add an intelligent processing system, such as an artificial neural network, to odour monitoring instruments in order to achieve robust results. In this paper, a review of the application of artificial neural networks to the management of environmental odours is presented. The principal factors in developing an optimum artificial neural network were identified as elements, structure and learning algorithms. The management of environmental odour is divided into four aspects: measurement, characterization, control and treatment, and continuous monitoring. For each aspect, the performance of the neural network is critically evaluated, emphasizing its strengths and weaknesses. This work aims to address the scarcity of information by addressing the gaps in existing studies in terms of the selection of the most suitable configuration, its benefits and its consequences. Adopting this technique could provide a new avenue in the management of environmental odours through the use of a powerful mathematical computing tool for a more efficient and reliable outcome.


Introduction
Odour is a sensation produced by the biological olfactory system when exposed to air containing volatile chemical species (Leonardos et al., 1969) such as hydrogen sulphide, ammonia, hydrocarbons, VOCs, etc. The gaseous compounds responsible for transmitting odour are defined as odourants (Gostelow et al., 2000). Odour emissions from industrial facilities can be a cause of annoyance to the people living in the surrounding area (Fig. 1). Long-term exposure may cause health problems such as nausea, headaches and respiratory problems (Lebrero et al., 2011; Mudliar et al., 2010; Zarra et al., 2009b). This impact may lead to a poor quality of life and generate a perception of risk in the community (Chemel et al., 2012; Zarra et al., 2009a). Odour emissions are considered air pollutants that require immediate attention; they are a long-overdue concern in society as well as a frequent reason for complaints to operators (Sarkar et al., 2003; Szczurek and Maciejewska, 2005; Bindra et al., 2015; Shammay et al., 2016; Talaiekhozani et al., 2016; Liu et al., 2018). As a remedy, a comprehensive odour management program covering measurement, characterization, control and treatment, and continuous monitoring can be implemented. In this way, the negative impacts can be minimized and the operators can function in an environmentally sound manner.
Odour emission measurement has been a subject of extensive research over the past years because the magnitude of the problem can be determined at this stage. The consistency of environmental odour management programs relies on the veracity of the measurement. Different techniques are available to measure odour, namely analytical, sensorial and mixed methods (Gostelow et al., 2000; Munoz et al., 2010; Zarra et al., 2012). Analytical measurements (e.g., GC-MS, colorimetric methods, catalytic, infrared and electrochemical sensors, differential optical absorption spectroscopy and fluorescence spectrometry) are capable of identifying and quantifying the concentration of single or multiple chemical compounds but not the odour of a mixture (Gostelow et al., 2000; Zarra et al., 2014). GC-MS, the most widely applied analytical method, is expensive, time-consuming and mainly limited to cases in which the presence of noxious substances is suspected (Di Francesco et al., 2001). Sensorial techniques use the human nose as the detector to evaluate the odour of a mixture. Among such techniques, the most widespread is dynamic olfactometry, applied according to EN13725:2003, at present under review by CEN/TC264/WG2. The mixed techniques combine the use of analytical instruments and sensorial responses. These techniques represent the tools with the highest potential for future development in the field of odour management (Zarra et al., 2014).
To recognize the odour character, the European standard specifies a method for the objective determination of the odour concentration of a gaseous sample using dynamic olfactometry (DO) with human assessors. The unit of measurement is the European odour unit per cubic metre: OU_E/m³. In DO, the dilution is achieved by mixing a flow of odorous sample with a flow of non-odorous air. The odour threshold is identified as the concentration at which 50% of the assessors detect the odour. Moreover, the 'FIDOL' factors (i.e., frequency, intensity, duration, offensiveness and location) are used to assess odour nuisance impacts; location covers the natural or physical characteristics of an area that contribute to an individual's response to pleasantness.
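As a rough numerical illustration of the dilution step described above, the following sketch computes a dilution factor from the two flow rates and then a panel threshold from per-assessor results. The function names and all numbers are hypothetical, and aggregating individual thresholds with a geometric mean is an assumption about common olfactometry practice rather than a detail given in this review.

```python
import math

def dilution_factor(odorous_flow, neutral_flow):
    """Ratio of total flow to odorous-sample flow (both in the same units)."""
    if odorous_flow <= 0:
        raise ValueError("odorous_flow must be positive")
    return (odorous_flow + neutral_flow) / odorous_flow

def panel_threshold(individual_thresholds):
    """Geometric mean of per-assessor dilution factors at detection (assumed aggregation)."""
    logs = [math.log(z) for z in individual_thresholds]
    return math.exp(sum(logs) / len(logs))

# Hypothetical example: 1 L/min of sample mixed with 63 L/min of odourless air.
z = dilution_factor(1.0, 63.0)

# Hypothetical dilution factors at which each of four assessors first detected the odour.
odour_concentration = panel_threshold([32.0, 64.0, 64.0, 128.0])
```

Under these assumptions, the dilution factor at the panel threshold is read as the odour concentration of the undiluted sample.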
In odour control and treatment, the first implementation step is to identify the responsible gas or gases as well as their concentration levels. These factors determine which type of treatment method is necessary to minimize the odour emission (Laor et al., 2014; Maurer et al., 2016; Brancher et al., 2017). Technologies limiting pollution with odorous compounds can be classified into two main groups: (a) those for immobilization of the odorous compound from the stream of emitted gas and (b) those for the prevention of emission (Wysocka et al., 2019). Moreover, the treatment approach can be considered dry or wet (Yang et al., 2016). Dry processes include adsorption, thermal oxidation and UV-light application, while wet treatment includes wet scrubbing, biological treatment and photocatalysis (Burgess et al., 2001; Antonopoulou and Konstantinou, 2015; Yang et al., 2016; Wysocka et al., 2019). Dry process techniques pose a threat to the environment because of the operational risk involved: the system deals with products in the gaseous state, which are difficult to control (Kachina et al., 2006). Meanwhile, applying wet treatment techniques is more feasible because the products can be dissolved initially in an aqueous solution and temporarily stabilized for further treatment (Yang et al., 2011). Among these techniques, wet scrubbing is the most applied odour abatement process (Charron et al., 2006; Yang et al., 2016; Maurer et al., 2016). Furthermore, in order to verify the efficiency of the treatment process, continuous on-site monitoring should be considered, using analytical instruments to inspect the process emissions quantitatively or qualitatively; however, there are many constraints when deploying these instruments in the field.
The Electronic Nose (ENose) is attracting attention due to its compact physical features, which allow it to be installed in a stationary or transferable manner. The ENose combines the functions of analytical and sensorial techniques and can detect the presence of gas in the air, which makes it ideal for continuous in situ monitoring (Neumann et al., 2013; Deshmukh et al., 2015). It is composed of an array of gas sensors and a pattern-recognition system, capable of distinguishing complex odours (Capelli et al., 2007; Zarra et al., 2014; Szczurek and Maciejewska, 2005). Moreover, the ENose can be coupled with ancillary technologies (i.e., internet technologies, wireless fidelity, mobile platforms, smart sensors, biosensors, etc.) in order to obtain an on-line data acquisition system for a more reliable result. In addition, it can be integrated with other monitoring systems (i.e., meteorological instruments, remote-controlled robots and drones, etc.) for the automation of labour-intensive measurements (Postolache et al., 2009; Szulczynski et al., 2017; Orzi et al., 2018). In some cases, the ENose is applied as a warning device and is able to monitor all phases of industrial processes (Postolache et al., 2009; Wilson, 2012). The technical and scientific community is seeking a method that reproduces the human nose; however, because the perception of odour differs from one person to another (Zarra et al., 2008), odour is still difficult to identify and quantify, and there is no universally accepted method for managing environmental odour (Zarra et al., 2014).
In this situation, artificial neural networks (ANN) represent an appropriate pattern-recognition technique that can be incorporated into an ENose to improve odour sensing. Artificial neural networks are miniature models of the biological nervous system of the human brain. They are made up of a large number of neurons arranged in layers, with weighted synaptic connections to each other, each changing its state individually over time (Kosinski and Kozlowski, 1998; McMillan, 1999; Mjalli et al., 2007; Theodoridis, 2015; Dharwal and Kaur, 2016). Artificial neural networks are nonparametric and are capable of handling background noise in the data set (Kosinski and Kozlowski, 1998; Jones, 2004; Men et al., 2007). There are many approaches to applying neural networks in environmental process control (Chan and Huang, 2003; Iliadis et al., 2018), depending on the complexity of the problem. An ANN in an ENose provides a synergistic combination by detecting complex gas interactions, resulting in an accurate measurement of odour concentration, intensity and hedonic tone (Sabilla et al., 2017; Szulczynski et al., 2018; Herrero et al., 2016). It is also a low-budget tool that can gauge the stability of the process (Mjalli et al., 2007; Kulkarni and Chellam, 2010; Deshmukh et al., 2014). The quality of service in the industry has significantly improved over the past few years because of the integration of artificial neural networks into real-time pollution control and operation processes (Chan and Huang, 2003; Adams and Kanaroglou, 2016; Pan, 2016).
T. Zarra, et al. Environment International 133 (2019) 105189
In this study, a review of the application of artificial neural networks to the management of environmental odours is presented. The principal factors in developing an optimum artificial neural network were identified as elements, structure and learning algorithms. The management of environmental odour is divided into four aspects: measurement, characterization, control and treatment, and continuous monitoring (Fig. 2). For each aspect, the performance of the designed neural network is critically evaluated, emphasizing its strengths and weaknesses.

The artificial neural network (ANN)
The past decades witnessed the emergence of neural networks as an innovation in neuro-computing machines. The achievement of Warren McCulloch and Walter Pitts in 1943 is considered a breakthrough because the computational model they developed ties mathematical logic to neurophysiology (McCulloch and Pitts, 1943; Yadav et al., 2015). The first golden era of neural networks took place from the 1950s to the 1960s, when Dr. Frank Rosenblatt (1959) coined the term "perceptron" and actualized it as a learning rule. Dr. Rosenblatt's neural models are networks of three types of signal processing units, namely: (1) sensory units (input), (2) associative units (hidden layers) and (3) response units (output). These signal processing units are connected via synapses, which determine the strength of the connections between neurons in adjoining layers (Malsburg, 1986; Nagy, 1991; Sahoo et al., 2006). Furthermore, computer-based pattern recognition programs came to prominence, especially the "nearest neighbor" algorithm in 1967. The 1970s were quiet years for neural networks, and enthusiasm was revived in the early 1980s. In 1986, Rumelhart, Hinton and Williams published a paper entitled "Learning representations by back-propagating errors", which inaugurated the use of multilayer perceptrons and brought the rediscovered backpropagation technique into machine learning research. In the 1990s, several innovations arose, such as the use of support vector machines for classification and regression applications. At the turn of the millennium, deep neural networks were introduced for image and voice recognition and other real-world interactions that use large sets of data. Decades after the pioneering work of Rosenblatt, neural networks have found application in numerous products such as IBM's Watson, Siri, Google, Amazon Echo, etc. (Malsburg, 1986; Theodoridis, 2015). Fig. 3 depicts the time axis with the main milestones in the development of ANN.
Artificial neural networks are used in different types of problem-solving applications (Nagy, 1991; Boger et al., 1997). Some neural network configurations are combined with statistical methods such as principal component analysis to pre-process the data before training the network, while others work independently and rely on their own configuration (Koprinkova and Petrova, 1999). Neural networks are mainly applied to pattern recognition for prediction and classification problems (Shanker et al., 1996; Jones, 2004; Rahman et al., 2015). In order to perform a specific function, the artificial neural network must be supplied with knowledge, which is accomplished by employing a suitable learning process that may be supervised, unsupervised or reinforced (Goodner et al., 2001). In supervised learning, the input data are known and labelled in accordance with the target output, while in unsupervised learning the input data are unlabelled and the output is mostly a grouping of the data that represents an element. Meanwhile, reinforced learning is under study at the moment. The choice of learning technique in an artificial neural network model depends on the type of application (Lalis et al., 2014). With the rapid development of computational techniques, artificial neural networks provide a powerful framework with potential application in supporting operators to execute action plans (Fontes et al., 2014). Moreover, with the continuous development of hardware resources and advanced computational technologies, wider attention and broader applications for neural networks are anticipated in the future.

Elements
The standard neural network consists of three layers (Theodoridis, 2015). Each layer has threshold elements, or "neurons". The first layer contains the input variables to be fed into the neural model. The selection of appropriate input parameters is a significant factor in designing an artificial neural network that performs effectively (Mjalli et al., 2007; Dogan et al., 2009). This can be done in a series of trials by examining all the subsets of the input variables and selecting the model with the lowest Root Mean Square Error (RMSE) and highest coefficient of determination (R²) (Mohammadi, 2017). Another way is to use the partial rank correlation coefficient (PRCC), which is a sensitivity analysis method (Khoshroo et al., 2018). The method in the study of Mohammadi (2017) is computationally demanding and is most often performed through manual optimization, while the method proposed by Khoshroo et al. (2018) requires a large data set of input and output profiles. Applying either of these techniques yields a ranking of the input variables by their importance with regard to the desired output. The second layer, or hidden layer(s), contains "associator units" of feature detector cells. The objective of this layer is to lessen the overlap between patterns. The number of neurons in the hidden layers is critical because too few neurons can produce underfitting, while exceeding the ideal number of neurons can lead to overfitting (Kermani et al., 2005; Mjalli et al., 2007; He et al., 2011; Janes et al., 2005). The optimum number of neurons in the hidden layer can be found by a "cut-and-try" approach by the designer.
The third layer contains the "recognition cells", or the generated output. The output layer can be composed of a single neuron or multiple neurons, depending on the application. The higher the number of outputs, the more work is needed to construct a strong pattern in the data set.
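The trial-based input selection described above (examining every subset of input variables and keeping the one with the lowest RMSE) can be sketched as follows. To keep the example short and fast, a least-squares linear model is swapped in for the neural network; that substitution, the function names and the synthetic data are all assumptions made for illustration and are not the procedure of Mohammadi (2017).

```python
from itertools import combinations

import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_predict_linear(X_train, y_train, X_val):
    """Least-squares fit with intercept; a cheap stand-in for training an ANN."""
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    A_val = np.column_stack([X_val, np.ones(len(X_val))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_val @ coef

def best_input_subset(X_train, y_train, X_val, y_val):
    """Try every non-empty subset of input columns; keep the lowest validation RMSE."""
    n_inputs = X_train.shape[1]
    best = (None, float("inf"))
    for k in range(1, n_inputs + 1):
        for cols in combinations(range(n_inputs), k):
            idx = list(cols)
            pred = fit_predict_linear(X_train[:, idx], y_train, X_val[:, idx])
            score = rmse(y_val, pred)
            if score < best[1]:
                best = (cols, score)
    return best

# Synthetic data: only inputs 0 and 2 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.05 * rng.normal(size=120)
subset, score = best_input_subset(X[:90], y[:90], X[90:], y[90:])
```

The number of subsets grows as 2^n − 1 with the number of candidate inputs, which is consistent with the remark that this approach is computationally demanding.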
A connection (synapse) is the bridge between one neuron and another, and it carries a weight value corresponding to the strength of the connection. The signals from one neuron to another travel through the synapses (in mathematical terms, the neuron receives the sum of its weighted inputs) and are transformed into an output signal via an activation function (McCulloch and Pitts, 1943).
In order for the output neurons to convert the received signal into the output signal, an activation function is applied as a decision-making function to introduce non-linearity (Kessler and Mach, 2017; Nwankpa et al., 2018). Table 1 summarizes the most popular activation functions used in neural networks, highlighting their principal strengths and weaknesses. In the table, the standard formula is given, as well as the range of values that each function maps to.
At present, the most popular activation functions are the logistic sigmoid and the hyperbolic tangent (Theodoridis, 2015). The logistic sigmoid is considered the oldest non-linear activation function used in neural networks; it is real-valued and differentiable, with a non-negative first derivative and a single inflection point (Rojas, 1996; Chen et al., 2015). Although the logistic sigmoid has a good biological interpretation, when it is used with multiple hidden layers the outputs are all positive, confined between 0 and 1, which results in optimization problems. Meanwhile, the hyperbolic tangent function is also sigmoidal but produces output values ranging from −1 to +1, $\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$ (Pushpa and Manimala, 2014). Despite the simplicity of the log-sigmoid, the hyperbolic tangent is often used because its output is centred at zero.
There is no general rule or theoretical basis for selecting the proper activation function in a neural network, since the goal of applying it is to introduce non-linearity (Suttisinthong et al., 2014); however, the success rate is directly linked to the choice of activation function (Ertugrul, 2018). Furthermore, some studies suggest that the log-sigmoid function is often used if the output values lie between 0 and 1, while the hyperbolic tangent function is applied when the output values lie between −1 and 1 (Theodoridis, 2015).
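A single neuron of the kind described in this section (weighted sum of inputs followed by an activation function) can be sketched in a few lines. The input values, weights and bias below are arbitrary illustrative numbers, and the helper names are hypothetical.

```python
import numpy as np

def logistic_sigmoid(z):
    """Logistic sigmoid: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    z = np.dot(weights, inputs) + bias
    return activation(z)

x = np.array([0.5, -1.0, 2.0])   # illustrative input signals
w = np.array([0.4, 0.3, -0.1])   # illustrative synaptic weights
b = 0.1                          # illustrative bias term

out_sigmoid = neuron_output(x, w, b, logistic_sigmoid)  # lies in (0, 1)
out_tanh = neuron_output(x, w, b, np.tanh)              # lies in (-1, 1)
```

The two functions are closely related, since tanh(z) = 2·sigmoid(2z) − 1, so the practical difference lies mainly in the output range and in the zero-centred output of the hyperbolic tangent.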

Structure
The simplicity or complexity of the network structure, such as the number of neurons, is an important factor in achieving effective generalizing capability (Kermani et al., 2005; Theodoridis, 2015). Determining the appropriate structure is challenging because every structure has a unique set of optimal parameters. The collaboration of the neurons, especially the manner in which they are arranged and connected to one another, directs the mechanism of computation (Fig. 4) (Kermani et al., 2005). Because the full model equation of a neural network is long, the developer instead records the topology and weight values when exporting the model to a data processing platform.
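The point that a network is fully described by its topology and weight values can be illustrated with a minimal forward pass for a layered feedforward network. The tanh hidden activation, the linear output layer and the random initial weights are assumptions chosen for the sketch, not settings taken from any study reviewed here.

```python
import numpy as np

def init_mlp(topology, seed=0):
    """Create one (weight matrix, bias vector) pair per layer transition."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(scale=0.5, size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(topology[:-1], topology[1:])
    ]

def forward(params, x):
    """Propagate an input vector through the network, layer by layer."""
    a = np.asarray(x, dtype=float)
    for layer, (W, b) in enumerate(params):
        z = W @ a + b
        is_output = layer == len(params) - 1
        # Assumed design: tanh in hidden layers, linear output for regression.
        a = z if is_output else np.tanh(z)
    return a

params = init_mlp([5, 16, 1])                 # topology written as 5-16-1
y = forward(params, [0.1, 0.2, 0.3, 0.4, 0.5])
```

Exporting a trained model then amounts to storing the topology list and the arrays in `params`.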
A fully connected or layered neural network (LNN) applies a linear operation in which every input node is linked to every output node by a weight, followed by a non-linear activation function. These links can be excitatory (positive weights), inhibitory (negative weights), or irrelevant (almost zero weights) and may vary from one node to another (Mehrota and Mohan, 1997). Feedforward neural networks (FFNN), otherwise known as multi-layer perceptrons (MLP), are designed to pass the given signal in one direction, from the input nodes through a number of hidden layers and finally to the output nodes, without feedback connections to the preceding nodes (Dharwal and Kaur, 2016). Multilayer feedforward neural networks are the most widely applied and have proven to be a useful tool for prediction, function approximation and classification applications (Gardner and Dorling, 1997). In a feedback or recurrent neural network (RNN), the connections between the nodes can form a closed cycle, which is the factor distinguishing an RNN from an FFNN. Loops are introduced into the system, which is therefore considered a dynamic system (Lukosevicius and Jaeger, 2009). RNNs are powerful yet very difficult to train (Lipton et al., 2015). Modular neural networks (MNN) are a combination of structures in which small neural networks work together to solve a problem. They are commonly made up of a hierarchy of different networks with specialized elements for different tasks. This could be a promising approach to eliminating the local minima that are commonly encountered in large neural networks (Rojas, 1996; Tseng and Almogahed, 2009; Watanabe et al., 2017).
Table 1 (fragment; the activation function named in this row is not recoverable). Strength: good when finer control is required over the activation range. Weakness: computationally time-consuming due to the calculation of the Euclidean distance. Range: (0,1). References: Sibi et al., 2013; Faqih et al., 2017. Next row heading: Rectified Linear Unit Function (ReLU).
Radial basis function neural networks (RBFNN) use the Gaussian function as the activation function (Faqih et al., 2017). RBFNN are designed to interpolate data within a multi-dimensional space and may require as many centers as data points (Cheng et al., 2012; Dash et al., 2016). Once the center of every neuron has been identified, the Euclidean distance between the input and each center is calculated, and the radial basis function is applied to that distance to compute the neuron's activation. The output value of the network is a linear combination of radial basis functions of the inputs and neuron parameters (Cheng et al., 2012; Aftab et al., 2014). The most suitable structure depends on the degree of complexity of the problem (Gardner and Dorling, 1997). Overall, the LNN is considered the simplest structure and is applicable to simple problems because its training time is relatively short. Multi-layer perceptron neural networks, by contrast, are widely applied to difficult problems that commonly come with a large network size, and their training takes longer because of the numerous parameters being optimized (i.e., weights, biases, etc.) (Kavzoglu, 1999). Table 2 summarizes the main strengths and weaknesses of the principal structures of the ANN.
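The RBFNN computation described above (Euclidean distance to each center, Gaussian activation, linear combination at the output) can be written compactly. The centers, the shared width and the output weights below are hypothetical values picked only to make the arithmetic easy to follow.

```python
import numpy as np

def rbf_forward(x, centers, width, out_weights, out_bias=0.0):
    """Output of a Gaussian RBF network for a single input vector x."""
    # Euclidean distance from the input to every hidden-neuron center.
    distances = np.linalg.norm(centers - x, axis=1)
    # Gaussian activation: 1 at the center, decaying with distance.
    hidden = np.exp(-(distances ** 2) / (2.0 * width ** 2))
    # Linear combination of the hidden activations.
    return float(out_weights @ hidden + out_bias)

centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # hypothetical centers
weights = np.array([2.0, -1.0])                # hypothetical output weights

# Evaluated exactly at the first center: hidden = [1, exp(-1)].
y_at_center = rbf_forward(np.array([0.0, 0.0]), centers, width=1.0, out_weights=weights)
```

Because each hidden activation depends on a distance to a center, the cost of one evaluation grows with the number of centers, which fits the slow-training weakness attributed to this structure.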

Learning algorithms
To implement the learning process, learning or training algorithms are used. Training amounts to minimizing the loss function (i.e., reducing the error between the actual and calculated results) by optimizing the weight value of every connection. The error term (loss function) evaluates how well the neural network fits the data set, and in some cases a regularizer is applied to avoid overfitting the data (Srivastava et al., 2014). This phase allows the neural network to optimize the system based on the available resources; in other words, it is a "road map" for accomplishing a task. Table 3 shows the most important algorithms used in artificial neural networks, highlighting their principal characteristics.
Gradient descent (GD), also known as the steepest descent algorithm, is the most popular and widely applied backpropagation technique (Haugen and Kvaal, 1998; Burney et al., 2007; Rahman et al., 2015). GD is a powerful and efficient method that uses the error derivatives with respect to the weights and biases, as reported by Men et al. (2007). In real applications, gradient descent algorithms are best suited to neural networks with a large number of parameters; however, convergence is slow and the search can become stuck in local minima.
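The gradient descent update can be illustrated on the simplest possible case, a linear model trained on a mean-squared-error loss. This is a sketch under stated assumptions (fixed learning rate, full-batch updates, synthetic noise-free data); in a neural network the same update is applied with gradients obtained by backpropagation.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_iter=500):
    """Minimize the mean squared error of a linear model by steepest descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        residual = X @ w - y
        grad = 2.0 * X.T @ residual / len(y)   # gradient of the mean squared error
        w -= learning_rate * grad              # step against the gradient
    return w

# Synthetic data generated from known weights (slope 2.0, intercept -0.5).
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-1, 1, size=50), np.ones(50)])
y = X @ np.array([2.0, -0.5])
w_hat = gradient_descent(X, y)
```

Only first derivatives are used, so each iteration is cheap, but many iterations are needed, which reflects the slow convergence noted above.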
On the other hand, Newton's method is the most avoided algorithm due to the complexity of computing the Hessian matrix during operation. The goal of Newton's method is to find better training directions (La Corte, 2014) by taking the second derivatives of the loss function and computing the inverse of the resulting matrix, which requires a large amount of memory. To avoid the tedious computations of Newton's method, the quasi-Newton algorithm was introduced: instead of computing the Hessian matrix exactly, an approximation is built at each iteration (Robitaille et al., 1996). Quasi-Newton methods rely only on first-derivative information to build this approximation.
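A minimal sketch of Newton's method on a small two-parameter loss shows where the Hessian enters. The test function, its derivatives and the iteration count are illustrative assumptions; for a real network the Hessian has one row and column per weight, which is the memory burden mentioned above.

```python
import numpy as np

def newton_minimize(grad, hess, w0, n_iter=10):
    """Newton iterations: w <- w - H(w)^-1 g(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        # Solve H * step = g instead of forming the inverse explicitly.
        step = np.linalg.solve(hess(w), grad(w))
        w = w - step
    return w

# Illustrative loss: f(w) = exp(w0) - 2*w0 + (w1 - 1)^2, minimized at (ln 2, 1).
def grad(w):
    return np.array([np.exp(w[0]) - 2.0, 2.0 * (w[1] - 1.0)])

def hess(w):
    return np.array([[np.exp(w[0]), 0.0], [0.0, 2.0]])

w_min = newton_minimize(grad, hess, [0.0, 0.0])
```

Quasi-Newton methods keep the same update form but replace the exact Hessian (or its inverse) with an approximation built from successive gradients.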
The conjugate gradient (CG) algorithm can be regarded as intermediate between gradient descent and Newton's method: it converges faster than gradient descent, yet it does not require the Hessian matrix to be evaluated, stored or inverted. The objective of the CG algorithm is to improve on the slow convergence rate of gradient descent, and it has been shown to be more effective than the gradient descent algorithm, as reported by Burney et al. (2007).
The Levenberg-Marquardt (LM) algorithm, or damped least-squares method, is designed to work specifically with loss functions that take the form of a sum of squared errors. The exact Hessian matrix is not computed in the LM algorithm; instead, it works with the gradient vector and the Jacobian matrix (Lourakis, 2005). For large data sets and networks, the LM algorithm is not recommended because the Jacobian matrix becomes very large and consumes a great deal of memory; LM is therefore best suited to problems with few parameters.
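The role of the Jacobian and the damping term in LM can be sketched on a two-parameter curve fit. The model y = a·exp(b·x), the starting point and the simple multiply-or-divide-by-ten damping schedule are assumptions chosen for illustration; production implementations use more careful step control.

```python
import numpy as np

def levenberg_marquardt(residual_fn, jacobian_fn, p0, n_iter=50, damping=1e-2):
    """Damped least squares: solve (J^T J + damping * I) step = -J^T r."""
    p = np.asarray(p0, dtype=float)
    r = residual_fn(p)
    loss = float(r @ r)
    for _ in range(n_iter):
        J = jacobian_fn(p)
        A = J.T @ J + damping * np.eye(len(p))   # J^T J approximates the Hessian
        step = np.linalg.solve(A, -J.T @ r)      # J^T r is the gradient direction
        r_new = residual_fn(p + step)
        loss_new = float(r_new @ r_new)
        if loss_new < loss:
            # Accept the step and trust the Gauss-Newton model more.
            p, r, loss = p + step, r_new, loss_new
            damping /= 10.0
        else:
            # Reject the step and move towards gradient descent.
            damping *= 10.0
    return p

# Noise-free synthetic data from y = 2.0 * exp(-1.5 * x).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)

def residuals(p):
    return p[0] * np.exp(p[1] * x) - y

def jacobian(p):
    e = np.exp(p[1] * x)
    return np.column_stack([e, p[0] * x * e])

p_hat = levenberg_marquardt(residuals, jacobian, [1.0, 0.0])
```

The Jacobian has one row per data point and one column per parameter, so its size grows with both, which is the memory limitation described above.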

ANN application in odour measurement
The measurement of environmental odours entails the quantification of the odour concentration in terms of OU_E/m³, while some studies dealt with odour intensity prediction. Numerous studies constructed a data set using the ENose signals (input) associated with dynamic olfactometer results (target output) and established a pattern relationship that can be applied to prediction (Janes et al., 2005; Onkal-Engin et al., 2005). Janes et al. (2005) assessed the performance of ANN versus linear multiple regression (LMR) in modelling pork farm odour emission. A total of 131 samples were split into two parts, 105 for training and 26 for validation, with odour intensity as the target output. Two different ANN structures were constructed (2-16-1; 5-16-1) by using different numbers of input variables and compared with LMR. The ANNs achieved R² of 0.83 and 0.81 respectively, while LMR achieved an R² of 0.48. In the study of Micone and Guy (2007), a comparison between a multilayer neural network (MLP) (optimal structure: 16-19-14-1) and a radial basis function neural network (RBFNN) (optimal structure: 16-19-45-1) was reported. A total of 155 data sets were used (75% for training; 25% for validation) with target odour concentrations ranging between 1 and 200 OU_E/m³. The MLP was more accurate than the RBFNN, with a lower learning error (1.26 × 10⁻⁴), generalization error (2.30 × 10⁻⁴) and prediction error (4.10 × 10⁻⁴). The two papers (Janes et al., 2005; Micone and Guy, 2007) presented different pathways to finding the optimum ANN by evaluating different numbers of neurons in the input and hidden layers and different training algorithms for the two structures (i.e., MLP and RBFNN). This demonstrates that several approaches can be used, as long as the results agree well with the reference methods.
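To give a sense of scale for the topologies quoted above, the number of trainable parameters implied by a topology string can be counted directly. The sketch assumes fully connected layers with one bias per non-input neuron, which the cited papers may or may not have used; the function name is hypothetical.

```python
def count_parameters(topology):
    """Weights plus biases of a fully connected feedforward network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(topology[:-1], topology[1:])
    )

n_small = count_parameters([2, 16, 1])   # 2*16 + 16 + 16*1 + 1 = 65
n_large = count_parameters([5, 16, 1])   # 5*16 + 16 + 16*1 + 1 = 113
train_fraction = 105 / 131               # share of the 131 samples used for training
```

Under this assumption, the 5-16-1 network would have more parameters (113) than training samples (105), which illustrates why the choice of input variables and hidden neurons matters for small odour data sets.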
Onkal-Engin et al. (2005) correlated sewage odour emission with BOD using a feedforward neural network (FFNN) fed with the 12 electrical profiles obtained from an electronic nose. A total of 99 data points were used (66 for training; 33 for testing). Training and testing gave correlations of 0.98 and 0.91 respectively, and RMSEs of 0.04 and 0.07 respectively. Table 4 summarizes the principal contents and results of these works.
The application of ANN to odour measurement provides strong correlations between input and output variables (R² ≥ 0.81). Statistical methods (e.g., LMR) struggle to overcome background noise in the data set, whereas ANN was able to address that concern. ANN also demonstrated flexibility in working with whatever data are most readily available (i.e., the BOD relationship) (Onkal-Engin et al., 2005). The analysis affirms the potential of ANN for more accurate odour measurement.
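The performance figures quoted in this section (RMSE, R² and correlation) can be computed as in the sketch below. The four target and predicted values are made-up numbers used only to show the arithmetic, not data from any of the cited studies.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical olfactometer targets
y_pred = np.array([1.1, 1.9, 3.2, 3.8])   # hypothetical network predictions

error = rmse(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
r = pearson_r(y_true, y_pred)
```

Note that a correlation coefficient and R² are different quantities, so values reported as "correlation" in one study and as R² in another are not directly comparable.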

ANN application in odour characterization
In odour characterization, numerous studies carried out ANN modelling by establishing a relationship between a set of input data and multiple outputs. Typically, the gaseous compounds responsible for the odour emission are first identified and a network with categorical outputs is then developed (Heredia et al., 2016; Rivai and Talakua, 2015), for example by grouping the electrical profiles from the ENose that represent an element (Persaud and Dodd, 1982; Capelli et al., 2007; Viccione et al., 2012; Omatu and Yano, 2015). Table 5 summarizes their contents and results. Rivai and Talakua (2015) developed an ANN model (topology: 3-20-3) to identify low-concentration vapors with the support of a pre-concentrator. The aim of the experiment was to classify the sensor readings into ethanol (C₂H₅OH), benzene (C₆H₆) and acetone (C₃H₆O) vapor groups. A total of 9 data sets were used during training. The use of a pre-concentrator increased the sensitivity of the ENose sensors, allowing gases to be detected at 0.01 ppm, and contributed to efficient data gathering. Heredia et al. (2016) successfully categorized 10 different clusters of smell (i.e., Cluster 1: Decaying, Cluster 2: Chemical, Cluster 3: Citrus, Cluster 4: Fragrant, Cluster 5: Fruity, Cluster 6: Minty, Cluster 7: Popcorn, Cluster 8: Pungent, Cluster 9: Sweet and Cluster 10: Woody). The lowest and highest accuracies were 77.68% and 99.95% respectively. Gulbag and Temurtas (2005) applied feed-forward neural networks (FFNN) and an adaptive neuro-fuzzy inference system (ANFIS) to classify trichloromethane (CHCl₃) and acetone (C₃H₆O) in a gas mixture and evaluated the performance of different training algorithms (i.e., Gradient Descent, Resilient BP, Levenberg-Marquardt, Fletcher-Reeves conjugate gradient, and Broyden-Fletcher-Goldfarb-Shanno quasi-Newton). Levenberg-Marquardt provided the fastest convergence and the best results among the algorithms evaluated (MSE (training) = 1.00 × 10⁻⁶; MSE (test) = 4.33 × 10⁻⁵). The edge of the LM algorithm over back-propagation gradient descent is its efficiency in finding local minima and its suitability for training small and medium-sized problems. It is recognized for achieving higher performance, with better convergence and therefore faster training; however, it is challenging to choose good initial estimates for parameters such as the weights when using this algorithm.

Table 2 Strengths and weaknesses of the principal ANN structures (rows recovered from the page layout; the first row survives only as its references).
References of the first row: Janes et al., 2005; Dharwal and Kaur, 2016; Lopez et al., 2017.
Recurrent Neural Networks. Strength: good for modelling sequences of data (i.e., time series), where each sample is assumed to depend on previous data. Weakness: difficult to train due to the vanishing gradient problem. References: Lipton et al., 2015; Le et al., 2017.
Modular Neural Networks. Strength: good at eliminating local minima in large networks (e.g., MLP). Weaknesses: complex network structure; longer training. References: Tseng and Almogahed, 2009; Watanabe et al., 2017.
Radial Basis Function Neural Networks. Strength: work better with noisy data sets and function approximation problems. Weakness: slow training process (i.e., classification) due to the calculation of the Gaussian function for every neuron in the hidden layer. References: Cheng et al., 2012; Aftab et al., 2014; Dash et al., 2016.

Table 3 Principal learning algorithms (rows recovered from the page layout).
Gradient descent: also known as steepest descent; first-order training method. References: Burney et al., 2007; Men et al., 2007.
Newton's method: second-order method; uses the Hessian matrix. Reference: La Corte, 2014.
Quasi-Newton method: approximates the inverse of the Hessian matrix (the matrix of second partial derivatives of the loss function). Reference: Robitaille et al., 1996.
Conjugate gradient: intermediate between gradient descent and Newton's method. Reference: Burney et al., 2007.
Levenberg-Marquardt: also known as the damped least-squares method; works with the gradient vector and the Jacobian matrix; standard technique for non-linear least-squares problems. Reference: Lourakis, 2005.

Table 4 Summary of papers that report the application of ANN for odour measurement (only part of the table survives).
Reference cell of the preceding row: Micone and Guy, 2007.
Surviving row: Determine relationship between sewage odour and BOD; S1: 12-12-12-1; S2: 12-9-9-9-1; Gradient-descent; NM; Good pattern recognition for classification and online BOD detection (99% for classification, S1; 90% correlation for BOD, S2).
ANN successfully addressed the issue in defining a strong pattern that is critical when dealing with high numbers of output at the same time, to achieve high classification accuracy (classification rates; ≤80%). ANN exhibited a versatile property, whether the target output data is a real or a categorical value using the same data set.
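
The damped update behind the Levenberg-Marquardt algorithm discussed above can be sketched in a few lines. This is only a minimal pure-Python illustration, not the implementation used by Gulbag and Temurtas (2005): the exponential model, the noise-free data, the starting point and the damping schedule (`damping`, divided or multiplied by 10) are all assumptions chosen for the example.

```python
import math

def residuals(params, xs, ys):
    """Residuals of the assumed model y = a * exp(b * x)."""
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

def jacobian(params, xs):
    """Row i holds (d r_i / d a, d r_i / d b)."""
    a, b = params
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]

def sse(params, xs, ys):
    """Sum of squared errors, the loss being minimised."""
    return sum(r * r for r in residuals(params, xs, ys))

def lm_fit(xs, ys, params, damping=1e-2, iters=50):
    """Levenberg-Marquardt for two parameters (hypothetical helper)."""
    for _ in range(iters):
        r = residuals(params, xs, ys)
        J = jacobian(params, xs)
        # Normal-equation terms: J^T J (2x2) and J^T r (2-vector).
        h11 = sum(ja * ja for ja, _ in J)
        h12 = sum(ja * jb for ja, jb in J)
        h22 = sum(jb * jb for _, jb in J)
        g1 = sum(ja * ri for (ja, _), ri in zip(J, r))
        g2 = sum(jb * ri for (_, jb), ri in zip(J, r))
        # Solve (J^T J + damping * I) step = -J^T r by Cramer's rule.
        a11, a22 = h11 + damping, h22 + damping
        det = a11 * a22 - h12 * h12
        step = ((-g1 * a22 + g2 * h12) / det,
                (-g2 * a11 + g1 * h12) / det)
        trial = (params[0] + step[0], params[1] + step[1])
        if sse(trial, xs, ys) < sse(params, xs, ys):
            # Step accepted: trust the Gauss-Newton direction more.
            params, damping = trial, damping / 10
        else:
            # Step rejected: fall back toward a short gradient-descent step.
            damping *= 10
    return params

# Noise-free data generated with assumed values a = 2.0 and b = 0.5.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_fit, b_fit = lm_fit(xs, ys, params=(1.0, 0.0))
```

With a large damping value the step shrinks toward a scaled gradient-descent step, and with a small one it approaches a Gauss-Newton step. The sketch also shows why the initial estimate matters: a poor starting point leads to several rejected steps before the damping becomes large enough for progress.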

ANN application in odour control and treatment
Papers addressing odour emissions during control and treatment remain scarce. This might be due to the limitations of dynamic olfactometry and other recognized methods for on-site deployment. This review therefore draws on reports that applied ANN to model the emissions of the gases identified as responsible for odour from air pollution control facilities. The principal contents and results of these works are summarized in Table 6. Yang et al. (2016) integrated an RBFNN into a PID control algorithm to control the system, obtaining a fast response, a smaller overshoot and a highly robust controller. Aghav et al. (2011) and Lopez et al. (2017) designed feed-forward neural networks to model the removal efficiencies (%) of certain gaseous compounds in air pollution control processes. Aghav et al. (2011) dealt with the removal of phenol (C6H5OH) and resorcinol (C6H6O2) using carbonaceous adsorbent materials such as activated carbon, charcoal and rice husk in a bi-solute environment. Again, the LM algorithm was more successful than the steepest descent algorithm in terms of convergence rate (optimal structure: 5-8-2). The coefficient of determination (R²), which can be used to judge the performance of the model, was 0.95 and above.
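
To make the topology notation concrete, a structure written as "5-8-2" denotes five inputs, eight hidden neurons and two outputs. The sketch below counts the trainable parameters of such a network and runs one forward pass. It is only an illustration: the log-sigmoid activation on every layer, the random initial weights and the helper names (`init_network`, `count_parameters`, `forward`) are assumptions, not details reported by Aghav et al. (2011).

```python
import math
import random

def logsig(z):
    """Log-sigmoid activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_network(layer_sizes, seed=0):
    """Create (weights, biases) per layer with small random weights (assumed)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def forward(layers, inputs):
    """Propagate one input vector through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            logsig(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

topology = [5, 8, 2]
network = init_network(topology)
n_params = count_parameters(topology)  # (5*8 + 8) + (8*2 + 2)
outputs = forward(network, [0.1, 0.4, 0.3, 0.9, 0.5])  # hypothetical scaled inputs
```

Because the log-sigmoid output lies between 0 and 1, targets such as removal efficiencies in percent would need to be scaled to that range before training, which is a further assumption of this sketch.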
Meanwhile, Lopez et al. (2017) modelled the removal of methanol (CH3OH), hydrogen sulphide (H2S) and alpha-pinene (C10H16) in a two-stage waste gas treatment system (i.e., a bio-trickling filter (BTF) followed by a bio-filter (BF)). The structures of the two ANN models were "4-4-2" (BTF) and "3-3-1" (BF), and the R² values achieved ranged between 0.8955 and 0.9725. A successful ANN was also presented by Rene et al. (2011). The removal efficiency (%) of gas-phase styrene (C8H8) was the target output, predicted from different input parameters collected in a bio-filter, a continuous stirred tank bioreactor and a monolith bioreactor using the fungus Sporothrix variecibatus in biological waste gas treatment. Unlike Aghav et al. (2011) and Lopez et al. (2017), Rene et al. (2011) incorporated a sensitivity analysis (i.e., absolute average sensitivity (AAS)), which ascertained the significance of the input parameters and assessed their effects on the bioreactors.
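
The idea behind a sensitivity analysis of this kind can be sketched as follows. The exact AAS formula used by Rene et al. (2011) is not given in this section, so the sketch assumes one common formulation: the absolute partial derivative of the model output with respect to each input, averaged over all samples. The derivative is estimated by central finite differences, and `toy_removal_model` is a hypothetical stand-in for a trained ANN, not a model from the cited work.

```python
def absolute_average_sensitivity(model, samples, eps=1e-4):
    """Mean absolute d(output)/d(input_i) over the samples (assumed definition)."""
    n_inputs = len(samples[0])
    totals = [0.0] * n_inputs
    for x in samples:
        for i in range(n_inputs):
            hi, lo = list(x), list(x)
            hi[i] += eps
            lo[i] -= eps
            # Central finite-difference estimate of the partial derivative.
            totals[i] += abs((model(hi) - model(lo)) / (2 * eps))
    return [t / len(samples) for t in totals]

def toy_removal_model(x):
    """Hypothetical removal efficiency (%) from two made-up inputs."""
    inlet_conc, flow_rate = x
    return 90.0 - 20.0 * inlet_conc - 5.0 * flow_rate ** 2

# Hypothetical operating points: (inlet concentration, gas flow rate).
samples = [(0.2, 0.5), (0.5, 1.0), (0.8, 1.5)]
aas = absolute_average_sensitivity(toy_removal_model, samples)
```

A larger value for an input indicates that, on average over the sampled operating points, the predicted output reacts more strongly to changes in that input.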
The dynamic behaviour of the gas phase makes ANN a convenient method because it can cope with background noise in the data set and still establish a strong correlation. The results of these studies show the possibility of using artificial neural networks in the context of decision-making during the operational period of the bioreactors. ANN also does not impose limitations on the input variables, which is very useful in modelling the transient-state performance of bioprocesses in the treatment of gas-phase pollutants.

Table 6 Summary of papers that report the application of ANN in odour control and treatment.

ANN application in continuous odour monitoring
Continuous odour monitoring is mandatory to verify that odour management programs are working efficiently and that all requirements of air quality regulations are being met. Odour measuring instruments such as the electronic nose are also applied to monitor odour emissions (Bockreis and Jager, 1999; Sohn et al., 2003). Table 7 summarizes the principal contents and results of the papers that report the application of ANN for odour monitoring. Sohn et al. (2003) used data from an eNose (Aromascan) and dynamic olfactometry to train an artificial neural network to relate odour emission rates, pond loading rates and hydraulic retention time in a piggery effluent pond. The model had a good regression coefficient (R) of 0.98 during training but a lower coefficient of determination (R²) during actual evaluation (0.59). The results indicate that the training has to be improved for the model to generalize well, either by enlarging the data set or by exploring other ANN structures. The study also applied the "early stopping" method to reduce the training time under the Levenberg-Marquardt (LM) algorithm. "Early stopping" might work in some situations, but other methods are available to avoid over-fitting, such as exploring other algorithms, because LM is limited to small and medium-sized problems (Lourakis, 2005). In contrast to Sohn et al. (2003), Bockreis and Jager (1999) evaluated different types of training algorithms. The paper did not present the topology of the ANN model but emphasized the impact of the algorithms on recognizing data during training. Training was performed with 27 samples at odour concentrations ranging between 40 and 100,000 OU_E/m³.
By evaluating different algorithms, Bockreis and Jager (1999) were able to show that ANN learning can be improved; however, this does not guarantee that the ANN generalizes well unless it is tested on a new set of observations that were excluded from the training set.
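
The two safeguards discussed above, holding out unseen observations and stopping training early, can be sketched as follows. This is a generic illustration rather than the procedure of either cited study: the 20% hold-out fraction, the `patience` value, the helper names and the simulated validation losses are all assumptions.

```python
import random

def hold_out_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and reserve a fraction never used for training."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

def train_with_early_stopping(step, validation_loss, max_epochs=1000, patience=5):
    """Run `step` once per epoch; stop when validation loss stops improving."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        step()
        loss = validation_loss()
        if loss < best_loss:
            # A real implementation would also save the weights here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss

# 27 samples, as in the data set size quoted above; the split is an assumption.
train_idx, test_idx = hold_out_split(27)

# Simulated validation losses that fall and then rise as over-fitting sets in.
simulated = iter([1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.9, 1.2])
best_epoch, best_loss = train_with_early_stopping(
    step=lambda: None,                      # placeholder for one training epoch
    validation_loss=lambda: next(simulated),
    patience=3,
)
```

With only 27 samples, a single hold-out set is small, so repeated splits or cross-validation may give a steadier estimate of generalization; that choice is outside what the cited studies report.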
ANN is a useful technique to predict the concentration of gaseous compounds (i.e., NH3 and H2S) (Strik et al., 2005; Iliyas et al., 2013), odour intensity (Janes et al., 2005) and odour concentration (Bockreis and Jager, 1999; Micone and Guy, 2007) (R² ≥ 0.90). It can provide operators with accurate information about emerging gases even at trace levels and help them maintain the operational conditions of the process, comply with air quality standards and, eventually, control and minimize these pollutants. ANN is a suitable computational model for a real-time odour monitoring instrument because it can model both linear and non-linear relationships; in practice, most systems behave non-linearly, which makes them complex.
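
The goodness-of-fit figures quoted throughout this review can be computed as in the sketch below, which uses the usual definitions of the coefficient of determination (R²) and the root mean square error (RMSE). The observed and predicted values are made-up numbers for illustration, not data from any of the cited studies.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical observed and predicted values (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
error = rmse(observed, predicted)
```

Both metrics are most informative when computed on observations that were excluded from training, since values obtained on the training set alone can overstate how well a model generalizes.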

Conclusions
The issues concerning unwanted odours emitted into the ambient air must be addressed immediately because of their negative impacts, and they have long been a matter of public concern. At present, the human olfactory system is still considered the most accurate technique for detecting odours, which directs the technical and scientific community to explore methods patterned on its mechanism. The field of artificial intelligence (AI) employs artificial neural networks (ANN) as the mathematical or computational tools to achieve this goal.
Different studies in the literature have applied ANN in environmental odour management; however, some reports still show shortcomings in the selection of the most suitable ANN model and can be further improved. The present review addressed this shortage of information by offering insights into the selection of the most suitable configuration and into the benefits and consequences of the neural network design. The management of environmental odour has been distinguished into four aspects: measurement, characterization, control and treatment, and continuous monitoring. Across these aspects, ANN provided more robust results than traditional statistical methods in terms of stronger correlation (higher R²), lower residuals (RMSE) and high classification rates. ANN was also flexible with respect to the data set, especially in establishing a well-defined pattern despite the presence of background noise. Meanwhile, the principal weakness of ANN is the difficulty of optimizing its parameters (i.e., topology, algorithm, the number of neurons in the hidden layer, etc.), which has to be done manually. Designing an ANN is time-consuming and requires patience in determining the best topology because there is always a chance of over-fitting. There is no general rule for constructing a specific ANN because the intelligence or knowledge lies in the information fed to it by the developer, who acts as the "coach" of the program. According to the researchers, the most applied configurations are the multilayer perceptron feed-forward neural network, the gradient descent back-propagation algorithm and the log-sigmoid activation function.

Table 7 Summary of papers that report the application of ANN for odour monitoring.
The use of ANN provides a new avenue in the field of environmental odour management by addressing the accuracy issues of mathematical computing techniques, owing to its strength in analysing linear, non-linear and complex systems, which are common in the real world. Its application to environmental odour management is destined for rapid and continuous growth over the coming years, helped by progress in technology and in the speed of data processing. The development of eNoses based on ANN will create new markets that are currently excluded by the limits of traditional technology based on statistical methods. Advanced generations of eNoses based on ANN will open up a whole new range of applications, not limited to the environmental field.