A SOFT COMPUTING METHOD FOR EFFICIENT MODELLING OF SMART CITIES NOISE POLLUTION

Noise pollution is one of the most relevant problems in urban areas. The main source of noise pollution is the number and type of motor vehicles, but other parameters depending on the street configuration yield a system that is hard to model exactly by classical mathematical methods. Smart cities are expected to control urban traffic dynamically, not only to reduce traffic jams but also to ensure a comfortable noise level for inhabitants. This article gives a design method for efficient genetic fuzzy modelling of traffic-generated smart city noise pollution, based on fuzzy logic, a multi-objective genetic algorithm, gradient descent optimisation and singular value decomposition in the MATLAB environment. Genetic algorithms, with the objectives of minimising the maximum absolute identification error and the root mean square identification error, reducing model complexity and maximising numerical robustness, are applied to the preliminary identification of the Zadeh-type fuzzy partition membership function parameters. A gradient descent method is then used to fine-tune these parameters, while the linear parameters of the fuzzy rule consequents are calculated by singular value decomposition to obtain the least-squares optimal fit of the model to the training data. The training data set is built from measured data, combined with carefully selected simulation data to ensure the completeness of the model and its numerical robustness. A detailed analysis of the method and the results of computer simulations of the identification process show the validity of the proposed method.


INTRODUCTION
Noise pollution is an uncomfortable sound level that degrades the quality of life; it is typical of industrial and roadside areas with heavy traffic. Traffic jams and heavy traffic are highly annoying and obstructive, not only because of lost time and increased engine exhaust air pollution, but also because of noise pollution. Intelligent, flexible traffic control is one of the popular subjects of smart city research, and solutions that take care of the quality of life of residents along roads do consider noise levels. There are many passive, construction-based solutions for reducing traffic noise, from building protective walls and changing the road surface to planting appropriate vegetation between the road and the houses [1]. The bottom line is that traffic noise primarily depends on two hard-to-change factors, the street width and the house heights, which define the geometry, i.e. the air volume, and on a third, obvious factor, the noise source level itself, which is produced by the passing vehicles. A smart city traffic control system that balances noise pollution therefore has to rely primarily on balancing the number of vehicles passing through each street of fixed configuration. The noise level depends not only on the number of vehicles, but also on their type. There are multiple levels of differentiation; the most basic one distinguishes cars, motorcycles and heavy vehicles [2].
This paper presents a design method for efficient genetic fuzzy modelling of traffic-generated urban noise pollution. Fuzzy systems are known to be trainable universal nonlinear function approximators, even for problems as complex as multi-rotor copter flight dynamics modelling [3]. By using Zadeh-type fuzzy partitions for the antecedent part, we can guarantee the continuity of the fuzzy system output over the complete input space [4]. Genetic algorithms are known to be powerful tools for global nonlinear search, and are thus suitable for efficient preliminary identification of fuzzy membership parameters [5] as well as for fuzzy structure optimisation [6]. We use an alternative multi-objective vector comparison in a dominance-based ranking scheme [7], since the two essential quality indicators of any function approximation, the maximum absolute error and the root mean square error, are independent properties that usually compete with model complexity, while all of them are to be minimised simultaneously. Once the neighbourhood of the global optimum of the membership parameters is found, we can rely on fast gradient descent methods to pinpoint the exact optimal membership parameter values that minimise the approximation error. Using an appropriate representation of the Takagi-Sugeno-Kang type fuzzy logic system, the calculation of the linear free parameters of the fuzzy rule consequents becomes a high-dimensional, but still merely linear, equation-solving problem, which we solve by a numerically robust, least-squares optimal singular value decomposition method [8].
By analysing the singular values of the fuzzy rule parameters, we can also evaluate the quality of the training data relative to the planned complexity of the antecedent membership function fuzzy clusters; moreover, we can use this information to select only the important samples from the training data, thus reducing its size while maintaining the numerical quality of the solution [9]. As the measured data available to us are far from sufficient for creating a universal noise level model [10], we include results of other models in our training set. This yields a well-defined, robust and complete fuzzy system with continuous output that is no worse than the currently known models, and that can be improved seamlessly by further training on any new real-life measured data or new model data.

NOISE POLLUTION MODELLING
Noise pollution sound pressure systems have a very complex structure and are therefore very hard to identify. Previous efforts [1,2] have been devoted to determining linear representations, i.e. linear combinations of the (logarithmic) values of the system parameters, in the following general form:

$$L_{eq} = a_0 + a_1 n_{eq} + a_2 X + a_3 H \qquad (1)$$

where $L_{eq}$ is the equivalent continuous sound level in decibels (dB), $a_i$ denote identification parameters, $n_{eq}$ is the number, or the logarithm of the number, of equivalent vehicles per hour, and X is a function of the observer's distance from the noise source or of W, which is either the width or the logarithm of the width of the road. Advanced models also account for H, the average height (or the logarithm of the height) of the buildings in the considered road section, which determines the sheer air volume in which the noise is distributed as well as the reflecting surface; some models further include damping and amplifying modifiers such as the road surface quality, the elevation of the road, the distance from junctions and vehicle velocity factors. The equivalent number of vehicles $n_{eq}$, the most important factor for the noise level, can in general be defined as

$$n_{eq} = n_c + c_1 n_{mc} + c_2 n_{hv} \qquad (2)$$

where $n_c$ is the number of passing cars, $n_{mc}$ of motorcycles and $n_{hv}$ of heavy vehicles, and $c_i$ are multiplication coefficients; usually $c_1$ is between 2 and 4 and $c_2$ is between 8 and 16. Such linear models are often used because they are easy to implement, although they allow only an approximate modelling of real urban traffic noise pollution. For more accurate results in modelling real nonlinear systems, nonlinear models are advised [11][12][13].
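As a small illustration of equations (1) and (2), the Python sketch below computes the equivalent vehicle number and evaluates the generic linear form on logarithmic inputs. The defaults c_1 = 3 and c_2 = 12 are hypothetical mid-range placeholders taken from the ranges quoted above, and the a_i tuple in the usage line is an arbitrary, uncalibrated example, not a coefficient set from [1,2].

```python
import math

def equivalent_vehicles(n_c, n_mc, n_hv, c1=3.0, c2=12.0):
    """Equation (2): n_eq = n_c + c1*n_mc + c2*n_hv.
    Defaults are placeholders (c1 in [2, 4], c2 in [8, 16])."""
    return n_c + c1 * n_mc + c2 * n_hv

def linear_noise_model(a, n_eq, x, h, log_inputs=True):
    """Generic linear form L_eq = a0 + a1*n_eq + a2*X + a3*H, optionally
    applied to log10 of the inputs. The a_i must come from a calibrated
    model; none are provided here."""
    if log_inputs:
        n_eq, x, h = math.log10(n_eq), math.log10(x), math.log10(h)
    a0, a1, a2, a3 = a
    return a0 + a1 * n_eq + a2 * x + a3 * h

n_eq = equivalent_vehicles(100, 5, 10)  # 100 + 3*5 + 12*10 = 235.0
# Hypothetical coefficients, for illustration only:
level = linear_noise_model((50.0, 10.0, -5.0, 2.0), 1000, 10, 100)  # 79.0
```

Whether the inputs enter linearly or logarithmically is model-specific, which is why it is exposed as a flag rather than fixed.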

NOISE MEASUREMENT
The measurements were performed on various streets of Subotica, a typical medium-sized town in northern Serbia. A suitable measure of noise pollution is the equivalent A-weighted, energy-averaged sound pressure level, approximated as

$$L_{Aeq,T} \approx 10\log_{10}\left(\frac{1}{N}\sum_{i=1}^{N} 10^{L_i/10}\right) \qquad (3)$$

where T is the observation time, N is the number of sound level samples accumulated during T, and $L_i$ is the A-weighted sound level in dB(A) as defined in the sound level meter standards IEC 60651, IEC 60804, IEC 61672 and ANSI S1.4. The numbers of cars, motorcycles and heavy vehicles per minute were monitored visually in person and counted manually.
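The energy averaging in equation (3) can be sketched in Python as follows. It assumes the N samples L_i are equally spaced over the observation time T, which is what the 1/N weighting implies.

```python
import math

def leq_from_samples(levels_db):
    """Energy-average A-weighted samples L_i (dB(A)) into L_Aeq, equation (3)."""
    if not levels_db:
        raise ValueError("need at least one sample")
    # Convert each level to relative energy, average, convert back to dB.
    mean_energy = sum(10 ** (l / 10) for l in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

print(round(leq_from_samples([70, 60]), 2))  # 67.4
```

The louder sample dominates: the energy average of 70 and 60 dB(A) is about 67.4 dB(A), not the arithmetic mean of 65.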

FUNCTION IDENTIFICATION BY ZADEH-TYPE FUZZY PARTITIONS
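For the noise pollution model identification we use fuzzy logic systems (FLSs) with Zadeh-formed membership functions (MFs); specifically, Takagi-Sugeno-Kang (TSK) type FLSs with n = 3 inputs and 1 output. These FLSs can be expressed as

$$y(\mathbf{x}) = \frac{\sum_{l=1}^{M} \mu_l(\mathbf{x})\, y_l(\mathbf{x})}{\sum_{l=1}^{M} \mu_l(\mathbf{x})} \qquad (4)$$

where M is the number of fuzzy rules, $\mathbf{x}$ is the vector of n input variables, $\mu_l(\mathbf{x})$ is a scalar function of the n input variables, and $y_l(\mathbf{x}) = c_{l,0} + c_{l,1}x_1 + \dots + c_{l,n}x_n$ is, for first-order TSK systems, a linear function of the inputs; thus each $y_l$ is defined by n + 1 parameters. The antecedent (premise) part of a fuzzy rule is defined by MFs as

$$\mu_l(\mathbf{x}) = \prod_{i=1}^{n} \mu_{F_{l(i)}}(x_i) \qquad (5)$$

where $\mu_{F_{l(i)}}(x_i)$ is the membership function of the i-th input variable in the l-th rule that defines the linguistic value $F_{l(i)}$. The human-readable, intuitive linguistic form of the l-th rule of the first-order TSK FLS described above is [14]:

IF $x_1$ is $F_{l(1)}$ AND ... AND $x_n$ is $F_{l(n)}$ THEN $y_l = c_{l,0} + c_{l,1}x_1 + \dots + c_{l,n}x_n$ (6)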
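A minimal Python sketch of equations (4) and (5) for a complete grid rule base is given below: every combination of one MF per input forms a rule, the antecedent uses the product, and the output is the weighted average of first-order consequents. The membership functions are passed in as plain callables; the two linear ramps in the usage line are hypothetical placeholders, not the Zadeh-type MFs of the paper.

```python
from itertools import product
from math import prod

def tsk_output(x, mfs, consequents):
    """First-order TSK output, equations (4)-(5).
    x: list of n input values.
    mfs: per-input lists of membership callables.
    consequents: dict mapping a rule's MF-index tuple to [c0, c1, ..., cn]."""
    num, den = 0.0, 0.0
    # Complete rule base: one rule per combination of MF indices.
    for idx in product(*(range(len(m)) for m in mfs)):
        mu = prod(mfs[i][k](x[i]) for i, k in enumerate(idx))  # eq. (5)
        c = consequents[idx]
        y_l = c[0] + sum(ci * xi for ci, xi in zip(c[1:], x))  # linear consequent
        num += mu * y_l
        den += mu
    return num / den  # den > 0 is guaranteed for fuzzy partitions

low, high = (lambda v: 1.0 - v), (lambda v: v)  # placeholder MFs on [0, 1]
y = tsk_output([0.25], [[low, high]], {(0,): [1.0, 0.0], (1,): [3.0, 0.0]})
print(y)  # 1.5
```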
We use fuzzy partitions defined by Zadeh-formed MFs. These MFs are non-linear, piecewise second-order polynomial functions, the so-called Z-, S- and Π-functions, named after their shape, defined by parameters $b_1 \le b_2 \le b_3 \le b_4$ respectively as:

$$\mu_Z(x; b_1, b_2) = \begin{cases} 1, & x \le b_1 \\ 1 - 2\left(\frac{x-b_1}{b_2-b_1}\right)^2, & b_1 < x \le \frac{b_1+b_2}{2} \\ 2\left(\frac{x-b_2}{b_2-b_1}\right)^2, & \frac{b_1+b_2}{2} < x < b_2 \\ 0, & x \ge b_2 \end{cases}$$

$$\mu_S(x; b_1, b_2) = 1 - \mu_Z(x; b_1, b_2)$$

$$\mu_\Pi(x; b_1, b_2, b_3, b_4) = \begin{cases} \mu_S(x; b_1, b_2), & x \le b_2 \\ 1, & b_2 < x < b_3 \\ \mu_Z(x; b_3, b_4), & x \ge b_3 \end{cases} \qquad (7)$$

When more than one value of x has a degree of membership equal to one, the interval where $\mu(x, \mathbf{b}) = 1$ (the interval $[b_2, b_3]$ for a Π-type MF) is called the plateau of the MF. When there are, for example, three naturally ordered linguistic values $l \in \{A, B, C\}$ (e.g. A = 'low', B = 'medium', C = 'large'), there are hard constraints on the $b_i$ parameters to preserve the natural linguistic ordering:

$$b_1^A = b_1^B \le b_2^A = b_2^B \le b_3^B = b_1^C \le b_4^B = b_2^C \qquad (8)$$

When a linguistic variable can be assigned K different linguistic values, each described by an MF $\mu_k(x, \mathbf{b})$ such that for every input x it holds that $\sum_{k=1}^{K} \mu_k(x, \mathbf{b}) = 1$, the MFs are said to form a fuzzy partition. By imposing restrictions (8) on all linguistic variables of the FLS, and assuming the rule base is complete (it covers the whole input domain), the TSK model structure (4) for a multi-input vector $\mathbf{x}$ simplifies to [15]:

$$y(\mathbf{x}) = W(\mathbf{x}, \mathbf{b}) \cdot \mathbf{c} \qquad (9)$$

where $\mathbf{c}$ is the compound vector of all the so-called linear $c_l$ parameters and $\mathbf{b}$ is the compound vector of the non-linear $b_l$ parameters (the parameters defining the linear and non-linear sub-functions) of the fuzzy system, and W is the appropriate compound coefficient matrix such that equation (9) holds. The resulting formula is simple, while still guaranteeing uniform input coverage and continuous system output; TSK FLSs of fuzzy partitions can even be made periodic [3,18].
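The piecewise-quadratic Z-, S- and Π-functions of equation (7) can be checked numerically with the pure-Python sketch below, which also verifies that a low/medium/large triple sharing breakpoints as in (8) sums to one everywhere, i.e. forms a fuzzy partition. The breakpoint values used in the check are arbitrary.

```python
def z_mf(x, b1, b2):
    """Zadeh Z-function: 1 up to b1, 0 from b2, quadratic spline in between."""
    if x <= b1:
        return 1.0
    if x >= b2:
        return 0.0
    if x <= (b1 + b2) / 2.0:
        return 1.0 - 2.0 * ((x - b1) / (b2 - b1)) ** 2
    return 2.0 * ((x - b2) / (b2 - b1)) ** 2

def s_mf(x, b1, b2):
    """Zadeh S-function: the complement of the Z-function."""
    return 1.0 - z_mf(x, b1, b2)

def pi_mf(x, b1, b2, b3, b4):
    """Zadeh Pi-function: rising S, plateau [b2, b3], falling Z."""
    if x <= b2:
        return s_mf(x, b1, b2)
    if x < b3:
        return 1.0
    return z_mf(x, b3, b4)

def three_term_partition(x, b1, b2, b3, b4):
    """'low', 'medium', 'large' MFs sharing breakpoints as in (8)."""
    return z_mf(x, b1, b2), pi_mf(x, b1, b2, b3, b4), s_mf(x, b3, b4)

# The memberships sum to 1 across the whole range [0, 8]:
print(max(abs(sum(three_term_partition(i / 10, 0, 2, 5, 8)) - 1) for i in range(81)))
```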

MULTI-OBJECTIVE GENETIC ALGORITHMS
A genetic algorithm (GA) is based on imitating natural biological processes, in particular Darwinian evolution. GAs are widely used as search and optimisation tools [16].
Real-life optimisation problems often have multiple objectives, and the comparison of two objective vectors is then not trivial. Multi-objective optimisation can be defined as the problem of finding a vector of decision variables that satisfies the constraints and optimises a vector function whose elements represent the objective functions. These functions form a mathematical description of performance criteria that are usually in conflict with each other. Hence, the term "optimise" means finding a solution that gives values of all the objective functions that are acceptable to the designer [17].
The quality-dominance vector comparison, introduced in [5] and elaborated in [18], extends the Pareto-dominance relation so that a domination decision can also be made for vectors that are not comparable by Pareto dominance, but for which a human heuristic would state a clear preference. Consider a dominance relation $<_q(r, s)$ (briefly $r <_q s$) between two vectors of n elements, $r = (r_i)$ and $s = (s_i)$, $i = 1..n$, $n \in \mathbb{N}^{+}$, where for each i-th element type a scalar strict partial order '<' (less than) and an equivalence relation '=' are well defined. A helper function $\#_{q<}(r, s)$ defines for vectors r and s two values $(g_r, l_r) = \#_{q<}(r, s)$, where $g_r, l_r \in \mathbb{N}_0$; $g_r$ is the cardinality of the set $G_{rs} = \{r_i \mid s_i < r_i\}$, $i = 1..n$, and $l_r$ is the cardinality of the set $L_{rs} = \{r_j \mid r_j < s_j\}$, $j = 1..n$. For a minimisation problem, vector r quality-dominates vector s, briefly $r <_q s$, if $g_r < l_r$, or, in case $g_r = l_r$, if the tie-breaking condition of [5], which compares the elements $r_i \in G_{rs}$ with the elements $r_j \in L_{rs}$, holds. In [5] a measurement value for $<_q(r, s)$ is also defined (equation (10)), including the case $g_r = l_r$, in terms of the elements $r_i \in G_{rs}$ and $r_j \in L_{rs}$.
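The counting helper and the primary quality-dominance rule can be sketched in Python as below, next to classical Pareto dominance for minimisation. The tie-breaking condition and the measurement (10) of [5] are not reproduced: for $g_r = l_r$ this sketch simply returns None.

```python
def pareto_dominates(r, s):
    """Minimisation: r is no worse in every element and better in at least one."""
    return (all(ri <= si for ri, si in zip(r, s))
            and any(ri < si for ri, si in zip(r, s)))

def q_counts(r, s):
    """Helper #q<(r, s) -> (g_r, l_r)."""
    g_r = sum(1 for ri, si in zip(r, s) if si < ri)  # |G_rs|: r is worse
    l_r = sum(1 for ri, si in zip(r, s) if ri < si)  # |L_rs|: r is better
    return g_r, l_r

def q_dominates_primary(r, s):
    """True if g_r < l_r, False if g_r > l_r; None for a tie
    (the tie-break of [5] is not implemented here)."""
    g_r, l_r = q_counts(r, s)
    if g_r == l_r:
        return None
    return g_r < l_r

r, s = (1.0, 1.0, 5.0), (2.0, 2.0, 1.0)
print(pareto_dominates(r, s), pareto_dominates(s, r), q_dominates_primary(r, s))
# False False True
```

The example shows the point of the extension: r and s are Pareto-incomparable, but r is better in two of three objectives, so the primary rule prefers it. Any Pareto-dominating vector also quality-dominates, since then $g_r = 0 < l_r$.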
We use the dominance-measurement-based ranking method of [18]. At generation t, the rank of the i-th individual $r_i^t$ of the GA population is based on the sum of the dominance comparison measurements $<_*(r_i^t, s_j^t)$ of the i-th individual against every other individual $s_j^t$ of generation t, where '*' can stand for any comparison method (the classical Pareto vector comparison, the new quality vector comparison, or the simple weighted-sum scalar comparison) [18].
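The summed-measurement ranking can be sketched in Python with a pluggable comparison, as below. As an assumption, a 0/1 Pareto-dominance indicator stands in for the measurement; the quality-dominance measurement (10) of [5] could be plugged in instead but is not reproduced here.

```python
def pareto_measure(r, s):
    """0/1 stand-in for a dominance measurement (assumption): 1 if r Pareto-dominates s."""
    no_worse = all(a <= b for a, b in zip(r, s))
    better = any(a < b for a, b in zip(r, s))
    return 1.0 if no_worse and better else 0.0

def dominance_sum_ranking(population, measure=pareto_measure):
    """Score each individual by the sum of its measurements against all others;
    return the scores and the indices ordered best first."""
    scores = [sum(measure(r, s) for j, s in enumerate(population) if j != i)
              for i, r in enumerate(population)]
    order = sorted(range(len(population)), key=lambda i: -scores[i])
    return scores, order

pop = [(1, 1), (2, 2), (3, 3), (1, 4)]  # objective vectors, minimised
print(dominance_sum_ranking(pop))  # ([3.0, 1.0, 0.0, 0.0], [0, 1, 2, 3])
```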
The key question for a GA is the coding of the possible solutions, which evolve through many generations. Schema theory suggests that small alphabets are good, because they maximise the number of schemata available for genetic processing, so binary coding is implemented [16]. To avoid Hamming cliffs, Gray coding is used. Low-probability (1%) mutation is an important part of every GA. The chromosomes are simply the concatenated bit strings of all the parameters, with a fixed position for every gene; high-probability (0.8) simple two-point crossover ensures low disruptiveness and a high rate of inheritance during the reproductive phase. Stochastic universal sampling, which has minimal spread and zero bias, is used for selection with a rather low (1.2) selection pressure. Combining the genetic operators in this manner achieves continuous exploration of the search space together with consistent convergence [3-5, 7-9, 18].
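Two of the operators named above, Gray coding and stochastic universal sampling (SUS), are sketched below in Python. As an assumption, the selection pressure of 1.2 is applied through Baker-style linear ranking, which is a common way to set it; the paper does not state its exact fitness assignment. `decode_gene` maps a Gray-coded bit string, e.g. 16 bits, to a real parameter range.

```python
import random

def gray_encode(n):
    return n ^ (n >> 1)

def gray_decode(g):
    n = 0
    while g:  # XOR of all right shifts inverts the Gray code
        n ^= g
        g >>= 1
    return n

def decode_gene(bits, lo, hi):
    """Map a Gray-coded bit string to a real value in [lo, hi]."""
    n = gray_decode(int(bits, 2))
    return lo + (hi - lo) * n / (2 ** len(bits) - 1)

def linear_ranking_fitness(n, sp=1.2):
    """Linear ranking: position 1 = worst, n = best; the values sum to n."""
    return [2 - sp + 2 * (sp - 1) * (pos - 1) / (n - 1) for pos in range(1, n + 1)]

def sus(fitness, n_select, rng=random):
    """Stochastic universal sampling: equally spaced pointers, one random offset."""
    step = sum(fitness) / n_select
    start = rng.random() * step  # in [0, step)
    chosen, i, cum = [], 0, fitness[0]
    for k in range(n_select):
        p = start + k * step
        while cum <= p and i < len(fitness) - 1:
            i += 1
            cum += fitness[i]
        chosen.append(i)
    return chosen

print(decode_gene(format(gray_encode(65535), "016b"), 0.0, 10.0))  # 10.0
```

Neighbouring integers differ in exactly one bit after Gray encoding, which is what removes the Hamming cliffs of plain binary coding.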

SINGULAR VALUE DECOMPOSITION AND GRADIENT DESCENT BASED FUZZY MODELLING
We use the minimal universal representation of Zadeh partitions, by which they can easily be optimised without any constraints; the method was introduced in [3,4] and elaborated in [18]. To represent a fuzzy partition of K membership functions, we take K rational, positive or zero parameters $p_k \in \mathbb{Q}_0^{+}$, $k = 1, \dots, K$, and form the $b_k$ nonlinear parameters of the Zadeh-type MFs forming the fuzzy partition as given by equation (11). When a $p_k = 0$, the fuzzy partition structure complexity is reduced by one MF, and thus the structure complexity of the complete fuzzy system, i.e. the number of rules, is significantly reduced. When only a single $p_k > 0$ remains, the fuzzy system rule degrades to a simple linear equation. Finally, when all K parameters are $p_k = 0$, the fuzzy partition and the whole fuzzy consequent degrade to a constant, and the fuzzy system becomes independent of the corresponding input variable. These $p_k$ parameters are suitable for efficient stochastic global search, such as by GAs, and can be freely fine-tuned with any gradient-descent-based method [3-5, 8, 18]; the Jacobian of the FLS with respect to the $p_k$ can also be calculated for advanced nonlinear least-squares data fitting methods.
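Equation (11) itself is not reproduced here. To illustrate the general idea that unconstrained non-negative parameters can always yield correctly ordered breakpoints, the Python sketch below uses a hypothetical stand-in mapping (normalised cumulative sums over an assumed input range [x_min, x_max]). It is not the authors' formula; it only shows why no ordering constraint is needed and how a zero parameter makes neighbouring breakpoints coincide.

```python
def breakpoints_from_p(p, x_min, x_max):
    """HYPOTHETICAL stand-in for equation (11): map non-negative parameters
    p_k to ordered breakpoints in [x_min, x_max] via normalised cumulative sums."""
    if any(pk < 0 for pk in p):
        raise ValueError("p_k must be non-negative")
    total = sum(p)
    if total == 0:
        return None  # all p_k = 0: the input no longer affects the output
    b, acc = [x_min], 0.0
    for pk in p:
        acc += pk
        b.append(x_min + (x_max - x_min) * acc / total)  # monotone by construction
    return b

print(breakpoints_from_p([1, 0, 1], 0.0, 10.0))  # [0.0, 5.0, 5.0, 10.0]
```

Because any non-negative vector maps to a valid, ordered breakpoint set, a GA or a gradient step can move the parameters freely, clipping only at zero.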
The proposed fuzzy-partition representation has been used in a multi-objective genetic algorithm [5,18]. One chromosome consists of the number of MFs for every input and the corresponding parameters, as in equation (11), for every possible MF partition of each input. The population size is chosen to be 50 times the number of parameters. For parameter coding we use 16-bit binary Gray-coded chromosomes. The objective functions used are: maxE, the maximum absolute identification error; MSE, the mean squared identification error; RuleN, the number of fuzzy rules used for the identification divided by the maximum possible number of rules for the selected design; and RankW, calculated from the matrix rank of $W(\mathbf{x}, \mathbf{b})$ of equation (9), normalised to the theoretically maximal possible rank of the same FLS structure.
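The four objectives can be computed as in the Python sketch below (NumPy assumed), using the normalisers maxRuleN and maxRankW from step 8 of the evaluation procedure. RankW is returned as a ratio where higher means more robust; a minimising GA could, as an assumption, use 1 - RankW instead.

```python
import numpy as np
from math import prod

def rule_and_rank_limits(K):
    """maxRuleN = prod(K_i) and maxRankW = (n + 1) * maxRuleN for n = len(K)."""
    max_rules = prod(K)
    return max_rules, (len(K) + 1) * max_rules

def identification_objectives(y_true, y_pred, W, n_rules, K):
    """Return (maxE, MSE, RuleN, RankW) for one candidate fuzzy system."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    max_rules, max_rank = rule_and_rank_limits(K)
    max_e = float(np.max(np.abs(err)))                  # maxE
    mse = float(np.mean(err ** 2))                      # MSE
    rule_n = n_rules / max_rules                        # RuleN
    rank_w = int(np.linalg.matrix_rank(W)) / max_rank   # RankW
    return max_e, mse, rule_n, rank_w

print(rule_and_rank_limits([3, 3, 3]))  # (27, 108)
```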
Based on the transformation of the TSK FLS equation (9) to the form $y(\mathbf{x}) = W(\mathbf{x}, \mathbf{b}) \cdot \mathbf{c}$, we split the identification into two problems. The first is the nonlinear problem of finding the optimal $W(\mathbf{x}, \mathbf{b}(\mathbf{p}))$ values, which are a function of the system input vector $\mathbf{x}$ and of the nonlinear MF parameters $\mathbf{b}$ of equation (7). By representing $\mathbf{b}$ as a function of $\mathbf{p}$ via equation (11), any numerical nonlinear unconstrained optimisation method can be applied to the $p_k$ parameters. The second part, once $W(\mathbf{x}, \mathbf{b})$ is fixed to constant values (by selected optimal, or candidate optimal, $\mathbf{b}$ parameters), is the 'simple' solution of a linear equation system. For a given training data set of {input, output} sample vectors $\{\mathbf{x}_t, \mathbf{y}_t\}$, it is best solved in a least-squares optimal manner by the numerically robust, general SVD method as

$$\mathbf{c} = V S^{-1} U^{T} \mathbf{y}_t \qquad (12)$$

for the (thin) SVD $W(\mathbf{x}_t, \mathbf{b}) = U S V^{T}$.
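The least-squares solve of equation (12) can be sketched with NumPy as below. As a safeguard not stated in the text, singular values below a relative threshold are treated as zero (a truncated pseudo-inverse), so the same code stays stable if W is nearly rank deficient. The random matrix is only a stand-in for an antecedent matrix.

```python
import numpy as np

def svd_least_squares(W, y, rtol=1e-12):
    """LS-optimal c for W c ~ y: c = V S^-1 U^T y from the thin SVD W = U S V^T.
    Singular values below rtol * s_max are zeroed (pseudo-inverse)."""
    U, s, Vt = np.linalg.svd(np.asarray(W, dtype=float), full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > rtol * s[0]  # s is sorted in descending order
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv * (U.T @ np.asarray(y, dtype=float)))

rng = np.random.default_rng(0)
W = rng.normal(size=(50, 6))   # stand-in antecedent matrix
c_true = np.arange(6.0)
c = svd_least_squares(W, W @ c_true)
print(np.allclose(c, c_true))  # True
```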
In the applied GA, chromosomes are evaluated through the following eight steps:
1) For each of the n inputs, $K_i$, the number of MFs of the i-th input, is decoded from the chromosome.
2) For each $K_i$, the corresponding $p_k$ parameters are decoded from the chromosome.
3) All required parameters $\mathbf{b}$ of the Zadeh-formed MFs of equation (7) that form fuzzy partitions are calculated as proposed in (11).
4) Using the training inputs $\mathbf{x}_t$, all possible training antecedents $\mu_l(\mathbf{x}_t)$ are evaluated as in equation (5) to obtain $W(\mathbf{x}_t, \mathbf{b})$ of equation (9).
5) The corresponding fuzzy rule consequent parameters are calculated for all training inputs by the SVD-based LS method as $\mathbf{c} = V S^{-1} U^{T} \mathbf{y}_t$, using the SVD $U S V^{T} = W(\mathbf{x}_t, \mathbf{b})$ of the compound antecedent matrix, where $\mathbf{y}_t$ is the measured training noise level.
6) The $p_k$ parameters of step 2 are further optimised by a gradient-descent-based method for no more than $\sum_{i=1}^{n} K_i$ steps, where n is the dimension of the input space and $K_i$ is the number of MFs used for the i-th input. For every gradient descent iteration, steps 4 and 5 are repeated.
7) The resulting fuzzy system $y(\mathbf{x})$ is evaluated by equation (9) on the test data set, which is disjoint from the training data set (it is separated randomly from the available samples before the identification process starts) and whose cardinality is 10% of that of the training data set.
8) For the test data set, the maxE, MSE, RuleN/maxRuleN and RankW/maxRankW measures of the identification are calculated, where $\mathrm{maxRuleN} = \prod_{i=1}^{n} K_i$ and $\mathrm{maxRankW} = (n+1) \cdot \mathrm{maxRuleN}$.
To increase the efficiency of the GA, after each evaluation the chromosome defining the fuzzy system, used in steps 1 and 2, is updated with the optimised MF parameter values resulting from step 6.
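Steps 4 to 6 of the evaluation can be illustrated on a deliberately tiny case in the Python sketch below (NumPy assumed): one input, a two-MF {Z, S} partition, first-order consequents and a finite-difference gradient descent on the two breakpoints. These simplifications are assumptions for illustration. The paper works with three inputs, optimises the $p_k$ of equation (11) rather than the breakpoints directly, and does not state which gradient method it uses. As in step 6, the number of descent steps is capped at $\sum K_i$, which is 2 here.

```python
import numpy as np

def z_mf(x, b1, b2):
    """Vectorised Zadeh Z-function (assumes b1 < b2)."""
    x = np.asarray(x, dtype=float)
    t = (x - b1) / (b2 - b1)
    y = np.where(x <= (b1 + b2) / 2.0, 1.0 - 2.0 * t ** 2, 2.0 * (t - 1.0) ** 2)
    return np.where(x <= b1, 1.0, np.where(x >= b2, 0.0, y))

def antecedent_matrix(x, b1, b2):
    """Step 4: W(x, b); each rule contributes the columns [mu, mu * x]."""
    mz = z_mf(x, b1, b2)
    ms = 1.0 - mz  # S = 1 - Z, so the two MFs form a fuzzy partition
    return np.column_stack([mz, mz * x, ms, ms * x])

def evaluate(x, y, b):
    """Step 5 plus error measures: LS consequents via an SVD-based solver."""
    W = antecedent_matrix(x, *b)
    c, *_ = np.linalg.lstsq(W, y, rcond=None)  # LAPACK SVD-based LS
    err = y - W @ c
    return c, float(np.max(np.abs(err))), float(np.mean(err ** 2))

def refine(x, y, b, n_steps, lr=0.05, h=1e-4):
    """Step 6: finite-difference gradient descent on the breakpoints; a step is
    accepted only if it keeps b1 < b2 and lowers the MSE, else lr is halved."""
    b = np.asarray(b, dtype=float)
    best = evaluate(x, y, b)[2]
    for _ in range(n_steps):
        grad = np.zeros_like(b)
        for k in range(b.size):
            d = np.zeros_like(b)
            d[k] = h
            grad[k] = (evaluate(x, y, b + d)[2] - evaluate(x, y, b - d)[2]) / (2 * h)
        cand = b - lr * grad
        mse = evaluate(x, y, cand)[2] if cand[0] < cand[1] else np.inf
        if mse < best:
            b, best = cand, mse
        else:
            lr *= 0.5
    return b, best

x = np.linspace(0.0, 1.0, 101)
b_opt, mse_opt = refine(x, np.sin(3 * x), (0.3, 0.7), n_steps=2)  # sum K_i = 2
```

Because the two MFs sum to one, any linear target is reproduced exactly, so the descent only matters for nonlinear targets such as the sine used above.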

FUZZY SYSTEMS TRAINING DATA SET QUALITY ANALYSIS AND REDUCTION
The training data set, a prerequisite for system identification, is a set of measurements of the system responses (the outputs to be modelled, $f_{training}$) while the system is driven along a predefined trajectory of inputs, $x_{training}$. As this input training path must be sufficiently exciting for all system output characteristics to be observable, a good-quality identification naturally requires very large input-output training data sets. Evaluating models, iteratively optimising parameters along large training sets and especially performing the SVD of large matrices is extremely time-consuming. Thus, it is always a challenge in identification tasks to find a training data set that is sufficiently exciting but not oversized. When modelling complex systems, filtering out unnecessary samples while keeping all the data necessary for good-quality identification is not a trivial, straightforward process; for each identification problem, problem-specific approaches are used when deemed necessary.
The condition number of a matrix that defines a linear system of equations (in our case the compound antecedent data matrix $W = U S V^{T}$, defining equation (9)) is called the condition number of the equation. It is the ratio of the largest and the smallest singular value; in our case, for $S = [s_{ij}]$, it is $\kappa(W) = \max_i(s_{ii}) / \min_i(s_{ii})$. The higher the condition number, the more uncertain the solution, i.e. the more sensitive it is to small perturbations of the system parameters. The natural goal for a good-quality, robust linear system solution is therefore a condition number as small as possible.
A very well-conditioned linear system of equations has a condition number at least three orders of magnitude smaller than the reciprocal of the numerical precision of the calculations. With double-precision floating-point arithmetic we can rely on a numerical precision of about $10^{-16}$, which means that any condition number below $10^{10}$ already yields a relative precision of about six significant decimal digits, though a condition number below $10^{3}$ is of course preferred. Rank deficiency of W is out of the question when the goal is a complete fuzzy system with no undefined rules, i.e. no undefined outputs for any input region.
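The condition number check can be sketched with NumPy as below. The estimate of reliable significant digits, $-\log_{10}(\kappa \cdot \varepsilon)$, is the usual rule of thumb rather than a formula from the paper, and the rank-deficiency tolerance mirrors the default used by `numpy.linalg.matrix_rank`.

```python
import numpy as np

def conditioning_report(W):
    """Return (kappa, approx. reliable significant digits) for W c = y."""
    W = np.asarray(W, dtype=float)
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    eps = np.finfo(float).eps               # ~2.2e-16 for double precision
    if s[-1] <= s[0] * max(W.shape) * eps:
        return np.inf, 0.0                  # numerically rank deficient
    kappa = s[0] / s[-1]
    return kappa, max(0.0, -np.log10(kappa * eps))

print(conditioning_report(np.diag([1.0, 1e-3])))  # kappa ~1000, ~12.7 digits
```

For $\kappa = 10^{10}$ this rule gives about 5.7 digits with $\varepsilon \approx 2.2 \cdot 10^{-16}$, in line with the "about six digits" estimate above.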
For an identification-error least-squares (LS) optimal $\mathbf{c}$ vector, one can use the SVD of the FLS antecedent matrix, $W = U S V^{T}$, as in equation (12): $\mathbf{c} = V S^{-1} U^{T} \mathbf{f}_{training}$ [18].
Considering the proof of the universal fuzzy approximation theorem, in the general case we cannot 'generalise' our fuzzy model without bound: we cannot reduce the number of MFs, and with it the required rank of W, to an arbitrarily low level, since by doing so we lose the universal function approximation property of our FLS. On the other hand, if we choose a fuzzy structure that is too complex, then however large the measured training data set we capture, there is a point beyond which there are no further independent samples (the rows of W are not linearly independent). The training path is then called 'insufficiently exciting': the training data are not fit for defining a complete TSK FLS based on fuzzy partition antecedents, and the matrix W of equation (9) will be rank deficient, $\mathrm{rank}(W) < (n+1)\prod_{i=1}^{n} K_i$, where n is the dimension of the input space and $K_i$ is the number of MFs used for the i-th input. The consequence of such an identification attempt is popularly called 'overfitting': the model can precisely match only the training data, while it performs poorly on disjoint test data sets that did not take part in the identification process.
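The rank condition can be demonstrated with a one-input, two-MF {Z, S} partition in the NumPy sketch below, for which $(n+1)\prod K_i = 4$. When all training inputs lie below the first breakpoint (a hypothetical "insufficiently exciting" path), the S-rule never fires and its two consequent parameters are undetermined. The breakpoints 0.3 and 0.7 are arbitrary.

```python
import numpy as np

def antecedent_matrix(x, b1, b2):
    """W(x, b) for one input with a {Z, S} Zadeh partition, first-order rules."""
    x = np.asarray(x, dtype=float)
    t = (x - b1) / (b2 - b1)
    mz = np.where(x <= (b1 + b2) / 2, 1 - 2 * t ** 2, 2 * (t - 1) ** 2)
    mz = np.where(x <= b1, 1.0, np.where(x >= b2, 0.0, mz))
    ms = 1.0 - mz
    return np.column_stack([mz, mz * x, ms, ms * x])

n, K = 1, [2]
max_rank = (n + 1) * int(np.prod(K))  # (n + 1) * prod(K_i) = 4
narrow = antecedent_matrix(np.linspace(0.0, 0.2, 50), 0.3, 0.7)  # S-rule never fires
wide = antecedent_matrix(np.linspace(0.0, 1.0, 50), 0.3, 0.7)    # all rules excited
print(np.linalg.matrix_rank(narrow), np.linalg.matrix_rank(wide), max_rank)  # 2 4 4
```

Adding more samples from the same narrow region would not raise the rank, which is exactly the "no further independent samples" situation described above.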
The approach taken in this paper is to extend the real measured training data set of 120 samples (Figure 1) with a carefully reduced set of 800 simulated values based on equations (1) and (2) of [1,2]. The same approach can be taken to merge in any further measurement data or any new reliable model, and thus to re-train our model for higher precision or even to include new input regions.
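One way to reduce a training set while preserving the rank of W is a greedy, pivoted Gram-Schmidt selection of rows (equivalent to QR with column pivoting on $W^T$), sketched below with NumPy. This is a swapped-in, generic technique shown for illustration; it is not the SVD-based selection method of [9], and the stopping tolerance is an assumption.

```python
import numpy as np

def greedy_row_selection(W, m, tol=1e-10):
    """Select up to m rows (samples) of W, each time taking the row with the
    largest component orthogonal to the rows already selected."""
    R = np.array(W, dtype=float)  # residuals after projecting out selected rows
    selected = []
    for _ in range(min(m, R.shape[0])):
        norms = np.linalg.norm(R, axis=1)
        if selected:
            norms[selected] = -1.0  # never pick a row twice
        k = int(np.argmax(norms))
        if norms[k] <= tol:
            break  # remaining rows add no new direction
        q = R[k] / norms[k]
        selected.append(k)
        R = R - np.outer(R @ q, q)  # remove the new direction from all rows
    return selected

W = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
print(greedy_row_selection(W, 4))  # [3, 2]
```

Redundant samples (here rows 0 and 1, which only repeat the direction of row 3) are dropped, while the selected subset keeps the full rank of W.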

IDENTIFICATION RESULTS
The convergence of the GA is shown in Figure 2 through the evolution of the identification error objectives over 50 generations.
The extended training data set results in the system output presented in Figure 4, which also shows the overlapping output of a selected non-dominated solution and, separately, the corresponding training data identification error.
The expected system output, the identification output and the identification error for randomly selected test samples, which did not take part in the training process, are presented in Figure 5.

CONCLUSIONS
The presented identification method is capable of sufficiently precise modelling of real measured systems (whose data contain noise and measurement error) and of approximating simulated system models (without noise or measurement error) to very high precision. The definition of fuzzy partitions guarantees that the resulting FLS uniformly covers the complete defined input space; moreover, the FLS structure guarantees that its output is continuous over the complete input space. By extending the measured training data set with simulated values, we have ensured that our model output is nowhere significantly worse than a selected mathematical model, while it precisely approximates the measured data of the real system for all inputs within the range of the provided measured training data set. By extending the training data set and repeating the identification process, our model can easily be fine-tuned to match further real data.
This robust noise pollution model can be used for making traffic control decisions where the induced noise levels have to be limited. The controlled street configuration is defined by the street width and the average building height, the control input is the equivalent number of vehicles, and the model output is the resulting noise level.
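As an illustration of such a control decision, the Python sketch below finds, by bisection, the largest equivalent vehicle number that keeps the predicted level at or below a limit for a fixed street configuration. It assumes the model output increases monotonically with $n_{eq}$; `toy_model` and its coefficients are a hypothetical stand-in for the identified fuzzy model, and the 70 dB(A) limit is arbitrary.

```python
import math

def max_allowed_traffic(model, width, height, limit_db, lo=1.0, hi=1e5, iters=100):
    """Largest n_eq in [lo, hi] with model(n_eq, width, height) <= limit_db,
    assuming the model output is monotonically increasing in n_eq."""
    if model(hi, width, height) <= limit_db:
        return hi
    if model(lo, width, height) > limit_db:
        return None  # even the lowest traffic level exceeds the limit
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if model(mid, width, height) <= limit_db:
            lo = mid
        else:
            hi = mid
    return lo

def toy_model(n_eq, width, height):
    """HYPOTHETICAL stand-in for the identified model (height unused here)."""
    return 40.0 + 10.0 * math.log10(n_eq) - 10.0 * math.log10(width / 10.0)

print(round(max_allowed_traffic(toy_model, 10.0, 15.0, 70.0)))  # 1000
```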

Figure 1. Training data set, measured system inputs: a) average building height (m), b) average street width (m), c) number of equivalent vehicles per second.

Figure 2. GA convergence: a) evolution of the maximum absolute error, b) evolution of the mean squared error.

Figure 3. Zadeh-type fuzzy partitions for the premise part of the inputs [$n_{eq}$, W, H].

Figure 4. Training data set outputs a) desired and identified output (dBA), b) training error (dBA).