Improving the prediction of air pollution peak episodes generated by urban transport networks

This paper illustrates the early results of ongoing research developing novel methods to analyse and simulate the relationship between transport-related air pollutant concentrations and easily accessible explanatory variables. The final scope is to integrate the new models into traditional traffic management support systems for a sustainable mobility of road vehicles in urban areas. This first stage concerns the relationship between the hourly mean concentration of nitrogen dioxide (NO2) and explanatory factors reflecting the NO2 mean level one hour back, along with traffic and weather conditions. Particular attention is given to the prediction of pollution peaks, defined as exceedances of normative concentration limits. Two model frameworks are explored: the Artificial Neural Network approach and the ARIMAX model. Furthermore, the benefit of a synergic use of both models for air quality forecasting is investigated. The analysis of findings points out that the prediction of extreme concentrations is best performed by integrating the two models into an ensemble. The neural network is outperformed by the ARIMAX model in foreseeing peaks, but gives a more realistic representation of the concentration's dependency upon wind characteristics. So, the Neural Network can be exploited to highlight the involved functional forms and improve the ARIMAX model specification. In the end, the study shows that the ability to forecast exceedances of legal pollution limits can be enhanced by requiring traffic management actions when the predicted concentration exceeds a lower threshold than the normative one.

Air pollution in urban areas is mainly due to the intense use of motorized transport for travelling, in particular private cars and heavy goods vehicles. This is a priority issue for transportation planners and public authorities, given the harmful effects of pollution on human health and the environment (Migliore et al., 2012; Bergantino et al., 2013).
Numerous studies (Heinrich et al., 2005; Zhang et al., 2012a) argue that acute exposure to air pollutants may cause serious health concerns such as eye irritation, breathing difficulty and cardiovascular problems, while chronic exposure may lead to damage to the body's immune, neurological, reproductive and respiratory systems, cancer and even premature death. In November 2014, the British Committee on the Medical Effects of Air Pollutants reported that air pollution may be responsible for as many as 60,000 early deaths in Britain each year. This follows the study presented in 2010 by the World Health Organization (WHO) Regional Office for Europe and the Organization for Economic Co-operation and Development (OECD), covering the whole European region, including non-EU states such as Norway and Switzerland. For this macro-area, 600,000 premature deaths each year are estimated as a consequence of air pollution from small particles (produced by the exhausts of diesel vehicles) and nitrogen dioxide, NO2 (WHO, 2010). The environment is also affected, in terms of global climate change and adverse effects on plants and ecosystems (Seinfeld and Pandis, 2006; Zhang et al., 2012a).
Various national contexts throughout the world have issued guidelines and regulations to protect human health and the environment. The United States Environmental Protection Agency (EPA) has set national air quality standards for six pollutants: sulphur dioxide (SO2), NO2, carbon monoxide (CO), ozone (O3), lead (Pb) and particulate matter (Seinfeld and Pandis, 2006). In Europe, over the last decades, the European Union (EU) has adopted a range of environmental measures to improve the quality of life for the Community's citizens. The final step of this legislative process is Directive 2008/50/EC (EU, 2008), which has integrated an extensive body of laws establishing health-based concentration standards for a number of pollutants in outdoor ambient air. The European Commission has the task of ensuring that the environmental law is applied by the Member States through infringement procedures.
Long-term measures like mode switch policies in favour of mass transit and public regulation on road use are pretty effective in abating atmospheric pollution in cities (Allen et al., 2011), but pollution peaks and the consequent exceedance of regulative concentration thresholds are often caused by substantial fluctuations of mobility patterns and weather conditions around their expected behaviours. Hence, air quality protection needs to be fine-tuned through the introduction, in the local policy portfolio, of further tools and actions for predicting extreme pollution events and managing traffic in real time, in order to prevent the predicted concentration peaks.
Given the above, this research focuses on the investigation of traffic-related air pollution with the final aim of developing enhanced mathematical models to support short-term decisions for sustainable mobility of road vehicles in urban areas. In more detail, the study addresses the challenge of accurately forecasting the concentration of NO2, which is subject to an hourly concentration standard, to enable local authorities to mitigate or even prevent exceedances of concentration limits through real-time traffic management.
By achieving the identified objective, this research will be of strategic importance in many national contexts. The worldwide scale of atmospheric pollution problems has been acknowledged in the 2014 version of the WHO Ambient Air Pollution database, consisting mainly of urban air quality data (WHO, 2014). In the report, annual concentration means of PM10 and PM2.5, for about 1600 cities of 91 countries in the 2008-2013 period, have been calculated. As can be seen in Fig. 1, the world's annual mean levels of PM10 by region range from 26 to 208 µg/m3; in addition, the world's average is 71 µg/m3 against the recommended value of 20 µg/m3. Particular concern is associated with the East side of the planet, where countries like China, India, Nepal, Bangladesh, Mongolia and, in the Mediterranean Area, Egypt, Iran, Jordan, Afghanistan and Pakistan far exceed the world's yearly mean concentration of PM10. The 2014 Air Quality in Europe Report (European Environment Agency, 2014) states that, in EU cities, exposure to atmospheric pollution levels exceeding the WHO concentration limits (in general stricter than the EU standards) is significantly widespread for various chemical agents. In 2012, above-limit exposure to PM10 and PM2.5 respectively involved 64% and 92% of the total EU-28 urban population. Moreover, in the case of O3, in the same year, the exposure incidence rose to 98% of people living in towns. Despite the clear decreasing trend of the NO2 yearly mean concentration over recent years, in 2011, 21 European countries still recorded exceedances of the limit values at one or more monitoring stations. Specifically, in the United Kingdom, the NO2 levels have exceeded the WHO and EU target values persistently. This is confirmed by the fact that, in the early part of 2014, the European Commission launched legal proceedings against the UK for its failure to cut excessive levels of NO2 (EU Press Release Database, 2014). Lastly, while exposure of Europeans to CO concentrations above the EU and WHO thresholds is negligible, in the case of benzene (C6H6), around 10% of the EU-28 urban population is subject to pollution above the WHO levels and, in the case of SO2, the value is 37%.
This paper presents the early stage of ongoing research on air quality modelling, which refers to NO2, a toxic gas emitted by road vehicles, industry and households, which, even in the case of short-term exposures (from 30 min to 24 h), may irritate the eyes, nose, throat and lungs, while, in the long term, it may affect lung function permanently. Furthermore, it is the main precursor for ground-level ozone, which is very harmful to human health.
For NO2, the EU environmental legislation (EU, 2008) sets two types of standard: the hourly mean concentration cannot go beyond the level of 200 µg/m3 more than 18 times each calendar year, whilst the annual average of hourly concentrations is not allowed to exceed 40 µg/m3. Moreover, the 2008 Air Quality Directive also defines an 'alert' threshold value of 400 µg/m3. When this threshold is exceeded over three consecutive hours in areas of at least 100 km2 or an entire air quality management zone (whichever is the smaller), authorities have to implement short-term action plans.
In the research presented here, the NO2 hourly concentration has been modelled in terms of explanatory variables which relate to urban transport and influence emissions, along with weather conditions, which are responsible for the dispersion and transformation of pollutants.
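As a minimal sketch of how the two limit values above could be checked against a year of hourly means, the following may help. The function name and the sample values are illustrative assumptions; they are not part of the Directive or of the study data.

```python
from statistics import mean

HOURLY_LIMIT = 200.0       # µg/m3, hourly mean limit value (EU, 2008)
ALLOWED_EXCEEDANCES = 18   # permitted hourly exceedances per calendar year
ANNUAL_LIMIT = 40.0        # µg/m3, limit on the annual average of hourly means


def check_no2_standards(hourly_means):
    """Return (exceedance count, annual mean, hourly standard met, annual standard met)."""
    n_exceed = sum(1 for c in hourly_means if c > HOURLY_LIMIT)
    annual_mean = mean(hourly_means)
    return (
        n_exceed,
        annual_mean,
        n_exceed <= ALLOWED_EXCEEDANCES,
        annual_mean <= ANNUAL_LIMIT,
    )


# Made-up values for illustration only (a real check would use a full year of hours).
sample = [35.0, 210.0, 180.0, 25.0]
result = check_no2_standards(sample)
```

In practice the input would be the full series of hourly means for one calendar year at one monitoring station.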

Review of literature and research gaps
Few studies on real-time air quality forecasting near urban roads have appeared in the relevant scientific literature. Those that are particularly interesting for this work, since they investigate the relationship between nitrogen oxides' levels and meteorological as well as transport-related variables, include Kukkonen et al. (2003); Cai et al. (2009); Galatioto and Bell (2013); Nagendra and Khare (2006); Perez and Trier (2001); Viotti et al. (2002). The leitmotiv of these studies is the use of the Artificial Neural Network (ANN), from the domain of Artificial Intelligence, which was found to be the most effective tool to predict air quality in urban areas. In some cases, this methodology is compared with other approaches, but these are usually linear regression models or deterministic methods simulating the physical processes involved. Further statistical techniques have been adopted in relation to other pollutants. For example, Baur et al. (2004) compared the performance of quantile regression with multiple linear regression for predicting ozone concentrations. Kaushik and Melwani (2007) adopted the Seasonal Autoregressive Integrated Moving Average (ARIMA) model to forecast the daily levels of sulphur dioxide, nitrogen dioxide and suspended particles. Sayegh et al. (2014) evaluated different alternatives for the prediction of particulate matter concentration, namely multiple linear regression, quantile regression, generalised additive models and regression trees.
A promising area of research concerns the integration of the statistical approach into a fuzzy logic framework to handle uncertainty in environmental decision-making, as argued by Fisher (2006) for air quality forecasting.
For short-term predictions of pollution concentration, statistical methods are traditionally more suitable than deterministic (physically-based) models, which are computationally intensive. Nevertheless, one of the most important current limitations of the statistical methods, including the ANN, is that, even when their overall performance is very good, they cannot accurately estimate concentration peaks (Zhang et al., 2012a; Galatioto and Zito, 2009; Zito et al., 2008). A solution to this problem may derive from ensemble forecasting, a multi-model approach integrating the predictions of diverse models, which is usually employed to reduce estimation bias without increasing sample variance significantly (Zhang et al., 2012b). The use of this technique to improve the accuracy of forecasts of extreme pollution events has not been comprehensively investigated yet.
With respect to the research gaps motivating this research, it is worth stressing that few previous studies have compared and integrated the popular ANN with other complex statistical models, especially for the prediction of severe pollution episodes caused by urban transport. The work of Nunnari et al. (1998) tackled the problem of predicting concentrations of O3 and NOx in a zone with a high density of industrial plants, the Priolo-Melilli-Augusta petrochemical area in Sicily, showing that the ANN outperforms the ARIMA model overall. Furthermore, Prybutok et al. (2000) worked on forecasting the daily maximum O3 concentration in Houston and showed that an ANN was more accurate than the ARIMA and linear regression models at forecasting extreme concentration values, mainly because the data presented clear non-linear patterns. Díaz-Robles et al. (2008) compared and combined the ARIMA and ANN models to improve the forecast accuracy of PM10 concentration, with particular regard to alert episodes, for an area in Chile where residential wood burning is a major pollution source during cold winters. Sánchez et al. (2013) used a time series of SO2 concentrations, registered at a control station in the vicinity of a coal-fired power station in northern Spain, to develop and compare neural networks, ARIMA models and a hybrid method combining both, which proved the best at predicting concentration peaks.
As suggested by the review of literature, further empirical evidence is needed to quantify the benefit of the proposed multi-model (ensemble) approach for extreme concentration forecasting, especially if the main source of pollution is road traffic. Therefore, in an attempt to address the identified research gaps, this research explores the popular neural network and compares it with a sophisticated parametric method, namely the Auto-Regressive Integrated Moving Average with eXogenous inputs (ARIMAX) model (Hamilton, 1994). The comparison is based on a set of indicators for missed exceedances and false alarms, in addition to the traditional metrics measuring the difference and the covariance between observed and simulated concentrations.
The ensemble approach has shown promise in its successful application to air quality modelling of monitored data (Bell et al., 2015; Díaz-Robles et al., 2008; Sánchez et al., 2013; Zhang et al., 2012b) and in other fields such as astronomy, astrophysics, early diagnosis of certain diseases, etc. (see Re and Valentini, 2012 and VV. AA., 2008). Hence, the effectiveness of this approach for predicting exceedances of the NO2 hourly concentration threshold set by the EU is assessed.

Case study
This section provides details of the study area including geometric characteristics and the technologies used to collect air quality and traffic data. The descriptive statistics and time series analysis techniques employed are presented along with results.

Air quality monitoring site
The study area is Marylebone Road in the City of London (see Fig. 2). Marylebone Road has three lanes each way, with the nearside lanes in both directions reserved for buses and taxis. The traffic along the corridor is controlled by a demand-responsive signal control system, namely SCOOT, Split Cycle Offset Optimization Technique (Hunt et al., 1981). The cabin housing precision air quality monitors is located on the south side of the road, at a point where the road is characterised by a canyon height-to-width (H/W) ratio of 0.865 (see inset in Fig. 2). The rich dataset available over a ten-year period (1998-2007) contains traffic (flow and speed for each lane), meteorological conditions and air quality data at an hourly resolution.

Descriptive analysis of the dataset
The analysis seeks to establish the relationship between the hourly mean concentration of NO2 and explanatory factors related to traffic and meteorological conditions in a central corridor of London, Marylebone Road, throughout the year 2007. Table 1 shows the main statistics describing the set of data collected in 2007 within the study area. The average of all hourly NO2 concentrations was high (102.5 µg/m3); furthermore, the 200 µg/m3 EU threshold was exceeded 457 times during 2007, against the legal limit of 18 times in a year. Such conditions may arise for two main reasons: the high hourly traffic volumes, which during the period from 7:00 am to 9:00 pm are more than 4000 passenger car units (pcu) per hour (about 76% of the road capacity) for 72% of the time; and the quasi-canyon at the monitoring site, with an average building H/W ratio of around 0.9. This severe NO2 pollution problem requires long-term policy to realise sustainable transport as well as effective air quality prediction models to support real-time traffic management interventions in the short term.
Fig. 3 shows the relationships between the hourly concentration (µg/m3) of NO2 at Marylebone Road in 2007 and the explanatory variables taken into account: the hourly traffic volume (pcu/hour) and the hourly averages of wind speed (km/h), wind direction (degrees North) and temperature (degrees centigrade). In detail, it represents the relationship between the dependent variable and each independent attribute after controlling for the other explanatory factors. The various scatter graphs are based upon a preliminary regression of NO2 concentration on the above transport and meteorological variables and, more precisely, plot each regressor against the relative partial residual, which is obtained by excluding the effect of the regressor itself from the estimated general regression residual (Mallows, 1986). The traffic flow has a positive impact on concentration, while wind speed is inversely related to pollutant concentration; moreover, the relationship between pollution and wind direction is seen to be complex and needs further investigation. On the other hand, the temperature effects appear small and, in the case of the wind variables specifically, the influence on concentration appears to follow a non-linear behaviour.
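The partial residual construction behind such plots can be sketched as follows. This is a minimal illustration on synthetic data, assuming the usual component-plus-residual definition (regression residual plus the fitted contribution of the regressor of interest); the function name `partial_residuals` and the two stand-in predictors are hypothetical.

```python
import numpy as np


def partial_residuals(X, y):
    """Fit y on X by OLS (with intercept) and return an (n, k) array whose
    column j holds the partial residuals for regressor j: residual + b_j * x_j."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])        # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef                             # general regression residual
    return resid[:, None] + X * coef[1:]             # add back each regressor's own effect


# Synthetic stand-ins for two predictors (e.g. traffic volume and wind speed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

pr = partial_residuals(X, y)
# Slope of the partial residuals against their own regressor recovers the OLS coefficient.
slope0 = np.polyfit(X[:, 0], pr[:, 0], 1)[0]
```

Plotting each column of `pr` against the matching column of `X` gives one scatter graph per regressor; curvature in such a graph hints at a non-linear effect.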

Time series analysis
The time series of the considered variables have been analysed to check their stationarity, a statistical property of the series-generating stochastic process which requires its mean and autocovariance to be date-independent. It is a regularity condition which enables the development of a statistical model to explain a specific variable with its previous values and a set of exogenous predictors (Hamilton, 1994). Whether this property holds in our case study has been checked through graphical inspections and formal statistical tests.
As Fig. 4 displays, the time mean and variance of the variables of interest are quite stable through the year, except for the hourly averages of temperature and vehicle speed. Temperature, consistent with summer/winter cycles, shows an increasing trend in the first half of the year and a decreasing one in the second half. In the vehicle speed case, there is a turning point in the time series behaviour in the early part of the year. This is consistent with a change in the speed limit.
As such, the non-stationary input variables, speed and temperature, cannot be used to explain the NO2 hourly concentration, which is stationary.
The stationary nature of NO2 concentration, traffic volume, wind speed and direction, clearly illustrated in their respective time series plots, has been confirmed by the Augmented Dickey-Fuller (ADF) unit root test (Hamilton, 1994). On the one hand, under the most general null hypothesis, this test assumes that the model representing the true behaviour of a given time series $y_t$ can be formulated as a random walk with drift:

$$ y_t = \alpha + \rho\, y_{t-1} + \varepsilon_t, \qquad \rho = 1 \qquad (1) $$

where,
$y_t$: dependent variable (in our case, the hourly mean concentration of NO2) at time t, assumed to depend upon its first-order lagged value $y_{t-1}$ by a coefficient $\rho$ equal to one (unit root), which implies non-stationarity;
$\alpha$: drift term, which makes $y_t$ the sum of a linear time trend (linear function of time t) and a series of random impulses;
$\varepsilon_t$: independently and identically distributed error terms with zero mean (white noise).

On the other hand, the most general alternative hypothesis of the test is the stationary behaviour of the time series around a deterministic linear time trend:

$$ y_t = \alpha + \delta\, t + \rho\, y_{t-1} + \varepsilon_t, \qquad \rho < 1 \qquad (2) $$

where $(\delta \times t)$ is the trend, specifically a linear function of time t; $\rho$ is smaller than one, thus making the fluctuations of the series around its trend the manifestation of a first-order autoregressive model. Since model (2) might not fully capture the underlying serial correlation, given that the autocorrelation could be of second order or above, the ADF test fits a transformation of model (2) through the Ordinary Least Squares technique as follows:

$$ \Delta y_t = \alpha + \delta\, t + \beta\, y_{t-1} + \sum_{j=1}^{k} \gamma_j\, \Delta y_{t-j} + \varepsilon_t, \qquad \beta = \rho - 1 \qquad (3) $$

where,
$\Delta y_t = y_t - y_{t-1}$: first difference of variable $y_t$;
$\gamma_j \times \Delta y_{t-j} = \gamma_j \times (y_{t-j} - y_{t-j-1})$, $j = 1, 2, \ldots, k$: additional elements to capture the likely presence of further serial correlation in $y_t$.

Testing $\beta = 0$ amounts to testing $\rho = 1$ (Hamilton, 1994) or, equivalently, that $y_t$ follows a unit root process. If this null hypothesis cannot be rejected, the time series is not stationary and, hence, should be made stationary. This is achieved by subtracting its first-order lagged value from each element of the series.

Footnote: [...] or mini-bus; Articulated Lorry; Bus or Coach. The measured flows for each category in passenger car units were obtained by applying the following scaling factors (Lavecchia et al., 2007): 1 for motorcycles, cars and light vans; 1.5 for light vehicles including mini-buses (5 m < length < 7.5 m); 2 for rigid lorries, buses and heavy vans (7.5 m < length < 12.5 m) and 3 for articulated lorries (length > 12.5 m).
The test has been performed for the traffic volume and the hourly averages of NO2 concentration, wind speed and direction. Several values of lag k (k = 1 to 24) have been tested and in each case the null hypothesis ($\beta = 0$) has been rejected, thus pointing out stationarity of the above variables, as suggested by the preliminary graphical inspection.
Table 2 presents the test results for lag k = 24, which can take account of possible forms of autocorrelation up to a day before the hour of interest. Since the series under consideration do not show a trend, but have a nonzero mean, under the null hypothesis the value of the drift ($\alpha$) has been set at zero, while, under the alternative hypothesis, only the ($\delta \times t$) term of regression (3) has been dropped. Table 2 shows that, in each instance, the test statistic is far smaller than the critical values at the 1%, 5% and 10% significance levels, which makes the assumption of stationarity highly plausible.
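The test regression without the trend term can be sketched with ordinary least squares as below. This is an illustrative sketch, not the software used in the study: the function name `adf_statistic` and the synthetic series are assumptions. The resulting statistic has to be compared with Dickey-Fuller critical values (roughly -3.43, -2.86 and -2.57 at the 1%, 5% and 10% levels for the constant-only case in large samples), not with Student's t tables.

```python
import numpy as np


def adf_statistic(y, k):
    """t-statistic of beta in the OLS regression
    dy_t = alpha + beta * y_{t-1} + sum_{j=1..k} gamma_j * dy_{t-j} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # dy[i] = y[i+1] - y[i]
    target = dy[k:]                                   # dy_t for t = k+1, ..., n-1
    y_lag = y[k:-1]                                   # y_{t-1} aligned with target
    lagged = [dy[k - j:len(dy) - j] for j in range(1, k + 1)]   # dy_{t-j}
    A = np.column_stack([np.ones(len(target)), y_lag] + lagged)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    sigma2 = resid @ resid / (len(target) - A.shape[1])
    cov = sigma2 * np.linalg.inv(A.T @ A)             # OLS covariance of the coefficients
    return coef[1] / np.sqrt(cov[1, 1])


# Synthetic series: a stationary AR(1) process and a random walk (unit root).
rng = np.random.default_rng(0)
shocks = rng.normal(size=2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]
random_walk = np.cumsum(shocks)

stat_ar = adf_statistic(ar1, k=24)
stat_rw = adf_statistic(random_walk, k=24)
```

A strongly negative statistic, as expected for the stationary series, leads to rejection of the unit root hypothesis; for the random walk the statistic should sit much closer to zero.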
In order to specify a model for the time series of NO2 hourly concentration, it is important to study its autocorrelation. This has been achieved through the visual analysis of the total and partial autocorrelation functions. In Fig. 5, the behaviour of total autocorrelation reveals a cyclical or seasonal effect: in fact, as the lag extends, the positive value of autocorrelation declines, but with a reversal occurring whenever the time interval reaches a multiple of twenty-four hours. Importantly, the partial autocorrelation highlights a strong dependence of concentration upon its previous hourly value, whereas the direct relationships with concentrations further back in time are negligible.
The cyclical effect can be interpreted considering that road mobility follows a cyclical daily pattern, which stems from people's routine and, hence, is incorporated into the behaviour over time of all the pollutants strictly related to transport, such as NO2. The positive correlation with the mean concentration of the previous hour, instead, is due to the fact that the pollutant emitted during an hour requires a certain time to be dissipated, so a fraction of it represents the background for the next hour. In fact, depending on meteorological conditions (namely wind speed), pollution can build up over longer periods, but their effects are second order and are in part stochastic over long period measurements.
Moreover, the background pollution in an hour contains not only what has been emitted and not dispersed during the preceding hour, but also what is emitted in the hour of interest by sources different from transport, including mainly residences and offices in the study area of this work. The impact of changes in this element of the background on concentration can likewise be taken into account by the behaviour of average concentration in the previous hour. In fact, variations in the emission pattern of residential activities and offices are likely to span at least one day.
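The daily cycle in the total autocorrelation can be illustrated with a short sketch. The series below is synthetic (a 24-hour sine wave plus noise, an assumption made purely for illustration) and the helper `acf` is a hypothetical name; only the total autocorrelation is computed, not the partial one.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array(
        [1.0 if h == 0 else (x[:-h] @ x[h:]) / denom for h in range(max_lag + 1)]
    )


# Synthetic hourly series with a daily cycle: 60 days of 24 hours each.
hours = np.arange(24 * 60)
rng = np.random.default_rng(1)
series = 100 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(scale=5, size=hours.size)

r = acf(series, 48)
```

For a series of this kind the autocorrelation dips around a lag of 12 hours and recovers at lags of 24 and 48 hours, which is the reversal pattern described for Fig. 5.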

Modelling approaches explored
This section details the modelling approaches used in forecasting NO2 concentration, namely the Artificial Neural Network and the ARIMAX (Autoregressive Integrated Moving Average with Explanatory variable) model, and presents the estimation results before illustrating the comparative analysis in Section 4. Both models belong to the statistics domain and use mathematical functions built on empirical data to predict concentration. Unlike the ARIMAX methodology, the ANN does not require specific assumptions on the mathematical relationships between the pollutant concentration and its explanatory factors. There is very little literature on this type of comparative analysis, especially for the prediction of extreme levels of air pollution due to road traffic in urban areas.
The comparison is based on the dataset of NO2 hourly mean concentrations (µg/m3) measured throughout the year 2006 at Marylebone Road in London. The variables used to explain the state of concentration are representative of weather and traffic conditions during the hour of interest: traffic volume (pcu/hour), hourly mean of wind speed (km/h) and hourly mean of wind direction (degrees North). Besides, both models embody the double form of autocorrelation emerging from the analysis of correlograms (Fig. 5), which is the dependency of concentration upon the previous one-hour and one-day values. Further investigation will test the possibility of multiple-step-ahead forecasting of concentration (e.g. two or three hours in advance), since pollution measurements need to be scrutinised by air quality authorities before being used to initiate abatement measures, which cannot be fully effective if implemented shortly before an anticipated pollutant event.
As a consequence, to use these models for one-step-ahead concentration forecasting, traffic flow and wind characteristics have to be foreseen on an hourly basis. For the former to be predicted, the assignment of historical hourly origin-destination matrices improved with real-time traffic counts can be used to estimate the future road network conditions. In the case of wind, instead, short-term predictions are largely available and can be obtained from the providers of weather forecasts (Met Offices), which regularly process the meteorological data collected across a country.
The following subsections illustrate the concepts underpinning the two approaches and the frameworks of the specific models derived from the data.

Neural network
The neural network is an artificial intelligence technique that mimics the human brain behaviour in a learning process. The important feature of an ANN is its adaptive nature, and "learning by examples" is the logic used by this method to accomplish classification and regression tasks. This makes neural networks virtually applicable to every situation in which the relationships between a response variable and its predictors are very complex and cannot be easily outlined based on a priori knowledge and theoretical considerations (Migliore and Catalano, 2007).
The Multilayer Perceptron (MLP) architecture has proved to be the most suitable class of neural networks for air quality forecasting in previous studies (Nagendra and Khare, 2002). It consists of a system of layered and interconnected nodes or neurons. Nodes within the input layer and one or more hidden layers are connected to all nodes in neighbouring layers (Bishop, 1995). The input neurons provide signals to the hidden layer, where each node sums the inputs, processes the result with a non-linear transfer or activation function (logistic or hyperbolic tangent) and then distributes it to the output layer. The latter computes the dependent variable value in a similar manner. Each neuron-to-neuron connection is associated with a specific weight. The MLP has the ability to learn through training, which requires a series of input vectors and associated outputs. During training, the output from the MLP is compared with the desired value, an error signal is propagated back through the network and the magnitude of this error is used to adjust the weights iteratively until a stopping criterion is met. Besides a set of observations for training (the training set), an additional set of new data, namely the validation or selection set, is used to limit the network complexity in favour of its generalization performance, which is the prediction accuracy with respect to unknown cases. In fact, during a typical network learning session, the training set error generally decreases as a function of the number of iterations. On the contrary, the error measured in relation to independent data (validation set) often shows a decrease at first, followed by an increase as the network starts to over-fit. Halting training at the point of smallest error within the validation set should achieve the best degree of network generalization.
It is common practice in the application of neural networks to train many different candidate architectures and then select the best on the basis of the performance obtained with the validation/selection set. However, a selection based only on the validation set could be misleading, because the network performance has a random component due to noise in the data. Therefore, the choice of a specific network structure should be confirmed by its performance within a third independent set of data, often referred to as the test set.
Fig. 6 illustrates conceptually the neural network framework that has been selected to simulate the hourly average of NO2 concentration (µg/m3) in 2007 at Marylebone Road: an MLP with five inputs, seven hidden neurons and one output. The independent variables are the previous one-hour and one-day mean concentrations along with non-lagged predictors: the traffic volume (pcu/hour) and the hourly averages of wind speed (km/hour) and wind direction (degrees North). The model design has been performed through a number of trials with different architectures (three- and four-layer MLP networks) and the network with the best selection set performance was chosen. At each learning round, a resampling method was used to form the training and selection sets in order to make the candidate networks as diverse as possible. More specifically, whilst fixing the test subset throughout the training procedure, the Monte Carlo technique was adopted to draw at random the elements of the training and selection subsets from the whole database. Weight regularization (Bishop, 1995) was performed to decide the number of hidden units, adding an extra term to the error function for penalizing and reducing those weights that give a small contribution to the network performance. This procedure prunes away entire hidden units when their fanning-out weights are below a fixed threshold, thus limiting the curvature of sigmoid activation functions and improving the network generalization skill.
Table 3 reports the Pearson-R correlation coefficient for the best model predictions and the performance value for each subset of data (train, selection, test). The high correlation reveals the ability of the neural model to reproduce the historical behaviour of pollutant concentration levels. Every subset-specific performance indicator is computed as the ratio of the standard deviation of the prediction error to that of the specific subsample of data. A ratio substantially below 1 indicates that the network performs far better than a simple mean estimator and its use in forecasting is then justified. As Table 3 illustrates, for the chosen network, the ratio for each sub-component of the total dataset is quite low, which indicates a good level of performance. In addition, the selection performance is better than that for the training set, which means that the network has not over-learned. Fig. 7 displays the response surfaces plotted in relation to two different pairs of input variables. Each response surface is constructed by letting the two selected inputs vary from their observed minimum to their observed maximum value, while all other inputs are held at fixed values (their averages). In one case, the graph (left) visualizes the dependency of the NO2 hourly concentration upon its lagged values, showing a strong positive impact of the mean concentration in the preceding hour and a slight effect of the corresponding hourly concentration in the previous day.
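The training loop with validation-based stopping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the network used in the study: one hidden layer of tanh units, full-batch gradient descent, and hypothetical values for the learning rate, the patience and the synthetic data.

```python
import numpy as np


def train_mlp(X_tr, y_tr, X_val, y_val, n_hidden=7, lr=0.05,
              max_epochs=2000, patience=50, seed=0):
    """Train a one-hidden-layer MLP and keep the weights with the lowest validation error."""
    rng = np.random.default_rng(seed)
    params = (rng.normal(scale=0.5, size=(X_tr.shape[1], n_hidden)),  # input-to-hidden weights
              np.zeros(n_hidden),                                     # hidden biases
              rng.normal(scale=0.5, size=n_hidden),                   # hidden-to-output weights
              0.0)                                                    # output bias

    def forward(X, p):
        W1, b1, W2, b2 = p
        H = np.tanh(X @ W1 + b1)          # hidden activations
        return H, H @ W2 + b2             # linear output unit

    def mse(X, y, p):
        return float(np.mean((forward(X, p)[1] - y) ** 2))

    best_params, best_val, wait = params, mse(X_val, y_val, params), 0
    history = [best_val]
    for _ in range(max_epochs):
        W1, b1, W2, b2 = params
        H, out = forward(X_tr, params)
        err = (out - y_tr) / len(y_tr)            # gradient of 0.5 * MSE w.r.t. the output
        dH = np.outer(err, W2) * (1.0 - H ** 2)   # error signal propagated back to hidden layer
        params = (W1 - lr * (X_tr.T @ dH),
                  b1 - lr * dH.sum(axis=0),
                  W2 - lr * (H.T @ err),
                  b2 - lr * err.sum())
        val = mse(X_val, y_val, params)
        history.append(val)
        if val < best_val:                        # validation error still improving
            best_params, best_val, wait = params, val, 0
        else:
            wait += 1
            if wait >= patience:                  # stop once validation error stops improving
                break
    return best_params, best_val, history


# Synthetic data for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
params, best_val, history = train_mlp(X[:200], y[:200], X[200:], y[200:])
```

The returned weights are those recorded at the smallest validation error, which is the halting point the text recommends for good generalization.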
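The resampling step with a fixed test subset can be sketched as below. The function name, the 70% training share and the index ranges are hypothetical choices for illustration; the source does not state the proportions used.

```python
import random


def resample_train_selection(n_cases, test_idx, train_fraction=0.7, seed=None):
    """Randomly split the non-test cases into training and selection subsets.
    The test subset is held fixed; train_fraction is an assumed value."""
    rng = random.Random(seed)
    test = set(test_idx)
    remaining = [i for i in range(n_cases) if i not in test]
    rng.shuffle(remaining)                        # Monte Carlo draw of the case order
    n_train = int(round(train_fraction * len(remaining)))
    return remaining[:n_train], remaining[n_train:]


# Example: 100 cases, the last 20 reserved as the fixed test subset.
fixed_test = list(range(80, 100))
train_a, selection_a = resample_train_selection(100, fixed_test, seed=1)
train_b, selection_b = resample_train_selection(100, fixed_test, seed=2)
```

Calling the function with a different seed at each learning round gives each candidate network a different training and selection split while the test cases never change.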
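The performance ratio described above can be computed as in the following sketch. The function name `sd_ratio` and the numbers are made up for illustration and are not taken from Table 3.

```python
from statistics import pstdev


def sd_ratio(observed, predicted):
    """Standard deviation of the prediction errors divided by that of the observed data."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return pstdev(errors) / pstdev(observed)


# Illustrative values only.
obs = [80.0, 120.0, 100.0, 140.0, 60.0]
mean_pred = [100.0] * 5                       # simple mean estimator
good_pred = [82.0, 118.0, 101.0, 138.0, 61.0]

ratio_mean = sd_ratio(obs, mean_pred)
ratio_good = sd_ratio(obs, good_pred)
```

Predicting the sample mean everywhere yields a ratio of 1, which is why values well below 1 indicate that a model adds information beyond the mean.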
In the other graph (right), the relationship between concentration and wind characteristics is clearly non-linear: as the wind speed increases, concentration falls at a decreasing marginal rate, whereas the dependency upon the wind direction is represented by a U-shaped curve.
This quantifies the influence of the quasi-canyon form of Marylebone road. The part of the road section just off the monitoring device presents an H/W ratio (see footnote 9) close to 1, which according to Tartaglia (1999) indicates the possibility of a street canyon behaviour of transport-related pollution. This means that, when the wind speed is at least 1 m/s and the wind direction forms an angle in the −45° to +45° range with the road's normal axis, a vortex is generated inside the street canyon and the pollutant concentration on the windward side is far higher than that on the leeward side. Marylebone road has a Southwest-Northeast alignment and the pollution receptors are located on its southern boundary (see Fig. 2). It follows that, when the wind comes from the South, the set of sensors is on the windward side. Therefore, as the wind direction moves closer to 360° (or 0°), becoming more northerly, an increasing pollution concentration is observed. On the contrary, if the wind comes from the North, the sensors are on the leeward side, hence the measured concentration should decrease as the wind moves closer to 180° North. The fact that the trained neural network is able to simulate and therefore assess the influence of this complex behaviour of pollution adds credibility to the ANN as a valid and valuable statistical tool to understand the effects of traffic-related air pollution in our urban areas.
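The vortex condition stated above (wind speed of at least 1 m/s and a direction within 45° of the road's normal axis) can be checked as in the following sketch. The road bearing of exactly 45° is an assumption standing in for the Southwest-Northeast alignment, and the function names are hypothetical.

```python
def angle_to_axis(direction_deg, axis_deg):
    """Smallest angle (0 to 90 degrees) between a direction and an undirected axis."""
    return abs((direction_deg - axis_deg + 90.0) % 180.0 - 90.0)


def canyon_vortex_expected(wind_from_deg, wind_speed_kmh, road_bearing_deg=45.0):
    """True when both conditions for a street canyon vortex hold."""
    normal_axis = (road_bearing_deg + 90.0) % 180.0   # axis perpendicular to the road
    speed_ms = wind_speed_kmh / 3.6                   # the dataset reports km/hour
    return speed_ms >= 1.0 and angle_to_axis(wind_from_deg, normal_axis) <= 45.0


across = canyon_vortex_expected(150.0, 10.0)   # roughly perpendicular to the road
along = canyon_vortex_expected(40.0, 10.0)     # roughly parallel to the road
calm = canyon_vortex_expected(150.0, 2.0)      # below 1 m/s
```

Because the normal is treated as an undirected axis, winds from opposite sides of the road both satisfy the angular condition; which side is windward has to be decided separately.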
In the end, Fig. 8 shows that increasing traffic volumes, as expected, have a positive impact on concentration according to a mathematical relationship slightly deviating from the linear

Statistical model
The ARIMAX model was developed independently as an alternative approach to predict the NO2 hourly concentration at Marylebone road. This postulates an underlying stochastic data-generating process and represents it as the integration of two main components: one captures the relationship between the dependent variable and its past manifestations (autoregressive part), the other incorporates the effect on the dependent variable of a finite series of random impulses (moving average part). Formally, an ARIMAX model with p autoregressive (AR) elements and q moving average (MA) terms can be written as follows:

\[ y_t = x_t'\beta + \sum_{j=1}^{p} \rho_j \left( y_{t-j} - x_{t-j}'\beta \right) + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t \]

or, more succinctly:

\[ \left( 1 - \sum_{j=1}^{p} \rho_j L^j \right) \left( y_t - x_t'\beta \right) = \left( 1 + \sum_{j=1}^{q} \theta_j L^j \right) \varepsilon_t \tag{5} \]

where:
$\rho_j$: coefficient of the j-th order AR element;
$\theta_j$: coefficient of the j-th order MA element;
$L^j$: lag operator transforming a variable at time t into its j-th order lagged manifestation, $L^j z_t = z_{t-j}$;
$y_t$: dependent variable at time t;
$x_t'$: vector of exogenous covariates at time t;
$\beta$: vector of coefficients;
$\varepsilon_t \sim N(0, \sigma^2)$, meaning that it is a white noise disturbance.

If the considered series shows a cyclical behaviour, model (5) can be turned into a seasonal ARIMAX or SARIMAX model by the introduction of multiplicative terms both for the AR part and for the MA component (La Franca et al., 2010):

\[ \left( 1 - \sum_{j=1}^{p} \rho_j L^j \right) \left( 1 - \sum_{j=1}^{p_s} \rho_{s,j} L^{sj} \right) \left( y_t - x_t'\beta \right) = \left( 1 + \sum_{j=1}^{q} \theta_j L^j \right) \left( 1 + \sum_{j=1}^{q_s} \theta_{s,j} L^{sj} \right) \varepsilon_t \tag{6} \]

where the seasonal symbols (those with subscript s) have the same meaning as their non-seasonal counterparts, but apply to the series every s units of time (hours in our case).
The SARIMAX model for the 2007 time series of NO2 concentrations at Marylebone road has one AR term and one MA element both for the non-seasonal part and for the seasonal one, the latter being based on a periodicity of 24 h (recall subsection 2.3). The exogenous regressors are again the traffic volume (pcu/hour), the wind speed (km/hour) and the wind direction (degrees North). Under the base formulation, no prior knowledge on the shape of their influence on concentration is postulated and, therefore, they are assumed to be linearly related to the dependent variable.
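In view of the above, formula (6) can be specified as follows:

\[ \left( 1 - \rho_1 L \right) \left( 1 - \rho_{24,1} L^{24} \right) \left( y_t - \beta_0 - \beta_1 TV_t - \beta_2 WS_t - \beta_3 WD_t \right) = \left( 1 + \theta_1 L \right) \left( 1 + \theta_{24,1} L^{24} \right) \varepsilon_t \]

where $y_t$ is the NO2 hourly concentration, $TV_t$ the traffic volume, $WS_t$ the wind speed and $WD_t$ the wind direction at hour t; $\rho_1$ and $\theta_1$ are the non-seasonal AR and MA coefficients, $\rho_{24,1}$ and $\theta_{24,1}$ their seasonal counterparts, and $\beta_0$ is the constant.

Table 4 exhibits the results of the SARIMAX model estimation, conducted by the maximum likelihood technique and using the Kalman filter via the prediction error decomposition (Hamilton, 1994). This approach handles missing data (for the dependent variable and/or its covariates) by continuing the state-updating recursions of the Kalman filter even if the contribution to estimation from the sample is partial or even null. All the estimated parameters have correct signs with high statistical significance except for the constant. The main limitation of the estimated SARIMAX model is its inability to reproduce the non-linear relationship between concentration and the wind characteristics, wind direction in particular. Therefore, the resulting linear impact on pollution is very far from the more plausible parabolic behaviour revealed by the neural network and explained as a street canyon effect.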
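To make the structure of such a model concrete, the sketch below computes one-step-ahead predictions for a regression with multiplicative seasonal ARMA disturbances (one AR and one MA term in both the non-seasonal and the seasonal part). It is a simplified conditional recursion that sets pre-sample values to zero, not the Kalman filter estimation used in this study; the function name, argument names and any coefficient values passed to it are hypothetical.

```python
def one_step_predictions(y, X, beta, rho, theta, rho_s, theta_s, s=24):
    """One-step-ahead predictions with zero pre-sample disturbances and shocks."""
    n = len(y)
    # Linear part x_t' * beta and the disturbance mu_t = y_t - x_t' * beta.
    xb = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mu = [y[t] - xb[t] for t in range(n)]
    eps = [0.0] * n
    preds = [0.0] * n

    def past(series, t, lag):
        return series[t - lag] if t - lag >= 0 else 0.0

    for t in range(n):
        # Expansion of (1 - rho L)(1 - rho_s L^s) mu_t
        #            = (1 + theta L)(1 + theta_s L^s) eps_t.
        mu_hat = (rho * past(mu, t, 1) + rho_s * past(mu, t, s)
                  - rho * rho_s * past(mu, t, s + 1)
                  + theta * past(eps, t, 1) + theta_s * past(eps, t, s)
                  + theta * theta_s * past(eps, t, s + 1))
        preds[t] = xb[t] + mu_hat
        eps[t] = mu[t] - mu_hat   # one-step prediction error
    return preds


# Tiny hypothetical example: a constant plus one regressor, period s = 2.
y = [10, 12, 11]
X = [[1, 2], [1, 3], [1, 1]]
ar_only = one_step_predictions(y, X, [4, 2], 0.5, 0.0, 0.0, 0.0, s=2)
ma_only = one_step_predictions(y, X, [4, 2], 0.0, 0.5, 0.0, 0.0, s=2)
```

With hourly data the seasonal lag would be s = 24, so the prediction for a given hour also draws on the disturbance and the shock observed at the same hour of the previous day.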
In line with Box and Jenkins' guidelines (Box et al., 2008), the SARIMAX residuals have been analysed to check whether they can be reasonably considered Gaussian and independently distributed, as required by the theory. Fig. 9 shows the frequency distribution of the residuals along with their autocorrelation function for the first 24 lags. As can be observed, the assumption of a white noise behaviour of the residuals is supported by the empirical evidence.
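A residual check of this kind can be sketched as follows: the sample autocorrelations for the first 24 lags are compared with the approximate 95% band of ±1.96/√n that holds for white noise. The function names and the alternating test series are hypothetical and only illustrate the mechanics.

```python
import math


def autocorrelations(residuals, max_lag=24):
    """Sample autocorrelation of the residuals for lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def lags_outside_band(residuals, max_lag=24):
    """Lags whose autocorrelation falls outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(residuals))
    acf = autocorrelations(residuals, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# A strongly autocorrelated (alternating) series is clearly not white noise.
alternating = [1.0, -1.0] * 50
acf = autocorrelations(alternating)
flagged = lags_outside_band(alternating)
```

For residuals that behave like white noise, only a few lags (about one in twenty) would be expected to fall outside the band by chance.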

Comparative analysis
This section compares the two forecasting models on a sample of observations that were not involved in building either of the models described above. The comparison of the ANN with the SARIMAX model is based on three criteria, represented through statistical and categorical metrics: the extent to which observed and simulated concentrations covary; the global forecasting accuracy; and the ability to alert in advance against exceedances of the legal concentration threshold. Finally, to explore the potential advantage of ensemble forecasting, the predictions of the two models have been combined and the resulting performance has been evaluated.

The dataset for comparison
The ANN and SARIMAX modelling approaches are compared based on out-of-estimation sample data, collected in 2006. In this period, the annual average of NO2 hourly concentrations was very close to that calculated for the year 2007 (110.6 versus 102.5), but the number of cases in which the 200 µg/m3 concentration was exceeded is 1.5 times greater (686 versus 457). As emerges from Table 5, in comparison to Table 1, the base statistics describing the

Evaluation of model forecasting performance and sensitivity analysis
The ANN and the SARIMAX models have been studied with respect to the ability to predict the NO2 concentrations recorded at Marylebone road in 2006. In order to provide initial conclusions on the potential benefit of ensemble forecasting, the two models have been integrated with a mixed method taking, for each case, the maximum of the concentrations predicted by both of them. The choice of the max functional form for the ensemble is motivated by the tendency, for both models, to underestimate pollution peaks. Indeed, within the set of cases in which the EU threshold hourly concentration (200 µg/m3) is exceeded, the mean ratio of the predicted concentration to the actual one is 0.89 if the SARIMAX is applied and 0.86 if the ANN is used; it becomes slightly higher (0.92) if the two models are merged into an ensemble.

Fig. 10 visually outlines the distributions of measured and simulated concentrations. The graphic tool employed is the box plot, which represents the 25th and 75th percentiles with the bottom and top lines of the box, respectively; the difference between these two percentiles is termed the interquartile range. In addition, the line intersecting the box identifies the median of the distribution, while the two whiskers provide a measure of the data's spread (see footnote 10). The isolated points outside of the spread represent outliers. As the figure shows, all models reproduce the observed dispersion of concentrations quite well. However, as displayed by the upper whiskers, all models clearly underestimate the high levels of concentration, even though the ensemble performs best.
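The max-type ensemble and the peak ratio used to motivate it can be sketched as below. The concentration values are hypothetical and serve only to show the computation; they are not taken from the 2006 dataset.

```python
def max_ensemble(pred_a, pred_b):
    """Case-by-case maximum of the two models' predicted concentrations."""
    return [max(a, b) for a, b in zip(pred_a, pred_b)]


def mean_peak_ratio(observed, predicted, limit=200.0):
    """Mean predicted/observed ratio over the cases exceeding the limit."""
    ratios = [p / o for o, p in zip(observed, predicted) if o > limit]
    return sum(ratios) / len(ratios) if ratios else float("nan")


observed = [150.0, 220.0, 250.0]   # hypothetical measurements (µg/m3)
sarimax = [140.0, 200.0, 210.0]    # hypothetical SARIMAX predictions
ann = [145.0, 190.0, 225.0]        # hypothetical ANN predictions

ensemble = max_ensemble(sarimax, ann)
ratio_sarimax = mean_peak_ratio(observed, sarimax)
ratio_ann = mean_peak_ratio(observed, ann)
ratio_ensemble = mean_peak_ratio(observed, ensemble)
```

Since the ensemble prediction is never lower than either individual prediction, its peak ratio cannot fall below that of the two models, which is the property exploited when both tend to underestimate.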
Table 6 compares all models through five performance indicators. Two statistical indices measure the linear correlation and the mean absolute percent difference (MAPE) between observed and estimated concentrations. The values of these indicators show that, overall, the three models do not differ significantly and all can be deemed fairly effective. Moreover, two categorical indices are used to evaluate the models' ability to alert in advance against exceedances of the 200 µg/m3 limit.

The third indicator is the percentage of the observed illegal concentrations forecast by the models as concentrations over an alarm threshold (set exactly at 200 µg/m3). From this point of view, the ensemble approach yields a rather better performance, since the max operator lowers the tendency to underestimate extreme values of pollution.

Footnote 10: The whiskers' range extends from 1.5 times the interquartile difference above the 75th percentile to 1.5 times the interquartile difference below the 25th percentile.
The fourth indicator is the share of the predicted concentrations beyond the chosen alarm bound (alarms) that do not correspond to real concentrations above the legal limit of 200 µg/m3 (false alarm ratio). All models present good levels of this performance measure, with a percentage of false alarms varying in the 30-37% range.
The fifth indicator is the actual level of mean concentration in the case of false alerts, which can be viewed as a measure of their social benefit. For each model, Table 6 shows that, when a false alarm is generated, the corresponding real concentration is quite high on average (around 180 µg/m3), which makes such an alarm useful anyway from an air quality protection perspective. Another advantage of these wrong alerts is their contribution to the lowering of the annual mean concentration, which is far above the EU target of 40 µg/m3 in the study area.
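The categorical indicators and the MAPE described above can be computed as in this sketch. The function names, the dictionary keys and the sample values are hypothetical; an exceedance is taken here to mean a concentration strictly above the limit, which is an assumption.

```python
def alarm_indicators(observed, predicted, legal_limit=200.0, alarm_threshold=200.0):
    """Share of exceedances alerted, false alarm ratio, mean level in false alarms."""
    pairs = list(zip(observed, predicted))
    exceedances = [(o, p) for o, p in pairs if o > legal_limit]
    alarms = [(o, p) for o, p in pairs if p > alarm_threshold]
    hits = [o for o, p in exceedances if p > alarm_threshold]
    false_alarms = [o for o, p in alarms if o <= legal_limit]
    nan = float("nan")
    return {
        "hit_rate": len(hits) / len(exceedances) if exceedances else nan,
        "false_alarm_ratio": len(false_alarms) / len(alarms) if alarms else nan,
        "mean_obs_in_false_alarms":
            sum(false_alarms) / len(false_alarms) if false_alarms else nan,
    }


def mape(observed, predicted):
    """Mean absolute percent difference between observations and predictions."""
    return 100.0 * sum(abs(o - p) / o for o, p in zip(observed, predicted)) / len(observed)


observed = [210.0, 190.0, 230.0, 150.0, 205.0]    # hypothetical values (µg/m3)
predicted = [205.0, 202.0, 195.0, 140.0, 215.0]
at_200 = alarm_indicators(observed, predicted)
at_180 = alarm_indicators(observed, predicted, alarm_threshold=180.0)
```

In this toy example, lowering the alarm threshold from 200 to 180 raises the share of exceedances alerted from two out of three to all three, while the single false alarm corresponds to a real concentration of 190.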
Since the models underestimate pollution peaks, their ability to foresee the need for traffic management could be enhanced by establishing an alarm threshold smaller than the legal limit of 200 µg/m3, but sufficiently high, say 180 µg/m3, to initiate a traffic control intervention. In this way, under the new approach, the traffic measures are also triggered in the 180-200 µg/m3 range of predicted concentrations, building in a degree of leeway to ensure that the target (concentration < 200 µg/m3) is respected.
Table 7 endorses this assumption, showing that the percentage of actual illegal concentrations foreseen by the models as "alarming" concentrations improves by as much as 22 percentage points, at the expense of an expected, but acceptable, rise in the incidence of false alarms (from 30-37% to 45-51%). What is more, the mean concentration when a false alarm is issued remains high (170-173 µg/m3), which guarantees a useful traffic management intervention.
In terms of the share of actual exceedances captured by the models, results could be improved by lowering the alarm threshold further, to the detriment of the false alarm ratio; however, the value of 180 µg/m3 leads to a satisfactory balance between the two conflicting evaluation indices.
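The trade-off between the two conflicting indices can be explored by sweeping the alarm threshold, as in the minimal sketch below. The function name and the data are hypothetical and do not reproduce Table 7.

```python
def sweep_thresholds(observed, predicted, thresholds, legal_limit=200.0):
    """Return (threshold, share of exceedances alerted, false alarm ratio) rows."""
    n_exceed = sum(1 for o in observed if o > legal_limit)
    rows = []
    for thr in thresholds:
        # Observed concentrations of the cases for which an alarm is raised.
        alarms = [o for o, p in zip(observed, predicted) if p > thr]
        hits = sum(1 for o in alarms if o > legal_limit)
        hit_rate = hits / n_exceed if n_exceed else float("nan")
        far = (len(alarms) - hits) / len(alarms) if alarms else float("nan")
        rows.append((thr, hit_rate, far))
    return rows


observed = [210.0, 190.0, 230.0, 150.0, 205.0]    # hypothetical values (µg/m3)
predicted = [205.0, 202.0, 195.0, 140.0, 215.0]
rows = sweep_thresholds(observed, predicted, [200.0, 180.0, 160.0])
```

Lowering the threshold can only add alarms, so the share of captured exceedances never decreases; the false alarm ratio, by contrast, may move in either direction and has to be inspected for each candidate value.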
Figs. 11-13 compare the behaviours of measured and simulated concentrations during the validation period for each of the three models in turn. The scatter plots show similar and high levels of linear correlation between observations and forecasts for all models, while the line plots (on the left) show that the ANN has the worst performance in predicting pollution peaks. This can be explained considering that, on the one hand, the non-linear patterns in the dataset (the influences of wind speed and direction, in particular) probably have a weaker impact on pollutant concentration than the other predictors (previous hours' concentrations and traffic volume), which are linearly related to NO2 pollution (see footnote 11). On the other hand, the training of a neural network does not use the entire available database, because the test subset of data (containing in this application as many as 1712 observations) is not employed in the learning process. Higher pollution events are less likely to occur within the smaller sample.
A sensitivity analysis was performed on the influential factors, namely traffic volume, wind speed and wind direction, in order to check and compare the robustness of the ANN and SARIMAX forecasting models. As these factors are likely to be subject to measurement and prediction errors, the models' forecasting performance has been evaluated under two scenarios:
1. a "Low Scenario" that perturbs the explanatory variables' levels, observed in 2006 at Marylebone road, by sampling randomly from a uniformly distributed error that varies from −5% to +5%;
2. a "High Scenario" based on a uniformly distributed error that varies from −15% to +15%.
The error range for each scenario has been set considering that short-term predictions of traffic volume and wind characteristics are fairly reliable in the current applications of traffic and weather forecast.
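The two perturbation scenarios can be sketched as follows. The function name, the random seed and the traffic volumes are hypothetical; the error is applied here as an independent multiplicative factor per observation, which is an assumption about how the perturbation was implemented.

```python
import random


def perturb(values, max_relative_error, rng):
    """Apply an independent uniform relative error to every value."""
    return [v * (1.0 + rng.uniform(-max_relative_error, max_relative_error))
            for v in values]


rng = random.Random(42)
traffic = [1000.0, 2000.0, 1500.0]            # hypothetical volumes (pcu/hour)
low_scenario = perturb(traffic, 0.05, rng)    # error between -5% and +5%
high_scenario = perturb(traffic, 0.15, rng)   # error between -15% and +15%
```

The perturbed series would then be fed to the already estimated models, and the performance indicators recomputed for comparison with those obtained from the original data.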
Table 8 compares the performance indicators for each scenario with those derived from the original dataset, considering an alarm concentration threshold set at 180 µg/m3. Given the small differences in the performance measures, the two models are robust to the measurement and prediction errors that are likely to occur in the real world.

Footnote 11: The weaker impact of wind speed and direction is confirmed by the limited range of variation of the U-shaped function in Fig. 7.

Table 8
Sensitivity analysis results when the alarm threshold is set at 180 µg/m3.

Conclusions and future steps
This research investigated the relationship between the hourly mean concentration of NO2 and explanatory factors reflecting the NO2 mean level one hour and 24 h back, along with traffic and meteorological conditions. The final scope was to develop novel and more effective models for forecasting hourly pollution peak events, as a statistical analysis method to be incorporated into traditional traffic management decision support systems to deliver sustainable mobility in urban areas.
The results demonstrated that the neural network can represent the non-linear relationships between hourly concentration and meteorological conditions, although its ability to predict exceedances of the legal concentration limit was less promising. Therefore, the ANN has an important role to play as an effective tool to analyse the non-linear processes and support the specification of other and more accurate statistical models such as the SARIMAX model, whose specification requires prior knowledge or theoretical assumptions.
Also, this initial study demonstrated that the ensemble approach is very promising for the prediction of extreme pollution events. It represents an interesting area of research that deserves further investigation. In particular, an appealing issue is the influence of the number and type of models in the ensemble on the performance in forecasting pollution peaks.
Finally, it was demonstrated that the ability of the models to alert against exceedances of the legal concentration limit can be improved by fixing an alarm threshold lower than 200 µg/m3. This improvement occurs because such an alarm bound compensates for the tendency of the models to underestimate extreme levels of NO2 pollution. This additional alarm threshold was shown to be a good compromise between warning performance and incidence of false alarms.
Further research will regard the prediction of low frequency concentration peaks as well as the high levels of pollution deriving from exceptional conditions, such as city events (football matches or exhibitions) that attract many visitors and modify the patterns of urban mobility significantly. In such cases, previous studies (Stockwell et al., 2002) have highlighted the limitation of traditional site-specific statistical models.

Fig. 2. The study area, Marylebone road in London, indicating the location of the air quality monitoring site.

Fig. 3. Scatter plots depicting, for the set of hourly observations recorded in 2007 at Marylebone road, the relationships between the NO2 mean concentration (µg/m3) relative partial residuals and the following explanatory variables: traffic volume (pcu/hour), wind speed (km/h), wind direction (degrees North) and temperature (degrees centigrade).

Fig. 5. Graphs of the total and partial autocorrelations for the hourly average of NO2 concentration (µg/m3) in 2007 at Marylebone road.

Fig. 6. Neural architecture to forecast the hourly average of NO2 concentration (µg/m3) in 2007 at Marylebone road.

Fig. 7. Response surfaces describing how the trained MLP neural network simulates the dependency of the NO2 hourly concentration (µg/m3) upon its lagged values one hour and one day back (left) and the wind characteristics, namely speed in km/hour and direction in degrees North (right).

Fig. 8. Response line describing how the trained MLP neural network simulates the dependency of the NO2 hourly concentration (µg/m3) upon traffic volume (pcu/hour).

Fig. 9. Analysis of the SARIMAX model's residuals: distribution and autocorrelation.

Fig. 10. Box plot of the measured and modelled NO2 concentration time series in 2006 at Marylebone road.

Table 1
Descriptive statistics for the dataset of air quality, transport and weather collected in 2007 within the study area.
a The original data on traffic were disaggregated by six vehicle classes: Motorcycle; Car or Light Van (length < 5.2 m); Car and trailer; Rigid Lorry, Heavy Van (length ≥ 5.2 m)

Table 3
Performance of the best MLP neural network.

Table 4
The SARIMAX model for the 2007 series of NO2 hourly concentrations at Marylebone road.

Table 5
Descriptive statistics for the set of air quality, transport and weather data collected in 2006 within the study area. In parentheses, the same statistics for the year 2007.

Table 6
Evaluation of the forecasting performance when the alarm threshold is set at 200 µg/m3.

Table 7
Evaluation of the forecasting performance when the alarm threshold is set at 180 µg/m3.