Meta-modeling of a simulation chain for urban air quality

Urban air quality simulation is an important tool to understand the impacts of air pollution. However, the simulations are often computationally expensive and require extensive data on pollutant sources. Data on road traffic pollution, often the predominant source, can be obtained through sparse measurements or through simulation of traffic and emissions. Modeling chains combine the simulations of multiple models to provide the most accurate representation possible; however, the need to solve multiple models for each simulation increases computational costs even more. In this paper we construct a meta-modeling chain for urban atmospheric pollution, from dynamic traffic modeling to air pollution modeling. Reduced basis methods (RBM) aim to compute a cheap and accurate approximation of a physical state using approximation spaces made of a suitable sample of solutions to the model. One of the keys of these techniques is the decomposition of the computational work into an expensive one-time offline stage and a low-cost parameter-dependent online stage. Traditional RBMs require modifying the assembly routines of the computational code, an intrusive procedure which may be impossible in the case of operational model codes. We propose a non-intrusive reduced order scheme, and study its application to a full chain of operational models. Reduced bases are constructed using principal component analysis (PCA), and the concentration fields are approximated as projections onto this reduced space. We use statistical emulation to approximate the projection coefficients in a non-intrusive manner.
We apply a multi-level meta-modeling technique to a chain using the dynamic traffic assignment model LADTA, the emissions database COPERT IV, and the urban dispersion-reaction air quality model SIRANE in a case study on the city of Clermont-Ferrand, with over 45,000 daily traffic observations, a 47,000-link road network, and a simulation domain covering 180 km². We assess the results using hourly NO2 concentration observations measured at stations in the agglomeration. Computational times are reduced from nearly 3 h per simulation to under 0.1 s, while maintaining accuracy comparable to the original models. The low cost of the meta-model chain and its non-intrusive character demonstrate the versatility of the method and its utility for long-term or many-query air quality studies such as epidemiological inquiry or uncertainty quantification.


Introduction
Air quality simulations at urban scale are a key tool for the evaluation of population exposure to particulate matter and gaseous air pollutants. The simulations are however subject to costly computational requirements and complicated implementation. Studies in exposure estimation or uncertainty quantification, for example, require many solutions to the model. The 2016 study by the World Health Organization [1] on the global disease burden of air pollution excluded many pollutant species and health outcomes from the study due to lack of robust evidence. The use of advanced modeling methods in air pollution studies can provide precise estimations, however lower-cost but less precise models are often used in these scenarios due to high computational costs. Advanced models can be rendered feasible in this context if we can reduce the computational cost without significant loss of accuracy.
Let us consider a generic stationary model M over a physical domain Ω ⊂ R^d, with d = 2 or 3, and parameter domain D ⊂ R^{N_p}. The model output for a given parameter vector p ∈ D, c(p) ∈ R^N, will be a large-dimension vector representing the solution over a grid covering Ω. M can represent various types of atmospheric pollution models, from highly complex formulations based on partial differential equations and fluid dynamics [2,3] to simpler, and more commonly operational, formulations such as Gaussian dispersion models. Even in the case of the (comparatively) simpler models, the computational time necessary for the solution of M in practical applications over large domains with many parameters (e.g., emissions sources) can be high. This would make numerous solutions to the model too costly in practice. Methods of model order reduction (MOR) can reduce computational costs without introducing significantly increased model error, and for a range of varying parameters p ∈ D.
Various MOR techniques have been studied in the context of air quality models (AQMs). In [4] the meta-modeling technique using statistical emulation by radial basis functions (RBF) was tested on pollutant concentration fields over Clermont-Ferrand approximated by the ADMS-Urban model [5] using daily profiles for traffic emissions. In [6], statistical emulation was used to evaluate the sensitivity of some input parameters on a global aerosol model. A Gaussian process emulation was used for the study of model uncertainty in [7] for accidental release scenarios. Gaussian process emulation was also used in [8] for the Sobol' sensitivity analysis of a dispersion model representing the Fukushima event.
In this paper, we will consider a modeling chain for air quality modeling over the agglomeration of Clermont-Ferrand and the surrounding area in France. Air quality models are known to commit significant errors [2, 9-11]; however, these errors are strongly dependent on the calibration and inputs of the model. Providing more precise input data, such as data on pollutant emissions from road traffic, can greatly improve the accuracy of the modeled concentration field. The advantage of a modeling chain is the use of the best (most precise) information available on various inputs by using traffic and emissions models. In [9], the authors provide a review of modeling chain techniques for traffic pollutant emissions, atmospheric dispersion, and effects on water quality.
The modeling chain studied here consists of the dynamic traffic assignment model LADTA [12,13], an emissions model Pollemission [14] based on the COPERT-IV emissions database [15], and an urban AQM, Sirane [16]. The computation of a pollutant concentration field over the agglomeration for any given time requires the solution of each model in the chain, which proves costly for long time periods. This brings us back to MOR techniques. However, in this case we have a chain of multiple models to reduce, which raises questions on the implementation of MOR techniques: should we build a single reduction over the full chain, or a chain of meta-models? How can we treat the large parameter dimension of the chain? The use of modeled traffic emissions here presents additional difficulty in the construction of an air quality meta-model, due to the increased spatial and temporal variation of pollution emissions (compared to daily profiles or averaged emissions).
We resort to projection-based MOR techniques based on reduced basis (RB) [17] to construct cheap and accurate meta-models. A projection-based meta-model for the dynamic traffic model was built in [18]. Here we will complete the model chain with the conversion from traffic assignment and emissions model outputs on a coarse traffic network to pollutant dispersion model inputs on a fine traffic network. We then construct a meta-model for the AQM using statistical emulation by RBF interpolation with a weighted distance on the parameter domain to build a low-cost meta-model chain for the entire system. The motivation for this choice will be discussed in detail in "Case study on Clermont-Ferrand" section. An important aspect of the selected MOR method is its non-intrusive character. Among non-intrusive methods, various techniques are used to approximate the coefficients of a projection onto the reduced basis without relying on the equations of the original model. Refs. [19] and [20] present a two-grid non-intrusive method using a rapid low-fidelity numerical simulation followed by a post-processing step to approximate the reduced basis solution from high-fidelity numerical simulation. This was applied to computational fluid dynamics and to a geotechnics problem with non-linear behavior, respectively. In [21], a non-intrusive reduced order data assimilation method was applied to particle dispersion in the case of sufficiently numerous measurement data, using a reduced basis of the model solution manifold and a second basis representing the available measurement data to correct model error. In [22], a regression mapping training inputs to the coefficients of the projected model output is approximated using an artificial neural network, and is tested on a one-dimensional unsteady combustion problem. In [23], a non-intrusive method is applied to stress tensor field reconstruction of a parametrized beam and pressure field reconstruction in computational fluid mechanics.
This method also employs POD interpolation [24] to reconstruct reduced basis projection coefficients, and treats parameter domain reduction based on sensitivity analysis using a coupling with active subspaces, which can be useful in the case of problems presenting a low-dimension active subspace.
In "Meta-modeling methods" section, we will describe the meta-modeling technique based on RB methods. In "Case study on Clermont-Ferrand" section, we will describe the case study over Clermont-Ferrand: input and measurement data, computational domain, and selected models. In "Results" section, we will summarize the results of the meta-model on the AQM chain, studying accuracy, precision, and computational savings. The full meta-model chain will reduce computational costs to under 0.1 s per simulation while maintaining comparable accuracy, which will allow us to use the chain for high numbers of simulations in future work.

Meta-modeling methods
Computation times for large problems are commonly on the order of hours, making many-query contexts, such as sensitivity analysis and optimization, hardly feasible. Model reduction methods are of great interest to applications of parametrized problems involving many-query or real-time study. We will begin here by detailing the MOR method as applied to the AQM part of the chain, and we will discuss the details of the full meta-model chain in "Case study on Clermont-Ferrand" section.

Reduced basis method
We will rely on a projection-based method of model order reduction using a reduced basis. Let us consider a model, or model chain, M which takes an input parameter vector p ∈ D ⊂ R^{N_p} and computes an output vector c(p) over a grid of N points. We will define the output solution set of the model X_N = {c(p) | p ∈ D} ⊂ R^N, where the parameter dimension is N_p. Reduced basis methods exploit the parametrized structure of the model and construct a low-dimensional space approximating the solution set X_N [25,26]. While the discrete model output is of high dimension N, the reduced order solution will be of dimension n ≪ N. A key factor of the reduced basis methods is the small Kolmogorov n-width [27]. The n-width measures to what extent X_N can be approximated by an n-dimensional subspace, and can be studied during the sampling of the solution space.
Our objective is to construct a reduced basis {ψ_n^AQ}_{1≤n≤N} of N basis functions such that the projection of any simulated state, Π_N c(p), onto the reduced basis is sufficiently precise. The basis representing atmospheric concentration fields will be denoted by ψ^AQ (air quality). To construct a RB, we first need to sample a large number of solutions in X_N. This so-called training set should represent the variability in the solution states. We will sample the solution space by Latin Hypercube Sampling (LHS). Sampling by LHS was chosen for this study because many of the parameters are independent in practice. In addition this allows more flexibility when using the meta-model in the (quite realistic) case of uncertain parameters, or in rare scenarios such as pollution peaks, where a reliable meta-model is necessary but not guaranteed if it is trained over the most likely input values. Next we will construct the RB by principal component analysis (PCA).
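As a concrete illustration, the LHS step can be sketched as below. The implementation and the two example parameter ranges are illustrative, not the study's actual configuration: each parameter range is split into N_train equal strata, and exactly one point is drawn per stratum.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Draw a Latin Hypercube sample: each parameter interval is split into
    n_samples equal strata, with exactly one point per stratum, and the
    strata are shuffled independently for each parameter."""
    rng = np.random.default_rng(rng)
    bounds = np.asarray(bounds, dtype=float)          # shape (n_params, 2)
    n_params = bounds.shape[0]
    # one uniform draw inside each stratum [i/n, (i+1)/n)
    u = (rng.random((n_samples, n_params)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(n_params):                         # decouple the columns
        rng.shuffle(u[:, j])
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + u * (hi - lo)

# e.g. two hypothetical parameters: a speciation ratio and a wind speed (m/s)
samples = latin_hypercube(100, [(0.10, 0.25), (0.0, 15.0)], rng=0)
```

The stratification guarantees that each parameter's full range is covered even with modest sample counts, which is the property motivating its use for the training set.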
We use LHS to select N_train sample points (p_1, ..., p_{N_train}) in the parameter domain D, and compute model simulations from each point to build the training ensemble Y^AQ = [c(p_1), ..., c(p_{N_train})] to train the model reduction. As is common practice in PCA applications, we will first compute the ensemble mean c̄ = (1/N_train) Σ_{i=1}^{N_train} c(p_i) and center the snapshots before extracting the principal components. The basis dimension N is chosen as the smallest N such that the ratio of the ensemble variance captured by the first N components, I_N, satisfies I_N ≥ 0.98. This means that the error of projecting any member of the training ensemble onto the basis, Err_N, will be bounded by the tolerance Err_N ≤ ε_N = √(1 − I_N) [25]. The 98% tolerance cutoff is selected on a case-by-case basis: the goal is to keep N small and I_N as close to 1 as possible. Here the 98% precision is attained relatively quickly, then improvement slows as N > 5 increases. For any new parameter p, we can thus represent the solution as c(p) ≈ c̄ + Σ_{n=1}^{N} α_n^AQ(p) ψ_n^AQ, with projection coefficients α_n^AQ(p) = (c(p) − c̄, ψ_n^AQ).
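The PCA construction and the bound Err_N ≤ ε_N = √(1 − I_N) can be sketched as follows, on synthetic low-rank snapshots standing in for the training ensemble (all sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-in for the training ensemble: N grid points x N_train snapshots,
# built low-rank so that a small basis captures most of the variance
Y = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 200))

c_bar = Y.mean(axis=1, keepdims=True)               # ensemble mean, removed first
U, s, _ = np.linalg.svd(Y - c_bar, full_matrices=False)

# captured variance ratio I_N; keep the smallest N with I_N >= 0.98
I_var = np.cumsum(s**2) / np.sum(s**2)
N = int(np.searchsorted(I_var, 0.98) + 1)
Psi = U[:, :N]                                      # reduced basis, psi_n as columns

# projection of one snapshot: coefficients alpha_n = (c - c_bar, psi_n)
c = Y[:, [0]]
alpha = Psi.T @ (c - c_bar)
c_proj = c_bar + Psi @ alpha
```

With the Frobenius norm, the mean relative projection error over the ensemble equals √(1 − I_N) exactly, which is the bound quoted above.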

Statistical emulation
Once we have constructed the reduced basis by PCA, we need a reduced order modeling scheme to approximate new solutions. Classical reduced basis methods, which replace the approximation space with the reduced basis space, are intrusive and require the modification of the computational code. We would like to use a non-intrusive method which can be applied to a black-box model or model chain, which is particularly pertinent in the context of operational models. The non-intrusive implementation allows the freedom to choose the best available model. In particular, this allows the models to be updated with technological advances, and a model chain which is meta-modeled by linking non-intrusive meta-models maintains maximal versatility. It also makes for simpler implementation, as the calculation code does not need to be modified. While many MOR methods exist, few are non-intrusive, a character which is particularly advantageous in problems relying on operational models. We consider meta-modeling by the emulation of the projection coefficients α_n^AQ, 1 ≤ n ≤ N.
First we select a linear trend, which will be a least squares regression R_n(p) fitted to the coefficients α_n^AQ(p_i) over the training set. The residual of this trend is then interpolated over the training points. We chose to compute this interpolation using RBF: cubic RBFs φ(r) = r³ and a weighted Euclidean distance d_θ(·, ·) to represent the varying ranges of each input parameter.
We then define the emulated projection coefficients as α̂_n(p) = R_n(p) + Σ_{i=1}^{N_train} ω_{n,i} φ(d_θ(p, p_i)). The weights {ω_{n,i}}_{1≤n≤N; 1≤i≤N_train} are chosen such that the interpolation is exact at the sample points. The emulated solution is finally ĉ(p) = c̄ + Σ_{n=1}^{N} α̂_n(p) ψ_n^AQ. The regression represents the relation between the model parameters and the RB projection coefficients, and is computed from the training set (p_i, α(p_i))_{1≤i≤N_train}. This provides an initial trend to be corrected by the interpolation. In practice, the interpolation of the residual is the most important part of the emulation. The size of the training set N_train plays an important role in the precision of this emulation, as the regression and interpolation are trained on this set. In [4], this method of approximating projection coefficients is compared to approximation by Kriging. The two meta-models showed similar results, and we chose RBF emulation for its simpler (and thus more accessible in operational applications) implementation and lower computational cost.
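A minimal sketch of this trend-plus-RBF emulation of one coefficient, assuming the standard cubic kernel φ(r) = r³ and a per-parameter scaling as the weighted distance (the data and all names are synthetic stand-ins, not the study's code):

```python
import numpy as np

def fit_rbf_emulator(P, alpha, theta):
    """Fit one projection coefficient alpha_n(p): a linear least-squares
    trend R_n(p), then exact cubic-RBF interpolation of the residual with
    a weighted Euclidean distance d_theta scaling each parameter range."""
    A = np.hstack([P, np.ones((len(P), 1))])
    beta, *_ = np.linalg.lstsq(A, alpha, rcond=None)     # trend R_n(p)
    resid = alpha - A @ beta
    # pairwise weighted distances between training points
    D = np.linalg.norm((P[:, None, :] - P[None, :, :]) / theta, axis=2)
    w = np.linalg.solve(D**3, resid)                     # phi(r) = r^3, exact at nodes
    def emulate(p):
        d = np.linalg.norm((P - p) / theta, axis=1)
        return np.append(p, 1.0) @ beta + d**3 @ w
    return emulate

rng = np.random.default_rng(1)
P = rng.random((40, 3))                  # training parameters
alpha = np.sin(3 * P[:, 0]) + P[:, 1]    # synthetic coefficient to emulate
em = fit_rbf_emulator(P, alpha, theta=np.ptp(P, axis=0))
```

By construction the emulator reproduces the training coefficients exactly, while new parameter values are predicted by the trend plus the interpolated residual. (A production implementation would typically augment the cubic kernel with a low-order polynomial for guaranteed solvability.)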

Case study on Clermont-Ferrand
In this work we will apply the meta-modeling method described in "Meta-modeling methods" section to a modeling chain over the city of Clermont-Ferrand in France. We will build a meta-model chain representing road traffic emissions and the dispersion and reaction of pollutants over the urban agglomeration and surrounding area, using data over a 2-year period from 2013 to 2015. The model chain is represented in Fig. 1.

Traffic emissions modeling
Traffic emissions modeling is done using the dynamic traffic assignment model LADTA. A meta-model was constructed [28] to represent the traffic flow and speed simulations over a road network of 19,628 oriented links, where nearly 45,000 traffic flow observations are available each day. Emissions of NOx and PM are computed using the Pollemission code [29] based on the COPERT-IV emissions database [30,31]. A detailed description of this section of the modeling chain and its input parameters can be found in [28]. The varying input parameters consist of 23 traffic parameters and 6 emissions parameters. These parameters are time-dependent or considered sources of uncertainty. They include temporal traffic demand, computed using traffic observations; the capacity and speed limits of traffic network links; multiplicative coefficients on origin-destination matrices representing the spatial distribution of traffic demand; traffic direction (morning versus evening); engine size, type, and emission standards of the vehicle fleet; and the ratio of heavy-duty vehicles to personal cars.
The emissions model provides traffic emissions estimations for NOx and PM10. However, the atmospheric pollution model incorporates chemical reaction parametrizations which treat NO2, NO, PM2.5, and PM10. In order to approximate emissions of NO2, NO, PM2.5, and PM10, we would like to estimate what proportion of NOx consists of NO, and what proportion of PM10 is PM2.5. In the deterministic case, we set the ratio NO2/NOx = 0.15 [32-34], and the ratio PM2.5/PM10 = 0.75 [35,36]. In order to construct a meta-model which can account for varied or uncertain speciation ratios, we will draw LHS parameters for the training ensemble in intervals around these deterministic values. The output of the traffic-emissions coupling is the emissions on each link of the traffic network in g/15 min.
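The speciation step can be sketched as a simple mass-ratio split. This is illustrative only: NO is taken here as the remaining NOx mass, which ignores the NO2-equivalent reporting convention sometimes used for NOx, and the function name and defaults are ours, not the chain's code.

```python
def speciate(nox_g, pm10_g, r_no2=0.15, r_pm25=0.75):
    """Split NOx and PM10 link emissions (g/15 min) into the species the
    dispersion model expects, using the deterministic speciation ratios
    from the text (NO2/NOx = 0.15, PM2.5/PM10 = 0.75)."""
    no2 = r_no2 * nox_g          # NO2 fraction of NOx
    no = nox_g - no2             # remaining NOx mass attributed to NO (sketch)
    pm25 = r_pm25 * pm10_g       # PM2.5 fraction of PM10
    return no2, no, pm25, pm10_g

# e.g. a link emitting 100 g NOx and 40 g PM10 per 15 min
no2, no, pm25, pm10 = speciate(100.0, 40.0)
```

In the meta-model training ensemble, `r_no2` and `r_pm25` would be drawn by LHS rather than fixed.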

Air quality modeling
Air quality modeling is done using the urban dispersion-reaction model Sirane [16,37] over a simulation domain of 180 km². Sirane is used as a static model which approximates, at a given time, the solution of the transport-reaction equations satisfied by the pollutant concentrations. The traffic emissions over a relatively coarse road network are converted to g/s/link on a finer network representing over 47,000 line sources. For the calculation of NO2 concentrations, we provide the so-called background concentrations of the pollutant species involved directly or indirectly in the formation of NO2. The background concentrations, provided for NO2, PM10, and O3, represent the imported concentrations of pollutants, that is, concentrations transported from other locations to the city, and from the dispersion or reaction of previous emissions in the case of a stationary solution. We provide line emissions inputs on NO2, NO, PM2.5, and PM10. Input data on meteorological conditions (wind velocity, cloud coverage, a precipitation parameter, and temperature) and surface emissions sources are also provided. The AQM output is the NO2 concentration over a grid at ground level, at 20 m resolution. Hourly concentration observations are available over 2 years at 5 stations, or around 90,000 NO2 observations for the analysis of model simulation outputs.

Modeling chain
The modeling chain consists of these three steps (traffic modeling, emissions calculation, and dispersion-reaction modeling) and the conversions between outputs and inputs. In Fig. 2 we can see the traffic flow (veh/h/link), the associated emissions (g km −1 s −1 ), and the NO2 concentration (µg m −3 ) simulations at 8 a.m. on a Tuesday in November 2014, provided by the traffic meta-model and the full air quality model. The task remains to reduce the computational time required to obtain concentration fields by constructing a meta-model for the entire chain.

Surrogate modeling chain construction
As noted above, the traffic emissions on a geographically finer road network provided as input to the air quality model represent over 47,000 line sources. In the context of model order reduction, this represents as many parameters, which in the practice of projection-based reduction methods makes the identification of the projection coefficients intractable. We therefore construct a reduced basis of the traffic emissions fields themselves, in order to replace the line-source inputs by a small number of coefficients. For our case study, we chose N_lin = 11 to represent 95% of the variability of the emissions solutions. This corresponds to a relative projection error tolerance over the training samples of ε_lin² = 0.05. In the model chain, the over 47,000 line source parameters will henceforth be replaced by the N_lin = 11 projection coefficients {α_n^lin}_{n≤N_lin}, and the traffic emissions field for a given parameter approximated by its projection Π_{N_lin} E(p_traffic, p_e) onto the traffic emissions RB. We perform the same reduction over the hourly surface emissions with N_surf = 1 and projection coefficient α^surf. In Fig. 3, we can see the largest singular values of the PCA step, and the relative mean projection errors of the training traffic emissions simulations onto the RB {ψ_n^E}_{1≤n≤N_lin}, as defined in "Meta-modeling methods" section. In Fig. 4 we can see the first 4 principal components of the traffic emissions RB.
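The replacement of the line-source parameters by projection coefficients can be sketched as follows, on scaled-down synthetic emissions snapshots (in the real chain the snapshots come from the traffic-emissions meta-model, and there are over 47,000 links):

```python
import numpy as np

rng = np.random.default_rng(2)
n_links, n_train = 4700, 120     # scaled down for illustration
# synthetic low-rank snapshots standing in for training emissions fields E(p)
E = rng.standard_normal((n_links, 12)) @ rng.standard_normal((12, n_train))

E_bar = E.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(E - E_bar, full_matrices=False)
I_var = np.cumsum(s**2) / np.sum(s**2)
N_lin = int(np.searchsorted(I_var, 0.95) + 1)    # 95% of the variability
Psi_lin = U[:, :N_lin]

# online conversion: a new emissions field is replaced by N_lin coefficients,
# the only line-emissions input the air quality meta-model then needs
e_new = E[:, [0]]
alpha_lin = Psi_lin.T @ (e_new - E_bar)
```

The key point is the dimension swap: the air quality meta-model's emissions input shrinks from `n_links` values to `N_lin` coefficients.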

Construction of the air quality meta-model
We can now write the reduced concentration model parameters p_c^T = (α_lin^T, α_surf^T, p_AQ^T). We will construct a meta-model of the air quality model to complete the meta-modeling chain, with the reduced full parameters as described in Table 1. The choice to build a separate air quality meta-model to complete the chain of meta-models (as opposed to a single meta-model of the full chain) allows multi-level assessment using traffic flow and air quality measurement data (a possibility particularly pertinent in a study of uncertainty quantification), by a meta-modeling method which can be generalized in the case of additional models in the chain (such as an economic or epidemiological model). In addition, if a single meta-model represents the full chain, the training set must be at least as large as the largest training set in the chain, which could increase offline computational time if one model requires a larger training set than others.
Here, N_train^traffic = 3003 and N_train^AQ = 9347. When constructing the training ensemble for the air quality meta-model, we chose to draw LHS parameters for the full modeling chain p_full. This choice led to reduced variations in the emissions projection coefficients {α_n^lin}_{1≤n≤N_lin} versus LHS selection over uniform distributions of the emissions projection coefficients α^lin ∈ [α_min^lin, α_max^lin]^{N_lin}. The projection coefficients are in practice not independent; a strong first coefficient is often associated with a weaker second or third coefficient, as these principal components tend to represent different spatial distributions of the emissions. This means that the entire space [α_min^lin, α_max^lin]^{N_lin} represents significantly more variation in the state E(p_traffic, p_e) than the traffic-emissions model produces. By performing LHS over the full chain parameters p_full = (p_traffic, p_e, p_AQ) ∈ R^41, the emissions projection coefficients are computed during the conversion of traffic meta-model outputs to concentration meta-model inputs.
In Fig. 5 we compare the parameters α_n^lin selected by these two methods by plotting the parameter spaces (α_1^lin, α_2^lin) and (α_1^lin, α_4^lin). We can see that the parameter sets in red, which correspond to performing LHS on p_full and computing the projection coefficients α_n^lin of the traffic emissions model output E(p_traffic, p_e), represent significantly less variation than LHS selection directly on the parameters α_n^lin. This tactic avoids building a meta-model unnecessarily representing additional variation of the state, by only considering realistic traffic emissions. In Table 2, we set the ranges of each input parameter, which define the parameter space D.
We use LHS to select a training set of N_train = 9347 concentration fields. Due to the large input parameter vector (N_p = 41), we used a LHS algorithm for 10,000 training samples, and removed the concentration fields with numerical instability (this can be attributed to modeling error, which should not be confused with error in the meta-model). We use the NO2 concentration fields c(p_full) to construct a reduced basis {ψ_n^AQ}_{1≤n≤N}. We set the RB dimension N = 5 to represent 98% of this variability.
In Fig. 6 we see the first 4 principal components of the concentration RB. We can see that the first basis function represents urban background concentration in the denser urban areas. The second seems to represent additional pollution from traffic. The third appears to represent situations with strong wind from the east, while the fourth shows the influence of wind from the north.
For any new parameter value, the concentration field can be approximated by its orthogonal projection onto the RB, c(p) ≈ c̄ + Σ_{n=1}^{N} α_n^AQ(p) ψ_n^AQ, for projection coefficients {α_n^AQ}_{1≤n≤N}. Finally, we use the statistical emulation method described in "Statistical emulation" section to construct an emulator of the concentration projection coefficients α_n^AQ. The full chain can be computed with a single code which applies the traffic-emissions meta-model, the calculation of emissions RB projection coefficients, and the atmospheric pollutant meta-model. This meta-model chain provides outputs on traffic flow, speed, and traffic emissions over the road network, and NO2 concentrations over a 20 m-resolution grid.
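The online stage of the chain then simply composes these pieces. The sketch below wires tiny synthetic stand-ins (the `emis_mm` function, the bases, and the emulators are all hypothetical) to show the data flow only, not the actual models:

```python
import numpy as np

def run_chain(p_traffic, p_e, p_aq, emis_mm, Psi_lin, E_bar, aq_emulators, Psi_aq, c_bar):
    """One online evaluation of the chain: traffic-emissions meta-model,
    conversion to emissions RB coefficients, then the coefficient emulators."""
    E_hat = emis_mm(p_traffic, p_e)            # emissions field on the fine network
    alpha_lin = Psi_lin.T @ (E_hat - E_bar)    # line sources -> RB coefficients
    p_c = np.concatenate([alpha_lin, p_aq])    # reduced AQ parameter vector
    alpha_aq = np.array([em(p_c) for em in aq_emulators])
    return c_bar + Psi_aq @ alpha_aq           # emulated concentration field

# tiny synthetic stand-ins, only to exercise the data flow
rng = np.random.default_rng(3)
Psi_lin, _ = np.linalg.qr(rng.standard_normal((50, 3)))    # "emissions" RB
Psi_aq, _ = np.linalg.qr(rng.standard_normal((200, 2)))    # "concentration" RB
E_bar, c_bar = np.zeros(50), np.zeros(200)
emis_mm = lambda pt, pe: Psi_lin @ np.array([pt, pe, pt * pe])
aq_emulators = [lambda p: p.sum(), lambda p: p[0]]
c_hat = run_chain(1.0, 2.0, np.array([0.5]), emis_mm, Psi_lin, E_bar, aq_emulators, Psi_aq, c_bar)
```

Every online step is a small matrix-vector product or emulator call, which is why the chain runs in a fraction of a second.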

Results
In this section, we will summarize the results of applying the method described in "Meta-modeling methods" section to the case study of "Case study on Clermont-Ferrand" section, using data over the month of November 2014. Traffic flow measurement data serve as inputs to the model chain for deterministic simulation, and data on pollutant concentrations serve to study model and meta-model performance. We will compare the meta-model output to simulations from the full model Sirane, as well as to concentration observation data, and we will assess computational savings.

Meta-model performance
We introduce the following statistical scores, commonly used for the evaluation of models [4]: the normalized mean square error (NMSE), the normalized root mean square error (NRMSE), and the correlation. We define the output functionals ℓ_o : R^N → R associated to each of the concentration sensors o, such that the observation data satisfy y_o^obs(t) = ℓ_o(c_true(t)). We denote by c_true(t) the unknown true concentration field at time t, and by p(t) the estimated parameters at time t.
For a data set of M ≤ N_time · N_obs measurements (some measurements may be unavailable in practice) over N_time times and N_obs sensors, we use the index m, 1 ≤ m ≤ M. Here c_m = ℓ_o(c(p(t))) is the value of the output functional associated to sensor o applied to the simulated state estimate at time t indexed by m. We use the same notation whether the simulated state is the full model output c(p) or the meta-model output ĉ(p). M is the total number of data available, and y_m^obs is the m-th data point. c̄ and ȳ^obs are respectively the means of (c_m)_{1≤m≤M} and (y_m^obs)_{1≤m≤M}.
Finally we define the NRMSE as NRMSE = RMSE / ȳ^obs, and the mean normalized root mean square error (MNRMSE) as the mean over all sensors (or grid points) of the NRMSE calculated over the concentrations at each sensor (or grid point) over the month.
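Since the extraction dropped the score equations, the sketch below uses the standard definitions from the model evaluation literature, with NRMSE = RMSE/ȳ^obs as stated above; the paper's equation numbers (10), (11) referenced later are not reproduced here.

```python
import numpy as np

def scores(c, y):
    """Standard evaluation scores for paired simulated values c_m and
    observations y_m: NMSE, RMSE, NRMSE and Pearson correlation."""
    c, y = np.asarray(c, dtype=float), np.asarray(y, dtype=float)
    mse = np.mean((c - y) ** 2)
    rmse = np.sqrt(mse)                     # root mean square error
    nmse = mse / (c.mean() * y.mean())      # normalized by the product of means
    nrmse = rmse / y.mean()                 # NRMSE = RMSE / mean observation
    corr = np.corrcoef(c, y)[0, 1]          # Pearson correlation
    return nmse, rmse, nrmse, corr

nmse, rmse, nrmse, corr = scores([10, 20, 30], [12, 18, 33])
```

The MNRMSE of the text is then simply the mean of `nrmse` computed sensor by sensor (or grid point by grid point).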

Comparison with the full model chain
We first analyze the precision of the meta-modeled concentration fields as compared to the full model Sirane. This will help us understand the ability of the meta-model to reproduce the concentration state and quantify the loss of precision caused by the dimensional reduction. In Fig. 7 we see the concentration fields of NO2 simulated by the full model and the meta-model chains, as well as the sensor locations for concentration measurements. The parameters p correspond to conditions on Tuesday November 18, 2014 at 8 a.m. We see very similar approximations near the highways east of the city center; however, the meta-model does not perfectly reproduce the variation between heavy-traffic areas and low-traffic areas. Overall the reduced order simulation is a good representation of the full model. In Fig. 8, we see statistical scores spatially mapped over the meta-model domain, compared to both the projected solution and the full model solution. The scores of the reduced-order solution compared to the projected solution give insight into how well the RBF method approximates the projection coefficients in order to reproduce the projected solution. The NRMSE shows that the emulated solutions perform well in approximating the urban background concentration levels, but do not capture the highest concentrations along the large highways, where we see the highest bias levels. The correlation map also shows low correlation between the meta-model and full model only along the roadways, where the dimensional reduction has failed to capture the extent of the increased concentrations due to traffic emissions. Finally, the bias map shows that the meta-model generally predicts higher concentrations in the denser urban areas when compared to the full model, again matching the trend of the dimensional reduction reducing the sensitivity of the meta-model to sharp spatial variations in concentrations.
However, the areas with poor scores remain limited, and we will also consider the significant error that will inevitably be committed by the full model in the next section.
In Fig. 9 we see the relative errors of the full model concentration projected onto the reduced basis {ψ_n^AQ}_{1≤n≤N}, averaged over the set of deterministic simulations for the month of November 2014. We also see the emulated concentration relative error, averaged over the same set of simulations. While the emulation of the projection coefficients is globally responsible for a significant portion of the error, we can see that the regions with the highest projection error correspond to high errors in the meta-model as well. This is expected, as the emulated solution can only perform as well as the projected solution. We see that larger errors are located on roads, mostly the large highway and outside the dense urban area. Meta-model error remains below 20% over a large portion of the domain, which shows that much of the spatial variation of the concentration is captured by the reduced order solution.
In Table 3 we can see statistical scores comparing the meta-modeled concentration to the full concentration model over all hours of November 2014. We compare both the entire grid (here c m is the concentration at a grid point and M = N grid is the total number of grid points) and at the NO 2 sensor locations. While the dimensional reduction means the meta-model does not fully capture spatial variations of the simulated concentration state, we can see that the relative RMSE errors are satisfactorily low, and the correlation between the two is very high.
In Table 4 we can see these scores when a reduced basis and meta-model are trained using a subset of only 3000 members of the training set. We can see the necessity of the larger training set for the air quality model. Here the difference in scores with respect to Table 3 is caused by the smaller training set of the emulation, rather than by a less precise reduced basis.
In Fig. 10, we see a visual representation of hourly scores of the meta-model solution compared to the full solution at each grid point for simulations corresponding to the month of November 2014. The NMSE (11) remains below 0.4 for most parameters, and the RMSE (10) is often below 10 µg m −3 . Correlation scores are grouped above 0.75, and the bias distribution is nearly centered around −2 µg m −3 , showing a slightly higher concentration approximation by the meta-model, when averaged over the grid.

Comparison with observational data
We next analyze the accuracy of the full model and meta-model compared to observational data on NO2 concentrations. Sensor locations can be seen in Fig. 7. In Fig. 11, we compare observed, emulated, projected and SIRANE-modeled concentrations for all weekdays in November 2014. We see a bias in the modeled concentrations, which underestimate peak concentrations, notably during heavy traffic periods in the mornings and evenings. We also notice a seemingly delayed reaction of the model chain to the pollution increase during the evening peak hour. In [28], this delay was less evident, suggesting that factors such as the dispersion and reaction parametrizations in the AQ model or the averaging of time scales from 15 min to 1 h may have an effect. The exploration of this question will require more study of uncertainties in the model chain. We notice that the temporal trend representing morning and evening peak hours in traffic is reproduced by the model chain. We also note that the emulated concentrations are closer to the observations than the full model. This is likely due to the "smoothing" effect of the dimensional reduction causing less sharp concentration variations, as small-scale features of the modeled concentration fields are not reproduced by the reduced basis.
In Table 5, we compute statistical scores over the month of November 2014, comparing the full model simulations and the meta-modeled simulations to the observation data at M = 4 sensor locations. We again see that the emulated solutions are slightly more accurate than the full model. The stations at which both the model and meta-model perform best are those found in dense urban areas, with the exception of the station Gare, where heavy traffic induces high NO2 concentrations which the model fails to reproduce. We see the highest bias at this location. Finally, the station Chamalières is located outside the city center, where the model exhibits a higher level of bias. The performance of the meta-model with respect to observation data is highly satisfactory.
In Fig. 12, we see a visual representation of daily scores of the meta-model solution and the full solution compared to NO2 observations over the month of November 2014. The meta-model shows score distributions similar to those of the full model, with occasional exceptions. While we have seen that the model reduction by statistical emulation causes a loss of precision, and that the meta-model simulations contain error with respect to the full model, the comparison with observation data suggests that this error is not significant relative to the model error inherent to operational models for urban air quality, and does not reduce the accuracy of the predicted concentrations at sensor locations.

Computational savings
We have seen that the meta-model chain produces satisfactory results when compared to observational data, and determined that the loss of precision due to the dimensional reduction is not higher than the error committed by the full model. We now show the computational savings afforded by the meta-model chain. In Table 6, we can see the computational times required for a single simulation of the chain by the meta-models or the full models. The meta-models depend on three reduced bases, representing traffic for the traffic assignment meta-model, road emissions for the reduction of the pollution model input dimension, and concentration fields for the pollution meta-model. The initialization of the meta-model chain requires loading these bases and building the RBF emulators. Once the chain is initialized, it can be run for any number of simulations at very low cost, under 0.1 s for a simulation representing a 1-h period. In comparison, the full model chain requires nearly 3 h for a single simulation. The offline construction of the meta-models required 6000 traffic model simulations [28] and 10,000 pollution model simulations, which represents a significant computational investment. However, these meta-models are trained over training points {p_i}_{1≤i≤N_train} ⊂ D representing 2 years of data, and once constructed are useful for studies spanning multiple years. In the absence of high-performance computing machines or clusters, the simulations can be run using a pseudo-parallel technique, running one simulation per core on desktop machines. The SIRANE simulations described in "Case study on Clermont-Ferrand" section took around one day using this method on multiple machines of 64 GB RAM or less. Once the meta-model chain is constructed, the online phase for a simulation given any parameter p ∈ D is very cheap, which makes real-time or many-query contexts possible, for example for use in uncertainty quantification studies.
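The offline/online split above can be sketched as follows, assuming a Gaussian RBF emulator fitted per projection coefficient; all dimensions, the kernel width, and the training data are illustrative placeholders, not the study's actual configuration:

```python
import numpy as np
import time

# Sketch of the offline/online decomposition: offline, one linear solve fits
# RBF weights mapping parameters to projection coefficients; online, a query
# costs one small kernel evaluation plus two matrix products.
rng = np.random.default_rng(2)
n_train, n_params, n_modes, n_grid = 400, 6, 20, 2000
P = rng.random((n_train, n_params))    # training parameters p_i in D
C = rng.random((n_train, n_modes))     # projection coefficients (from PCA)
basis = rng.random((n_modes, n_grid))  # reduced-basis modes
mean_field = rng.random(n_grid)

def gaussian_kernel(A, B, eps=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-eps * d2)

# Offline stage: assemble and solve the RBF system once (jitter for stability).
K = gaussian_kernel(P, P) + 1e-8 * np.eye(n_train)
weights = np.linalg.solve(K, C)        # (n_train, n_modes)

# Online stage: cheap evaluation for any new parameter p.
def emulate(p):
    k = gaussian_kernel(p[None, :], P)           # (1, n_train)
    coeffs = k @ weights                         # emulated coefficients
    return mean_field + (coeffs @ basis).ravel() # reconstructed field

t0 = time.perf_counter()
field = emulate(rng.random(n_params))
online_time = time.perf_counter() - t0           # typically well under 0.1 s
```

The expensive parts (snapshot generation, PCA, the linear solve for `weights`) happen once offline; every subsequent query reuses them, which is what makes many-query studies tractable.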

Conclusions
In this work we constructed a meta-model chain by statistical emulation of reduced basis projection coefficients for urban air quality modeling over the agglomeration of Clermont-Ferrand. We used the road traffic meta-model constructed in [28], built a reduced basis representing road traffic emissions, and constructed a second meta-model of NO2 concentration fields over the agglomeration, thus substituting a low-cost chain of meta-models for a computationally costly modeling chain over a large urban area. This required the selection of a spatially finer road network for the AQM emissions inputs, the dimensional reduction of the inputs to the atmospheric pollution model, the treatment of traffic observation data to compute model input parameters, and the appropriate sampling of the parameter spaces to construct a reduced basis and reduced-order modeling scheme. We chose the approach of a meta-model chain, and restricted the variations in the AQM input parameters with respect to a standard LHS sampling without under-representing the solution space.
For each simulation of an hourly concentration field, we reduced the computation time from over two hours to under 0.1 s. Results show good precision of the meta-model simulations with respect to the full model chain, and similar accuracy when compared to measurement data. We saw that a portion of the error between the meta-model and full-model chains can be attributed to model error, and the reduced-order model does not show significantly increased error. The meta-model can be used in applications requiring numerous solutions to the model chain, rendering various otherwise impractical studies, for example exposure analysis, computationally feasible. This model reduction makes the model chain useful in a wide variety of applications.
Here we constructed a chain of meta-models, as opposed to a single meta-model of the full modeling chain. This was done in order to make use of traffic and emissions simulations and data, and for the versatility of a chain of meta-models in our applications. A comparison of the precision, the parameter sensitivity, and the stability of the meta-model formulation (matrix inversion) of each approach would make for an interesting follow-up study. In future work, we will use this low-cost meta-modeling chain in the study of uncertainty quantification and the propagation of uncertainties throughout the model chain.