A data science approach for spatiotemporal modelling of low and resident air pollution in Madrid (Spain): Implications for epidemiological studies

Developing models to assess different air pollution exposures within cities is still a key challenge in environmental epidemiology. Background air pollution is a long-term, resident, low-level concentration of pollution that is difficult to quantify and to which the population is chronically exposed. In this study, hourly time series of four key air pollutants were analysed using Hidden Markov Models to estimate the exposure to background pollution in Madrid from 2001 to 2017. Using these estimates, its spatial distribution was then analysed by combining the interpolation results of ordinary kriging and inverse distance weighting. The ratio of ambient to background pollution differs according to the pollutant studied but is estimated to be about six to one on average. This methodology is proposed not only to describe the temporal and spatial variability of this complex exposure, but also to be used as input in new modelling approaches of air pollution in urban areas.


Introduction
Air pollution is a major environmental concern in urban areas worldwide, with significant impacts on societal health and the economy (WHO, 2016). There is growing evidence of mortality and morbidity effects related to long-term exposure to ambient air pollution (Cheng et al., 2017; Lao et al., 2018; Lee, Kim, & Lee, 2014; Weinmayer et al., 2015). Moreover, health outcomes have been seen at very low levels of exposure (Lepeule et al., 2014), and it is unclear whether a threshold concentration exists below which no effects on health are likely. The association between low, long-term air pollution and human health outcomes has not been firmly established, and additional insights are needed to collectively strengthen the epidemiological evidence. Identifying the exposures that contribute to health outcomes and constructing exposure summary measures are questions of interest in environmental epidemiology (Weisskopf, Seals, & Webster, 2018), and represent the main motivation of this work.
Background concentration has been defined as the concentration that is not affected by local sources of pollution (Menichini, Iacovella, Monfredini, & Turrio-Baldassarri, 2007; WHO, 1980). In urban areas, background concentration levels are typically measured at air pollution monitoring sites far from local sources of pollution (background sites), and these concentrations are considered to be the sum of contributions from regional and urban background emissions (Gao et al., 2018). Typically, these background levels are studied: (i) to better understand the contributions of local sources to total pollutant concentrations; and (ii) to allow the assessment of new pollutant sources that are introduced into the area of study and their impact on local air quality. In this work, this definition of background concentration is extended so that it is considered to be influenced by local contributions (e.g., traffic hot-spots). Thus, it is possible to assess a more realistic long-term exposure to the low concentrations of air pollution to which the population is chronically exposed. Its importance lies in representing, for the study area, a range of minimum and stable concentrations of ambient air pollution that is permanently resident in the long run.
Exploring novel exposure parameters that have previously been difficult to quantify is a key challenge in environmental epidemiology (Tonne et al., 2017). This study aims to provide a reliable estimate of background (low) and long-term air pollution at intra-urban scales and, therefore, to contribute new input information to epidemiological studies of the association between air pollution and human health effects in cities. The specific objectives are: (i) to characterize quantitatively the background air pollution (NO2, O3, PM10 and SO2) at temporal and spatial scales in the Madrid urban area during the period from 2001 to 2017; and (ii) to standardize a robust methodology for estimating a chronic exposure measurement for these air pollutants (or others), which could possibly be extended to other types of pollution (e.g., noise or odour pollution).

Area of study
Madrid is the third most populous city in the European Union after London and Berlin and the largest city in Spain, with an estimated population of 3.1 million people in the city and 7.3 million including the metropolitan area (INE, 2017; Madrid City Council, 2017). Madrid's economy is based on the service, construction and industry sectors. Its location in the centre of the country also makes Madrid the main transport hub of the Iberian Peninsula (Cuevas et al., 2014), with road traffic being the main source of PM10 and NO2 emissions (Montero & Fernández-Avilés, 2018). Quantitatively, 48% of PM10 has been attributed to vehicle emissions (Salvador, Artíñano, Alonso, Querol, & Alastuey, 2004), and more than 80% of NO2 and CO is related to traffic (Monzón & Guerrero, 2004). SO2 levels, mainly produced by energy production and distribution activities and, to a lesser extent, by the commercial, institutional and household sectors, have decreased owing to the reduction of residential coal burning, the use of gasoline vehicles and the implementation of particle filters in diesel engines (Salvador, Artíñano, Viana, Alastuey, & Querol, 2012). As in many urban environments, O3 is photochemically produced (secondary air pollutant) under specific conditions or transported from other regions, presenting higher levels at the city outskirts (due to lower levels of nitrogen oxides). In Madrid in particular, 65% of tropospheric O3 formation is accounted for by traffic-related precursors (Valverde, Pay, & Baldasano, 2016). European Commission limit values (Directive 2008/50/EC) and WHO guideline values (WHO, 2005) are currently met in Madrid for particulate matter, but not for NO2 (MAPAM, 2017), with high pollution episodes recently studied (Borge et al., 2018).
Although NO2 and PM10 ambient air concentrations have shown a clear decreasing trend in recent years owing to emission reductions in the road traffic sector (economic recession from 2008), the adoption of eco-friendly fuels (Euro 4 and Euro 5 emission standards in vehicles) and emission control policies, this urban area has experienced an increase in ambient air O3 levels (30-40%, Saiz-López et al., 2017), as have other European cities. Air quality in Madrid therefore remains an issue of considerable concern, which motivates its choice as the focus of this study.

Data
The air quality monitoring network (AQMN) of Madrid included 24 operating sites in 2017 and is managed by the Madrid City Council, which ensures its correct maintenance and the validation of monitored data. These sites are classified according to their location (U-urban, S-suburban) and main pollution source (T-traffic, B-background). The location and typology of the studied sites are provided in the Supplementary Material (SM.1). The data analysed in this study were hourly time series (TS), for each year from 2001 to 2017, of NO2, O3, PM10 and SO2 obtained, when available, from 38 monitoring stations (Fig. 1).
Validated hourly data (in μg·m⁻³) were obtained from the Madrid City Council's Open Data portal (https://datos.madrid.es). Each TS for a given year and air pollutant was studied only if two criteria were met: (i) at least 80% of hourly values were available during the year (a minimum of 7008 hourly values); and (ii) at least 11 months met the same minimum monitoring efficiency (a minimum of 576 hourly values per month). The length of the studied TS, broken down by monitoring site, year and air pollutant, is provided in SM.2.
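The two completeness criteria can be expressed as a short check. The following is a minimal sketch in Python rather than the authors' implementation; the function names and the representation of a series as a list of timestamps of valid hourly values are assumptions made for illustration, while the thresholds (7008, 576 and 11) are those quoted in the text.

```python
from collections import Counter
from datetime import datetime, timedelta

MIN_HOURS_PER_YEAR = 7008   # 80% of 8760 h, threshold quoted in the text
MIN_HOURS_PER_MONTH = 576   # 80% of a 720 h month, threshold quoted in the text
MIN_VALID_MONTHS = 11       # number of months that must reach the monthly threshold


def series_is_eligible(timestamps):
    """Check criteria (i) and (ii) for one annual series.

    `timestamps` holds the datetimes of the valid (non-missing) hourly values
    of a single year, pollutant and site (hypothetical input format).
    """
    if len(timestamps) < MIN_HOURS_PER_YEAR:          # criterion (i)
        return False
    hours_per_month = Counter(ts.month for ts in timestamps)
    valid_months = sum(1 for n in hours_per_month.values()
                       if n >= MIN_HOURS_PER_MONTH)
    return valid_months >= MIN_VALID_MONTHS           # criterion (ii)


def hours_of_year(year):
    """Yield every hourly timestamp of a calendar year."""
    t = datetime(year, 1, 1)
    while t.year == year:
        yield t
        t += timedelta(hours=1)


# Toy example: a complete year, and the same year with July and August missing.
full_2017 = list(hours_of_year(2017))
without_summer = [t for t in full_2017 if t.month not in (7, 8)]

print(series_is_eligible(full_2017))
print(series_is_eligible(without_summer))
```

In the second toy series the annual threshold is still met, but only ten months reach the monthly minimum, which illustrates why criterion (ii) is not redundant with criterion (i).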

Background pollution estimation
Air pollution levels in urban regions depend on atmospheric phenomena that occur at different spatial scales, from international scales to street level. Moreover, the choice of model depends on the purpose of the simulation (Borge et al., 2014). In this study, the background air pollution concentration was estimated independently for each annual TS of NO2, O3, PM10 and SO2 at hourly resolution and summarized as an annual average concentration, using Hidden Markov Models (HMM). These models allow the background ambient pollution to be estimated, which represents the long-term exposure to air pollution experienced by the population. The methodology for this estimation was previously described by Gómez-Losada, Pires, and Pino-Mejías (2016) and shown to be a convenient approach for that purpose in Gómez-Losada, Pires, and Pino-Mejías (2018). In the interest of space, the elements and a mathematical description of HMM are provided in SM.3. This study applies that methodology to Madrid's urban and metropolitan areas, and it is succinctly explained next.
In this study, the goal of HMM is to group the hourly observations of air pollutants in each annual TS into different clusters. Each cluster groups similar hourly concentration values, which are in turn dissimilar to the hourly concentrations grouped in the other clusters detected in the TS. There are multiple techniques for identifying clusters in data. However, the main difference between HMM and the rest of these techniques is that HMM is specifically designed to deal with dependent (TS) data (Zucchini & MacDonald, 2009). Hence, the hourly observations forming clusters in each TS are assumed to represent profiles (regimes) of pollution. The most suitable number of clusters in each TS is estimated according to the Bayesian information criterion (BIC) value. Each of these pollution profiles may be summarized by its average value, calculated as the average of the hourly observations grouped within the cluster. Therefore, the cluster with the lowest average value can be assumed to represent the annual background average concentration of the studied air pollutant, and is the one of interest in this work. Likewise, without applying any clustering to the hourly TS observations, the annual average concentration of all the TS observations provides the average ambient pollution. Fig. 2 illustrates the results (in red) of applying the HMM methodology to TS data for estimating the background pollution. The computational implementation of HMM was performed using the depmixS4 package (Visser & Speekenbrink, 2010) in R software (R Core Team, 2017), and an example script for the HMM implementation is provided in SM.4.
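The final summarization step (background as the lowest regime mean, ambient as the overall mean) can be illustrated independently of the HMM fit itself. The sketch below is in Python and is not the authors' R script (which is in SM.4); it assumes that each hourly value has already been assigned a regime label by a fitted HMM, and the function names and toy numbers are hypothetical.

```python
from statistics import mean


def regime_means(values, states):
    """Average concentration of each regime, given one state label per hourly value."""
    groups = {}
    for value, state in zip(values, states):
        groups.setdefault(state, []).append(value)
    return {state: mean(group) for state, group in groups.items()}


def background_and_ambient(values, states):
    """Return (background, ambient) annual averages.

    Background is the mean of the regime with the lowest average value;
    ambient is the mean of all hourly values without any clustering.
    """
    background = min(regime_means(values, states).values())
    ambient = mean(values)
    return background, ambient


# Toy series with three assumed regimes (labels would come from a fitted HMM).
values = [4.0, 6.0, 5.0, 30.0, 34.0, 32.0, 60.0, 64.0]
states = [1, 1, 1, 2, 2, 2, 3, 3]

background, ambient = background_and_ambient(values, states)
print(background, ambient, ambient / background)
```

The last printed quantity is the ambient-to-background ratio for the toy series, the same kind of ratio that the study reports per pollutant.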

Spatial analysis of background air pollution
After applying HMM independently to each analysed TS (for every available air pollutant, monitoring site and year), the estimated average background air pollution concentrations at the sites were used to map the geographical variation of background air pollution over Madrid. According to Li and Heap (2014), spatial interpolation methods fall into three categories: (i) non-geostatistical; (ii) geostatistical; and (iii) combined methods. The selection of an appropriate interpolation method for a given input data set is still a key issue on which little guidance exists. In the interpolation methods commonly used in environmental studies, important factors affecting the quality of the estimates are the sampling density and the clustering and spatial distribution of samples, with possible interaction among these factors (Tadic, Ilic, & Biraud, 2015). Therefore, to minimize the limitations of each interpolation method, combined methods have recently been developed to produce spatial estimates (Li, Heap, Potter, & Daniell, 2011). To that end, in this study, the spatial distribution of the background air pollution was estimated by averaging the estimates of one non-geostatistical method (inverse distance weighting, IDW) and one geostatistical method (ordinary kriging, OK). These well-known methods are briefly described next. A detailed comparison of the IDW and OK methods can be found in Wong, Yuan, and Perlin (2004) and other basic geostatistical documents.
IDW and OK are interpolation methods widely used to estimate the spatial distribution of air pollution and are representative of deterministic and stochastic interpolation methods, respectively. In both, the estimated air pollution concentration at an unsampled location is computed as a weighted average of the concentrations at a set of neighbouring sampled locations, with a weight assigned to each neighbouring value and all the weights summing to one. In IDW, the weights are determined deterministically using a predefined mathematical expression. In OK, they are obtained from the sample data based on a variogram. A variogram expresses how the dissimilarity between two sampled observations depends on the distance (lag) separating them.
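For reference, the weighted-average predictor shared by IDW and OK, the usual empirical (Matheron) variogram estimator, and the combination of the two maps can be written as below. The symbols ($s_0$, $z$, $w_i$, $N(h)$) are introduced here for illustration and do not come from the original text, and the equal weighting of the two methods is an assumption based on the statement that their estimates were averaged.

```latex
% Weighted-average predictor at an unsampled location s_0 (IDW and OK)
\hat{z}(s_0) = \sum_{i=1}^{n} w_i \, z(s_i), \qquad \sum_{i=1}^{n} w_i = 1

% Empirical variogram at lag h, where N(h) is the set of site pairs
% separated by (approximately) distance h
\hat{\gamma}(h) = \frac{1}{2\,|N(h)|} \sum_{(i,j) \in N(h)} \bigl( z(s_i) - z(s_j) \bigr)^2

% Combined estimate, assuming an equal-weight average of both methods
\hat{z}_{\mathrm{comb}}(s_0) = \tfrac{1}{2} \bigl( \hat{z}_{\mathrm{IDW}}(s_0) + \hat{z}_{\mathrm{OK}}(s_0) \bigr)
```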
The interpolation weights in IDW are computed as a function of the inverse distance between the observed sample sites and the site at which the prediction is to be made. IDW assumes that each measured point has a local influence that diminishes with distance. The influence of the distance is controlled by an exponent (p): the lower the exponent, the more uniformly all neighbouring values are incorporated into the interpolation. If p = 0, the weights do not decrease with distance and the estimated values at unsampled locations equal the mean of all the measured values. The value p = 2 is the default in most applications, meaning that the importance of each measured location in determining a predicted value diminishes as a function of squared distance.
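The role of the exponent p can be checked with a few lines of code. This is a minimal Python sketch of IDW under simplifying assumptions (planar coordinates, Euclidean distance, all sites used as neighbours); it is not the gstat implementation used in the study, and the function name and toy coordinates are hypothetical.

```python
from math import hypot


def idw_predict(sites, values, target, p=2.0):
    """Inverse distance weighted estimate at `target`.

    sites  : list of (x, y) coordinates of sampled locations
    values : concentrations observed at those locations
    target : (x, y) coordinates of the unsampled location
    p      : exponent controlling how fast the influence decays with distance
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(sites, values):
        distance = hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value          # target coincides with a sampled site
        weight = distance ** -p   # raw weight; normalised below so weights sum to one
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Toy configuration: three sites and one prediction point.
sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
values = [10.0, 20.0, 30.0]
target = (1.0, 0.0)

print(idw_predict(sites, values, target, p=0.0))  # equals the plain mean of the values
print(idw_predict(sites, values, target, p=2.0))  # distant site contributes less
```

With p = 0 every site receives the same weight, so the estimate reduces to the mean of the observed values; with p = 2 the third site, which is farther from the target, has a smaller influence. In both cases the estimate stays within the range of the observed values.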
OK has previously been used with success to model both O3 and PM10 at the local scale, and to model broader-scale variations in background air pollution (Beelen et al., 2009). In particular, OK is applied when the level of a pollutant does not exhibit a marked drift over the area under study (Jerret et al., 2005), as is the case in Madrid (results not shown). In OK interpolation, the function determining the weights is called a variogram model. This model is a function fitted to the (empirical) variogram, which in turn describes the spatial autocorrelation structure of the observed pattern. The choice of this model may play a significant role in the resulting spatial estimates. A notable difference between IDW and OK is that the former yields estimates that are always within the range of the values observed at the sampled locations.
The computational implementation of IDW and OK was performed using the gstat package (Gräler, Pebesma, & Heuvelink, 2016) in R, after geo-referencing the monitoring sites in the WGS84 coordinate reference system. To estimate the optimal value of p in IDW, a cross-validation procedure was performed in which models were built using values of p from 1 to 5 and tested on held-out fractions of the data (two thirds of the data for building the IDW models with different p, and one third for testing). The best p value was selected according to the lowest root mean squared error obtained in the testing fractions. In OK, the optimal variogram model was determined using the autofitVariogram function from the automap package (Hiemstra, Pebesma, Twenhofel, & Heuvelink, 2009), and this result was then used as an input to the krige function (gstat package). Each of the pollution maps produced for each year covers a surface of 343.3 km² (16.8 km east-west × 20.5 km north-south) and is represented by 437 grid cells (23 cells × 19 cells), each covering an area of approximately 0.8 km². A general overview of the relationships among the background levels of the studied air pollutants and their temporal and spatial trends was later obtained by means of multivariate analysis (Principal Components Analysis, PCA). PCA was performed using the dudi.pca function from the ade4 package (Dray & Dufour, 2007) in R.
Figure caption (fragment): Superimposed on the histogram of TS data (grey line) are its density (thick black line) and the density of each cluster detected by HMM (the density of the background, bg, regime is shown as a thick red line). Box-whisker plots represent the range of concentration for each cluster detected by HMM (background pollution in red). In both figures, the arrows represent the average value of the background (red arrow) and ambient pollution (black arrow), respectively.
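The hold-out selection of the IDW exponent p described in this section can be sketched as follows. This is an illustrative Python version, not the gstat-based procedure used in the study: the single random two-thirds/one-third split, the fixed seed, the function names and the toy data are all assumptions, and the text does not state whether the split was repeated.

```python
import random
from math import hypot, sqrt


def idw_predict(sites, values, target, p):
    """Inverse distance weighted estimate at `target` (planar coordinates assumed)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(sites, values):
        distance = hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value
        weight = distance ** -p
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def holdout_rmse(sites, values, p, train_idx, test_idx):
    """RMSE of IDW predictions at the test sites, using only the training sites."""
    train_sites = [sites[i] for i in train_idx]
    train_values = [values[i] for i in train_idx]
    errors = [idw_predict(train_sites, train_values, sites[i], p) - values[i]
              for i in test_idx]
    return sqrt(sum(e * e for e in errors) / len(errors))


def select_power(sites, values, candidates=(1, 2, 3, 4, 5), seed=0):
    """Pick the exponent with the lowest hold-out RMSE (2/3 training, 1/3 testing)."""
    indices = list(range(len(sites)))
    random.Random(seed).shuffle(indices)
    cut = (2 * len(indices)) // 3
    train_idx, test_idx = indices[:cut], indices[cut:]
    scores = {p: holdout_rmse(sites, values, p, train_idx, test_idx)
              for p in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: nine hypothetical sites on a regular grid with a smooth gradient.
sites = [(float(x), float(y)) for x in range(3) for y in range(3)]
values = [10.0 + 2.0 * x + 3.0 * y for x, y in sites]

best_p, scores = select_power(sites, values)
print(best_p, scores)
```

Repeating the split with several seeds and averaging the RMSE per candidate would give a more stable choice of p than a single split, at the cost of more computation.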

Results and discussion
Evolution of the ambient and background pollution: 2001-2017
Fig. 3 shows the evolution of the average background pollution estimated at monitoring sites, illustrated as coloured box-whisker plots, compared with ambient pollution as a reference (white box-whisker plots). Trends are indicated by a line joining the medians of the box-whisker plots. As can be seen in this figure, the quantitative difference between background and ambient pollution differs according to the air pollutant studied. Remarkably, this difference remains practically constant for NO2 and varies little for PM10 and SO2. For O3, the difference becomes clearer from 2009 onwards, with a downward trend in background pollution that sets it apart from ambient pollution. The quantitative relation (ratio) between background and ambient air pollution is given in Table 1. For NO2, PM10 and SO2, this ratio between the ambient and background pollution concentrations remains practically constant over the 17-year period. For NO2 and PM10, the ratio indicates that background air pollution is on average about seven times lower (6.9 units) than ambient pollution, and about two times lower (2.3 units) for SO2. For O3, two epochs can be distinguished (2001-2009 and 2010-2017), with the ratio increasing from 8.5 to 11.6, respectively. Considering all the air pollutants, this ratio is estimated at 6.2 units.
Except for O3, the background pollution trends mimic those of the ambient pollution, suggesting that the former could likewise be affected by the meteorological and physical factors that influence the levels of ambient pollution in Madrid. Contributions from non-local (regional) sources to the estimated levels of background pollution are likely to be present, although their study would require a more detailed investigation. The abrupt decrease in background O3 from 2009 to 2010 indicates a lag of one year with respect to the beginning of the emission cuts in O3 precursors due to the economic recession (2008). The position of the median within the box-whisker plots is irregular across years for all the pollutants, indicating that the background concentrations at Madrid sites depart from a normal distribution. In the case of PM10, from 2008 onwards the length of the box-whisker plots indicates that PM10 pollution is approximately similar at most of the monitoring sites studied.
The exploration of new threshold values of air pollutants below which no damage to health is observed has been set as a priority by the World Health Organization (WHO, 2003, 2016). In this regard, the levels of background pollution presented here and their spatial analysis (provided in the next section) are proposed as suitable concentration levels for investigating their possible association with health outcomes detected in Madrid. In addition, these estimates can also be included as inputs or covariates in environmental epidemiological studies dealing with health outcomes.

Spatial analysis of background pollution
The choice of the spatial unit of analysis has important implications for the results of epidemiological studies (Fecht et al., 2016). To better understand the possible adverse health effects associated with exposure to background air pollution, the maps given in Figs. 4 and 5 for NO2 and O3, respectively, provide sufficient detail to investigate such associations by geographical zone within Madrid. The same consideration is valid for the PM10 and SO2 maps (Figs. SM.2 and SM.3, respectively). The spatial dynamics of background pollution are difficult to assess for all the studied pollutants. This is mainly due to the urban heat island effect present in Madrid (Salamanca, Martilli, & Yagüe, 2012; Yagüe, Zurita, & Martínez, 1991), which affects not only the dynamics of pollutants beyond the meteorological and physical factors, but also the regional contributions to the background pollution originating in adjacent municipalities of the Madrid metropolitan area. These contributions are strongly dominated by the road traffic sector (Borge et al., 2014). In these maps, two distinct pollution nuclei can be differentiated, namely Madrid's urban core delimited by the M-30 ring road (inner position, Fig. 1) and the Adolfo Suarez Madrid-Barajas airport area (site 27, Fig. 1). Sites 24 and 58 (left side, Fig. 1) are suburban monitoring sites, the first of which lies within the largest public park in Madrid (Casa de Campo). It is worth noting that the quantitative variations in background pollution levels (range of concentration values) are lower than in the case of ambient pollution (Fig. 3). In Fig. 4 [...] consistent with higher levels of ambient O3 pollution at the city outskirts. This can be seen in most years, apart from higher levels of background pollution at specific monitoring sites in the urban core (e.g., 2002 and 2005).
However, the association between NO2 and O3 background levels cannot be easily established, probably because of the low concentration levels of the background fraction of both pollutants. PM10 background concentrations may serve as a proxy for traffic pollution, as reflected in the higher concentrations estimated at traffic hotspot sites during 2007 and from 2011 to 2014. However, low levels of PM10 are approximately constant during the studied period. Typical contributions to PM10 background pollution in Madrid could also be explained by dust outbreaks from the Sahara desert.
SO2 background concentrations are influenced primarily by the industrial sector (including thermoelectric power stations) and secondarily by traffic. Higher concentrations were estimated in 2001, 2005 and 2009. However, SO2 concentrations have been decreasing owing to the policies and strategic measures applied, such as the burying of the M-30 road and the changing trend in power generation.

Multivariate analysis using the obtained estimates
This section presents a general overview of the multivariate relationships among pollutants at the different studied sites and dates. PCA was performed on a dataset containing the background concentration estimates obtained in the previous sections, identified by time and location over the whole period of study. Fig. 6 illustrates the resulting factorial map, in which individual observations (grey points) are projected onto the first two principal components resulting from the PCA (PC1 and PC2); the projections of the numerical variables (red arrows) and of the years (blue) are overlaid. As usual in PCA, the angle between the projection of a numerical variable and a principal component is related to the correlation between them, and the length of the projection to its importance in the first factorial axis. Also, the projection of the centroids of the years makes it possible to understand temporal trends. The PCA shows that NO2 and PM10 (in red) have a weak positive association (they project in the same direction with a small angle between them), meaning that PM10 tends to increase together with NO2, just as O3 tends to increase together with SO2. The georeference of each observation is represented by the variables x and y, indicating the longitude and latitude where the measurements were taken. As longitude (red arrow labelled x) projects in the same direction as the second factorial axis (PC2), observations in quadrants I and II (upper part of the figure) tend to be in the eastern part of the city (longitude increases to the east), whereas those in quadrants III and IV are in the western part. The figure indicates that the eastern part of the city typically has higher O3 and SO2 concentrations. The former accumulates at the city outskirts as a consequence of traffic. The latter comes mainly from power generation and is frequent in industrial areas, such as those present in the metropolitan areas of big cities.
In Madrid's metropolitan area, NO2 and PM10 levels, mainly produced by traffic congestion, tend to be lower than in Madrid's city centre, where traffic congestion is intense and higher values are registered.
By projecting the years onto this factorial map, it can be seen that, in general, background air pollutant concentrations decrease over time (the years increase towards the right-hand side of the map, whereas the variables representing pollutants project towards the left-hand side). It is also notable that there is a marked change in pollution between 2010 and 2011. This coincides with the request that the European Commission sent to Spain on November 24, 2010 to activate measures to comply with the air quality standards of the Air Quality Directive 2008/50/EC, which led to the drafting of the Spanish Royal Decree 102/2011 on the improvement of air quality. Fig. 6 shows that the policies activated by 2011 effectively reduced the background pollution levels of the studied pollutants in Madrid.

Daily patterns of background pollution
Existing studies have shown evidence that daily variation in exposure to ambient PM10, NO2 and O3 is linked to acute pulmonary and cardiovascular outcomes. Moreover, levels generally considered safe by regulatory authorities have been suggested to also increase the daily and even hourly risk of adverse health outcomes (Lin et al., 2018). Delfino, Zeiger, Seltzer, Street, and McLaren (2002) predicted that the next phase of epidemiological research would use better spatially and temporally resolved data that take into account personal time-place-activity patterns and hourly exposures. For these reasons, the daily background pollution trend is briefly studied at selected monitoring sites in Madrid (urban-traffic, urban-background and suburban) for all the pollutants considered. Fig. 7 illustrates the evolution of background and average pollution at these sites. Remarkably, the daily evolution pattern of average and background pollution is similar for the studied air pollutants, even though the typology of the monitoring sites differs, as do the genesis and dynamics of the studied pollutants. Fig. 8 shows the relation between the average and background pollution for 2017, considering all monitoring sites and air pollutants. These results show that the background levels follow dynamics similar to those of the ambient pollution, even though the quantitative relation varies with the hour of the day. This relation is least affected in the case of PM10, whereas for O3 the change becomes evident from midday onwards, and for NO2 from 18 h onwards. Background values for SO2 remain practically constant throughout the day.
Á. Gómez-Losada et al. Computers, Environment and Urban Systems 75 (2019) 1-11

Limitations and strengths
It is important to consider that during the studied period (2001 to 2017), the AQMN of Madrid underwent relocation of monitoring sites and changes in the pollutants monitored, in accordance with the new requirements of European legislation regarding the number of stations required in urban environments (Directive 2008/50/EC). This circumstance might prevent a consistent view of the evolution of air quality in the city from being obtained. However, the spatiotemporal approach presented in this work is useful for imputing missing values at all locations over the period of study. The proposed methodology contributes to the spatiotemporal modelling of exposure levels with robustness to the possible relocation of monitoring stations. The only impact of relocation is the uncertainty associated with the estimates, which have a higher variance than measurements; in return, the approach allows a whole geographical area to be integrated in the analysis, regardless of the continuity and length of the time series provided by each monitoring station. It is important to clarify that, although background pollution is influenced by local sources (Moreno et al., 2009), it may be less affected than ambient pollution when sites are relocated. In addition, Madrid's relocation of sites was studied by Montero and Fernández-Avilés (2018) with regard to PM10 ambient pollution. These authors concluded that the new pollution maps of the city obtained after relocating sites show a pattern similar to the one that the previous configuration of sites would have provided.
Urban concentration levels depend on atmospheric phenomena that occur at different spatial scales, from transboundary scales to street levels of a few meters (Monteiro, Miranda, Borrego, & Vauard, 2007). Additionally, these levels present complex interactions with a large variety of chemicals in the atmosphere, not to mention the meteorological conditions affecting their dynamics. To date, no single model can describe the process consistently, so a combination of models is needed (Borge et al., 2014). The modelling results of this study could be integrated into other models in order to explain to what extent local and non-local sources contribute to the estimated background concentrations.
The background pollution and its spatial analysis can be helpful in environmental epidemiological studies concerning health effects detected in the studied area. Moreover, estimating the background pollution with this methodology could reduce the need for background monitoring sites: only a few of the existing sites would be necessary to confirm the levels obtained. This methodology would also provide important information to the population and can be applied to other forms of pollution as long as they are monitored at a suitable resolution. Air pollution maps provide a complete description of air quality, which can be helpful in identifying new sources of emissions located inside the monitored area.

Conclusions
In this study, the temporal and spatial scales of the background pollution were characterized for the period between 2001 and 2017 in Madrid (Spain). The difference between ambient and background pollution was practically constant for NO2 and varied little for PM10 and SO2. For O3, the difference becomes clearer from 2009 onwards, with a downward trend in background pollution that sets it apart from ambient pollution. The ratio between the ambient and background concentrations was practically constant for NO2, PM10 and SO2. For NO2 and PM10, the background pollution is on average about seven times lower than the ambient pollution, and for SO2 around two times lower. For O3, two epochs are distinguished (2001-2009 and 2010-2017), with the ratio increasing from 8.5 to 11.6 between them. The spatial distribution of background pollution is difficult to assess owing to meteorological and physical factors and to the regional contributions originating in adjacent municipalities. Nevertheless, two epochs can be distinguished for NO2 background concentrations (2001-2008 and 2009-2017). The high levels observed in the first period are strongly dominated by the heavily trafficked M-30 road and by air traffic. The O3 spatial gradients are consistent with higher levels of ambient O3 at the outskirts. With regard to PM10, higher concentrations were estimated at traffic hotspot sites in 2007 and from 2011 to 2014. Moreover, these events can be affected by dust outbreaks from the Sahara desert. SO2 background pollution decreased during the study period, although higher concentrations were estimated in 2001, 2005 and 2009. The background pollution estimates for the four studied air pollutants were used to build a spatiotemporal dataset on which a global multivariate analysis was performed.
The PCA showed a significant decrease in background pollutant concentrations after the activation of measures to comply with the Air Quality Directive in Spain. In addition, global behaviours of pollutants in the eastern city outskirts related to industry and traffic were identified, showing the usefulness of obtaining these estimates for further analysis. These models provide a comprehensive overview of, and probably a robust approach to, the complex estimation of background air pollution, which represents a chronic level of exposure to which the population in cities is permanently exposed. It is also recommended that they be combined with other modelling approaches, used as new information inputs in epidemiological studies, or extended to other forms of pollution. The modelling approaches used are easy to implement and readily available in R libraries or other commercial statistical software, making it possible to carry out all these analyses without significant statistical expertise.

Disclaimer
The views expressed are purely those of the author and may not be regarded, in any circumstances, as stating an official position of the European Commission.