Evaluation of receptor and chemical transport models for PM10 source apportionment

In this study, the performance of two types of source apportionment models was evaluated by assessing the results provided by 40 different groups in the framework of an intercomparison organised by FAIRMODE WG3 (Forum for air quality modelling in Europe, Working Group 3). The evaluation was based on two performance indicators: z-scores and the root mean square error weighted by the reference uncertainty (RMSEu), with pre-established acceptability criteria. By involving models based on completely different and independent input data, such as receptor models (RMs) and chemical transport models (CTMs), the intercomparison provided a unique opportunity for their cross-validation. In addition, comparing the CTM chemical profiles with those measured directly at the source helped to corroborate the consistency of the tested model results. The most commonly used RM was the US EPA-PMF version 5. RMs showed very good performance for the overall dataset (91% of z-scores accepted) while more difficulties were observed with the source contribution time series (72% of RMSEu accepted). Industrial activities proved to be the most difficult sources for RMs to quantify, with high variability in the estimated contributions. In the CTMs, the sum of computed source contributions was lower than the measured gravimetric PM10 mass concentrations. The performance tests pointed out the differences between the two CTM approaches used for source apportionment in this study: brute force (or emission reduction impact) and tagged species methods. The shares of sources meeting the z-score and RMSEu acceptability criteria were 50% and 86%, respectively. The CTM source contributions to PM10 were in the majority of cases lower than the RM averages for the corresponding source. The CTM and RM source contributions for the overall dataset were more comparable (83% of the z-scores accepted) than their time series (successful RMSEu in the range 25-34%).
The comparability between CTMs and RMs varied depending on the source: traffic/exhaust and industry were the source categories with the best results in the RMSEu tests while the most critical ones were soil dust and road dust. The differences between RMs and CTMs source reconstructions confirmed the importance of cross validating the results of these two families of models. © 2019 The Authors


Introduction
Source Apportionment (SA) encompasses the modelling techniques used to relate the emissions from pollution sources to the concentrations of such pollutants in ambient air. Applications of SA techniques have been reported to (a) determine the causes of pollution levels exceeding legislation thresholds, (b) support the design of air quality plans and action plans, (c) assess the effectiveness of remediation measures, (d) quantify the contribution from different areas within a country, (e) quantify transboundary pollution, (f) quantify natural sources and road salting, and (g) refine emission inventories.
The abovementioned definition of SA accommodates a wide range of techniques to obtain information about the actual influence that one or more sources have on a specific area over a specific time window. Such techniques may be based on the measured concentrations of pollutants and their components (receptor-oriented models or, more simply, receptor models, RMs) or on chemistry, transport and dispersion models (also known as source-oriented models, SMs). Among the RMs it is possible to distinguish between (a) explorative methods, that rely on empirical coefficients or ratios between species, and (b) receptor models based on multivariate analysis of all the chemical data at once (Watson et al., 2008;Hopke, 2016). SM techniques are based on air quality models, the most commonly used are Eulerian, Gaussian and Lagrangian ones (Fragkou et al., 2012). Gaussian models are used to describe only the dispersion of a pollutant near the source in a stationary way, while Lagrangian models describe it in a 3D dynamic way. With both types, the chemistry is assumed to be simple and linear or negligible, while Eulerian models (commonly referred to as Chemical Transport Models, CTMs) yield a description of pollutants in terms of motion, transport and dispersion, chemistry and other physical processes, accounting for all sources that are present in a given domain (emission inventories), as well as the influence from more distant sources (boundary conditions).
SA studies aim at quantifying the relationship between the emission from human activities or natural processes (referred to as activity sectors or macrosectors in the emission inventories) and the concentrations of a given pollutant (sectorial source apportionment). Specific SA techniques are designed to allocate the concentration of a pollutant to the geographic areas where it or its precursors were emitted (spatial source apportionment). The present study focuses on the first type of SA output resulting from either RMs or CTMs. These two families of models were selected because a) these are the most used ones in both research studies and air quality management applications in support of the EU Directive 2008/50/EC implementation (Fragkou et al., 2012;Belis et al., 2017), and b) these are techniques aimed at simultaneously describing the behaviour of all the knowable sources (that can be identified within the limitations of each technique).
RMs are used to perform SA by analysing the chemical and physical parameters measured at one or more specific sites (receptors). They are based on the mass conservation principle and identify sources/factors by solving the mass balance equation X = GF + E, where X is a matrix containing ambient measurements of pollutant properties, typically chemical concentrations of gases and/or particles that include markers for different sources (or source categories), F is a matrix whose rows represent the profiles of p sources, G is a matrix whose columns represent the contributions of the p sources and E is the residual matrix. Techniques that solve the equation by fitting measured source profiles to the ambient measurements with weighted least-squares minimisation (or other types of minimisation) are referred to as chemical mass balance methods (e.g., CMB), while models that solve the equation without using 'a priori' information on source composition are factor analytical models, the most common of which is positive matrix factorisation (PMF). A detailed description of the RMs is provided, e.g., in Watson et al. (2008), Hopke (2016), Viana et al. (2008), and Belis et al. (2013).
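As a minimal numerical sketch of the mass balance equation, the following pure-Python example builds a toy case with three samples, two sources and three species, and computes the residual matrix E from given G and F. All numbers and the helper names (`matmul`, `residual`) are hypothetical and serve only to show the matrix shapes; no actual RM solver (CMB or PMF) is implemented here.

```python
# Toy illustration of the receptor-model mass balance X = G F + E.
# All values are hypothetical; this is not a CMB or PMF solver.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def residual(x, g, f):
    """Return E = X - G F, element by element."""
    gf = matmul(g, f)
    return [[x[i][j] - gf[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]

# G: source contributions, n samples x p sources (columns = sources)
G = [[2.0, 1.0],
     [0.5, 3.0],
     [1.0, 1.0]]

# F: source profiles, p sources x m species (rows = sources, mass fractions)
F = [[0.6, 0.3, 0.1],
     [0.1, 0.2, 0.7]]

# X: ambient measurements, n samples x m species
X = [[1.35, 0.80, 0.85],
     [0.60, 0.75, 2.20],
     [0.70, 0.45, 0.80]]

E = residual(X, G, F)
for row in E:
    print([round(v, 3) for v in row])
```

In a CMB-type application F would be supplied from measured source profiles and only G estimated, whereas a factor analytical model would estimate both G and F from X.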
In Table 1 are briefly presented the approaches used to apportion a pollutant to its sources with CTMs: a) brute force (BF) (a terminology widely used in the literature e.g. Pillon et al., 2011;Burr and Zhang, 2011;Baker et al., 2013;Lin and Wen, 2015;Wang et al., 2015) also referred to by Thunis et al. (2019) as emission reduction impact (ERI), and b) tagged species (TS).
In the TS approach, a specific module, executed during the model run (online), earmarks the mass of chemical species to track the atmospheric fate of every source category or region (or a combination of both). Since the tagged species (also known as "reactive tracers") can undergo chemical transformations, the module is also able to track the sources of secondary pollutants. The TS approach fulfils a mass balance equation ensuring the sum of all the source contributions equals the total concentration of the pollutant. TS algorithms are implemented in several CTM systems, such as CAMx-PSAT (Yarwood et al., 2004; Wagstrom et al., 2008; ENVIRON, 2014), CMAQ-TSSA (Wang et al., 2009), CMAQ-ISAM (Byun and Schere, 2006; Foley et al., 2010; Napelenok et al., 2014), DEHM (Brandt et al., 2013) and LOTOS-EUROS (Kranenburg et al., 2013).
The BF/ERI approach estimates the concentration changes (delta) attributable to a source by subtracting from the run where all the source emissions are present (the base case) a run where the emission source to be analysed has been modified (normally reduced) by a given percentage (e.g. Burr and Zhang, 2011; Wang et al., 2015; Uranishi et al., 2017). In this approach, an independent run is necessary for every source. Unlike TS, there are no specific model modules to execute the BF/ERI approach. It can be applied with any CTM in a post-processing step. In nonlinear situations, the sum of the concentrations allocated to the single sources can differ from the concentration of the pollutant in the base run. This problem may be dealt with in different ways. The more straightforward one is to re-normalise the mass of the sources to the total mass of the pollutant in the base case applying, for instance, multilinear regression or a similar method. A more appropriate technique from the mathematical point of view is to quantify such non-linearities by attributing a mass to all the possible interactions between sources. This is accomplished by computing additional simulations where two or more sources (all the possible combinations) are reduced at once and subsequently applying a factor decomposition (Stein and Alpert, 1993; Clappier et al., 2017). However, such elaboration requires a considerable number (N) of simulations, where N = 2^s - 1 and s denotes the number of sources. Another characteristic of the BF/ERI approach is that the mass attributed to the sources depends on the applied emission reduction factor.
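The behaviour described above can be illustrated with a small sketch. It assumes a hypothetical two-source concentration function (`toy_model`, not a real CTM) with a nonlinear interaction term, and it scales each delta to a 100% reduction by dividing by the reduction fraction, which is one possible convention and not necessarily the one used by the participants. The sketch shows that the impacts do not add up to the base-case concentration, that the impact changes with the reduction level, and that the number of reduction scenarios for a full factor decomposition grows as 2^s - 1.

```python
import math
from itertools import combinations

def n_factor_separation_runs(s):
    """Number of simulations N = 2**s - 1 quoted in the text for s sources."""
    return 2 ** s - 1

def reduction_scenarios(sources):
    """All non-empty combinations of sources reduced at once."""
    return [combo
            for r in range(1, len(sources) + 1)
            for combo in combinations(sources, r)]

def toy_model(emissions):
    """Hypothetical nonlinear concentration response (illustration only)."""
    a = emissions["traffic"]
    b = emissions["agriculture"]
    return a + b + math.sqrt(a * b)

def brute_force_impact(model, base, source, reduction):
    """Base run minus reduced run, scaled to a 100% reduction (assumed convention)."""
    scenario = dict(base)
    scenario[source] = base[source] * (1.0 - reduction)
    return (model(base) - model(scenario)) / reduction

base = {"traffic": 4.0, "agriculture": 9.0}
total = toy_model(base)
impacts_100 = {s: brute_force_impact(toy_model, base, s, 1.0) for s in base}
impacts_50 = {s: brute_force_impact(toy_model, base, s, 0.5) for s in base}

print("base-case concentration:", total)
print("sum of impacts (100% reduction):", sum(impacts_100.values()))
print("traffic impact at 50% vs 100%:", impacts_50["traffic"], impacts_100["traffic"])
print("runs needed for 10 sources:", n_factor_separation_runs(10))
```

With a purely linear `toy_model` the impacts would sum to the base-case concentration and would not depend on the reduction level, which is why the distinction matters mainly for secondary, nonlinear pollutants.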
In summary, RMs and CTM-TS approach quantify the mass that is transferred from the source to the receptor. Therefore, they can be grouped under the category mass-transfer (MT) source apportionment. On the contrary, the CTM-BF/ERI approach is actually a sensitivity analysis method that estimates the changes in concentration that would result from a change in emissions and, hence, it should be placed under a different category. Since the results obtained with MT and BF/ERI may differ, we propose to call the output of the former "contribution" and the one of the latter "impact".
SA techniques have been used for many air pollutants such as NO2, O3 and VOC (e.g. Baudic et al., 2016; Dunker et al., 2015). However, most of the studies focus on particulate matter (PM). Considering the greater experience with SA for this pollutant and the high number of exceedances in Europe (EEA, 2018) and in many areas of the world, this study is focused on SA of PM10.
Assessing the performance of SA model applications is essential to guarantee reliable and harmonised information on the influence of sources on pollution levels. To that end, a series of intercomparisons focused on RMs were performed in the framework of FAIRMODE (Forum on Air Quality Modelling) (Belis et al., 2015a, 2015b). Moreover, as highlighted in a recent study combining PMF and CAMx-PSAT, a reference methodology for comparing the results of RMs and CTMs is missing (Bove et al., 2014). In this paper, we introduce a comprehensive approach for the assessment of different source apportionment techniques at once as a step towards an integrated assessment and validation methodology for SA models. The objectives of this study are:
- To assess the differences in the SA results between RMs, between the TS and BF/ERI CTM approaches, and between RMs and CTMs;
- To characterise the performance of SA model applications using pre-established criteria for deciding whether results are comparable or not;
- To assess whether the CTM SA performance is influenced by differences between sites;
- To test the impact of different spatial resolutions and vertical mixing coefficients on the CTM SA performance.
To achieve the abovementioned objectives, a consolidated assessment methodology already applied to RMs was adopted.

Table 1
Approaches used to apportion a pollutant to its sources with CTMs: tagged species (TS) and brute force/emission reduction impact (BF/ERI).

Question addressed
TS: What is the actual mass transferred from a pollutant source to its concentration in a given location and period?
BF/ERI: What would be the reduction in the pollutant concentrations corresponding to a given reduction in the emissions of its precursors?

Runs
TS: Apportions all the sources in one single run.
BF/ERI: Requires a number of runs equal to the number of sources to apportion plus the run with all sources (base case).

Mass conservation
TS: The sum of the contributions of the sources equals the total mass (concentration) of the apportioned pollutant (by definition).
BF/ERI: The mass allocated to the sources is obtained from independent runs. Therefore, the sum of source contributions may not match the total pollutant mass obtained in the base case (nonlinear behaviour).

Advantages
TS: Provides a picture of the contributions referred to the specific emission inventory and meteorological fields used as input. Useful to attribute the actual impacts of sources on health and vegetation.
BF/ERI: Useful to evaluate the impact of abatement measures. Can be used with any model.

Disadvantages
TS: Requires additional coding effort. For non-linear pollutants, the source contributions cannot be extrapolated to situations different from the modelled case.
BF/ERI: Requires many runs. The mass is not always conserved (see above). The mass allocated to a source depends on the emission reduction level.

Methodology
The assessment of the SA models followed multiple steps to evaluate the different aspects of a SA result (see Appendix sections A1 and A2). The focus of this article is on the performance tests used to evaluate every source category separately. Considering that the actual contribution/impact of sources in a given area is unknown, the reference for the performance test is derived from the participants' average and standard deviation. The advantage of this methodology is that every test is associated with acceptability thresholds based on widely recognised indicators that have been tested and used in previous FAIRMODE SA intercomparisons. For a detailed description of the methodology refer to Belis et al. (2015a, 2015b). The same approach was used to evaluate the performances of RMs and CTMs separately. The RMs were assessed only at the reference site of Lens (see next section for details), while a set of 10 different sites (Fig. 1) was used in the comparison between CTMs. In addition, a cross comparison between RMs and SMs was accomplished for the Lens site, setting the RMs as the reference for methodological reasons without any "a priori" implication about the reliability of the different methods.
This study involved 40 groups, 33 of which delivered 38 RM results and the remaining seven reported 11 CTM results. The participants delivered results consisting of a set of candidate sources (hereafter, candidates) with the estimation of their contributions or impacts (hereafter, source contribution or impact estimates, SCIEs) expressed as μg/m3. The most commonly used abbreviations are listed in section A3 of the Appendix.

Set up of the RMs intercomparison
The reference site of Lens (France), located in a densely populated area (>500 inhabitants/km2), is part of the official monitoring network and is classified as an urban background monitoring station (Fig. 1). According to the EU legislation (2008/50/EC), such locations are places in urban areas where levels are representative of the exposure of the general urban population.
The dataset of measurements used for the RM intercomparison was produced in the framework of the CARA project directed by the French reference laboratory for air-quality monitoring (LCSQA). This dataset contained 116 PM10 daily concentrations collected every third day between March 2011 and March 2012. The concentration of 98 chemical species and their uncertainties were provided for every sample including: ions, organic and elemental carbon (OC/EC), trace elements, polycyclic aromatic hydrocarbons (PAHs), anhydrosugars, hopanes, n-alkanes, primary organic aerosol (POA) markers, and the total PM10 gravimetric mass concentrations (Table A2 of the Appendix). Details on the analytical techniques can be found in Waked et al. (2014) and Golly et al. (2015). The average PM10 concentration and composition in the dataset is shown in Table S1 of the supplementary material. In order to harmonise the nomenclature, the candidates reported by participants were encoded in conformity with the SPECIEUROPE database source categories (Pernigotti et al., 2016). The source chemical profiles from this database are also referred to as 'reference' profiles (Table 2).
In this study the fuel oil source is the combustion of residual (heavy) oil that normally takes place in power plants. Since this fuel is also used in maritime transport, ship emissions in seaports may contribute up to 10% of the primary PM10 (Amato et al., 2016; Bove et al., 2018). However, due to the considerable distance of the study site from the closest harbour (75-80 km), the influence of this source on the primary PM10 is expected to be small. In this exercise, very few results (11%) univocally identified shipping as a source. This is in line with a study on PM10 sources using receptor models by Waked et al. (2014) in the same location, which allocated 4% of primary PM10 to oil combustion associated with land activities (power generation and industrial emissions) excluding the influence of ship emissions (section 3.1.1).

Domains and time window
CTMs were run over two nested computational domains with different spatial grid resolutions: one covering the whole of Europe and a smaller one around the reference site area (Lens). The FAIRMODE-EU domain lat/long grid (Table S2, supplementary material) was set to be compatible with the spatial resolution of the emission inventory and to avoid any interpolation of the emission data (section 2.2.2). The domain extension was defined to provide suitable regional boundary conditions for the reference area simulations. It includes a portion of North Africa to account for dust emissions, while the northern boundary was set around latitude 65.0 to minimise the spatial distortions at high latitudes when using a lat/long grid. The grid step corresponds to roughly 18-20 km.
The FAIRMODE-LENS domain (Fig. 1) was defined as a subset of the emission inventory grid as well as of the EU grid, once again to avoid any interpolation of emission data. The domain is centred over Lens, but it is large enough to allow a reasonable description of the PM fate in the atmosphere, limiting the influence of boundary conditions. The grid step corresponds to roughly 6-7 km.
Two three-month periods for CTM modelling were defined: 1/6 to 31/8/2011 and 15/11/2011 to 15/2/2012. Such time windows were selected to be representative of both the warm and cold seasons and long enough to include both high pollution episodes and low concentration situations. Moreover, because PM chemical composition data were available at Lens only every third day, three-month simulations were needed to pair at least 30 daily SA results of RMs and CTMs.

Input datasets
Emissions. The anthropogenic emissions used for this intercomparison were derived from the TNO_MACC-III emission data, which is an update of the TNO_MACC-II emission inventory (Kuenen et al., 2014). The inventory is structured according to the Selected Nomenclature for sources of Air Pollution (SNAP, EEA, 1996; EEA, 2000) and is combined with fuel use information by country and by sector derived from the literature and country data. This emission inventory included an enhanced source classification according to fuels for SNAP macrosectors 2 and 7 (Table 3 and Table A3). In particular, emissions due to combustion from the residential and small commercial combustion sectors were split between fossil fuels (coal, light liquid, medium liquid, heavy liquid and gas) and solid biomass. Emissions from the road transport sector were split according to three main fuels (gasoline, diesel and LPG/natural gas), while non-exhaust sources included evaporation and tyre and road wear. Among non-road transport emissions, international shipping was specifically accounted for. The reconstruction of the natural emissions (dust resuspension, sea salt and biogenic VOCs) was left to the modelling groups, because in most cases such emission modules were embedded in the modelling chain.
Meteorological fields. Meteorological fields were obtained by applying the WRF model in a nested (one way) configuration in order to provide fields for both the EU and LENS domains. The WRF simulations were performed over a Lambert conformal domain. This choice implied a pre-processing phase of the meteorological fields in order to feed CTMs; however, it should be considered that: (a) most of the CTM model chains needed meteorological fields pre-processing anyhow (e.g. to compute turbulence and other additional parameters), and (b) the use of a lat/long grid over northern Europe may lead to considerable distortion effects.
Both WRF domains covered the corresponding CTM/output domains, also leaving a border area. To limit the degradation of the meteorological information during the interpolation phase, the WRF domains adopted the same grid step as the CTM domains (18 km and 6 km, respectively). The WRF physical configuration is provided in Table S3 of the supplementary material. LOTOS-EUROS was the only model run with its own ECMWF meteorology.
Boundary conditions. Boundary conditions were derived from the MACC global model using the same approach that has been adopted in other European initiatives like EURODELTA-III and AQMEII-3 (http://aqmeii.jrc.ec.europa.eu/).

Definition of common sources for RMs and CTMs and CTM receptor sites
RM sources are determined by their chemical fingerprints and time trends, while CTM sources reflect the nomenclature of the emission inventory used as input. In addition, CTMs allocate both primary and secondary PM to its sources, the latter being attributed to the sources of its gaseous precursors. Due to the different definition of their sources, comparing the SA output of RMs and CTMs is not straightforward. Table 3 shows the correspondence between RM and CTM sources established in this study to compare their outputs. To that end, the SNAP source classification commonly used in the emission inventories (Table A3) is compared at the macrosector level with the SPECIEUROPE source categories. Since RMs did not provide SCIEs comparable with SNAP macrosectors 5 (extraction and distribution of fossil fuels), 6 (product use), and 9 (waste treatment), the mass apportioned by CTMs to these macrosectors was pooled under a generic non-apportioned mass category (99 OTH). For the remaining source categories, two sets with different degrees of detail for macrosectors 2 (residential and commercial combustion), 7 (road transport) and 11 (natural sources) were defined (Table 3). The mandatory (MDT) set consisted of 7 sources plus one corresponding to the non-apportioned mass (99 OTH). The more comprehensive optional set (OPT) encompassed 13 sources plus the non-apportioned category.
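The pooling of macrosectors without an RM counterpart can be sketched as follows. This is only an illustration of the rule stated above (SNAP macrosectors 5, 6 and 9 summed under 99 OTH); the function name, the dictionary layout and the example masses are hypothetical, and the full CTM-to-RM correspondence of Table 3 is not reproduced.

```python
# SNAP macrosectors with no comparable RM source category (from the text).
POOLED_MACROSECTORS = {5, 6, 9}

def pool_non_apportioned(ctm_mass_by_macrosector):
    """Return a copy in which macrosectors 5, 6 and 9 are summed under '99 OTH'."""
    pooled = {}
    other = 0.0
    for sector, mass in ctm_mass_by_macrosector.items():
        if sector in POOLED_MACROSECTORS:
            other += mass
        else:
            pooled[sector] = mass
    pooled["99 OTH"] = other
    return pooled

# Hypothetical CTM output (ug/m3) keyed by SNAP macrosector number.
example = {2: 3.0, 5: 0.2, 6: 0.1, 7: 4.0, 9: 0.3}
print(pool_non_apportioned(example))
```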
To assess the spatial variability of the CTM SA performances, a set of receptor sites representing different types of locations (urban, suburban and rural) within the AIRBASE European monitoring network (EEA, 2015) was selected (Table 4). In addition, the site of Ghent was included because of the availability of detailed PM10 chemical composition from the CHEMKAR PM10 study (VMM, 2013).

CTM model performance evaluation (MPE)
Each modelling group performed its own model performance evaluation (MPE) by comparing CTM results against observed data for the main chemical species (i.e. sulphate, nitrate, ammonium, elemental carbon and organic aerosol). In order to support the interpretation of the SA performance, a centralised MPE was also accomplished for PM10 and its main chemical components on all the model base case results (Annex 1, supplementary material). The objective was to assess whether the ability of models to reproduce the PM mass and main chemical components influences the model SA output. To that end, the modelled mass of PM and main chemical components were compared with measurements. The centralised MPE was performed using a subset of sites selected to cover the different meteorological and emissive features of the Lens domain: Lens (reference site), London and Paris (megacities), Ghent (middle-sized city), Le Havre and Calais (coastal areas), and the rural background station Vredepeel-Vredeweg. Model results were evaluated by means of a few statistical indicators: the mean bias (MB), the normalised mean bias (NMB), the root mean square error (RMSE) and the Pearson correlation.
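For reference, the four indicators can be computed as in the following minimal sketch, which uses their standard textbook definitions on paired modelled and observed values. The function names and the sample values are hypothetical; the exact implementation used in the centralised MPE is not given in the text.

```python
import math

def mean(values):
    return sum(values) / len(values)

def mean_bias(model, obs):
    """MB: average of (model - observation)."""
    return mean([m - o for m, o in zip(model, obs)])

def normalised_mean_bias(model, obs):
    """NMB: sum of (model - observation) divided by the sum of observations."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)

def rmse(model, obs):
    """RMSE: square root of the mean squared difference."""
    return math.sqrt(mean([(m - o) ** 2 for m, o in zip(model, obs)]))

def pearson(model, obs):
    """Pearson correlation coefficient between model and observations."""
    mm, om = mean(model), mean(obs)
    cov = sum((m - mm) * (o - om) for m, o in zip(model, obs))
    var_m = sum((m - mm) ** 2 for m in model)
    var_o = sum((o - om) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)

# Hypothetical daily PM10 values (ug/m3): the model is 2 ug/m3 too low every day.
observed = [10.0, 20.0, 30.0]
modelled = [8.0, 18.0, 28.0]
print(mean_bias(modelled, observed), normalised_mean_bias(modelled, observed))
print(rmse(modelled, observed), pearson(modelled, observed))
```

The example shows why several indicators are reported together: a constant underestimation gives a perfect correlation while MB, NMB and RMSE all reveal the bias.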

CTM source chemical profiles
The differences between CTM SA results were assessed using a similarity test discussed in section 3.2.2. It was accomplished by comparing the chemical profiles of the sources reported by participants (CTM chemical profiles) among each other (ff tests) and with a set of 1160 external (measured and derived) source profiles (fr tests) deriving from the SPECIEUROPE database developed at JRC (Pernigotti et al., 2016) and the SPECIATE database of the US EPA (Hsu et al., 2014).

Table 3
Correspondence of source nomenclature between CTMs and RMs for both the mandatory and optional sets. A more detailed description of the SNAP macrosectors is given in Table A3.

Receptor models
A total of 38 different RM results were delivered by 33 participants. The majority of the RM results (31) were obtained using the US-EPA version 5 of the positive matrix factorisation tool (EPA-PMF5; Norris et al., 2014). ME-2 (multilinear engine 2; Paatero, 1999) scripts were used in two cases. Only one result was reported with the following tools: RCMB (robotic chemical mass balance; Argyropoulos and Samara, 2011), MLPCA-MCR-ALS (MLPCA, maximum likelihood principal components analysis-multivariate curve resolution-alternating least squares, Izadmanesh et al., 2017), PMF version 2 (PMF2; Paatero and Tapper, 1993;Paatero, 1997), EPA-PMF version 3 (PMF3; Norris et al., 2008) and EPA-PMF version 4 (PMF4). The RM results were labelled with letters from A to Z and then from *A to *L. In Figs. 2 and 3, the sources are labelled in conformity with the corresponding SPECIEUROPE database source categories (Pernigotti et al., 2016).

Table 4
Receptor sites used to assess the spatial variability of the CTM SA performances.

The areas of acceptance and warning are indicated with green and orange background, respectively. Only candidates with warning or bad scores are annotated in the plots (letter: result; sequential number: candidate). In (a) the RM used is indicated under each participant (red for models other than PMF5). The source category codes are given in Table 2. The scores of result Q are not shown because they are out of scale.

Performance tests
The z-scores evaluate the performance of the SCIE for the overall studied time window (one year in this study). Table 5 shows the reference SCIEs, with their uncertainty, for the sources reported in more than three SA results (diesel, gasoline, brake, metallurgy and agriculture are excluded). The time series of the reference and the scatter plots between candidates and references are shown in Figs. S1 and S2 of the supplementary material. If we consider the consensus between participants as an indicator of the relevance of the sources, those reported in fewer than 25% of the results (coal and ship) could be regarded as non-robust ones (Pernigotti and Belis, 2018). Although biomass burning and wood burning have a slightly different connotation (the former is more inclusive than the latter), in this study the two expressions have apparently been used by participants to represent the same source. The SCIEs of a nine-source solution reported by Waked et al. (2014) for the same site (Table 5, last two columns) fall within the uncertainty range of the reference (except one). The main differences between the French study and the references of the present study are the absence of industry, the lack of detail within the traffic source (exhaust, road dust) and the lower contribution of the fuel oil source.
In the performance tests, 91% of the z-scores of RM candidates (Fig. 2) fall within the area of acceptability (green), indicating a generally good agreement between the reported results and the reference values with a tolerance of 50% (Fig. 2a). The results Q, B, H, *I and *L are those with the highest number of candidates outside the acceptability zone. The result Q was obtained with the recently developed MLPCA-MCR-ALS technique. It is similar to PMF, with the difference that it solves the mass balance equation using the alternating least squares algorithm (Tauler et al., 2009). The out-of-scale z-scores of this result have been attributed to a misinterpretation of how to report the average SCIE. This interpretation was confirmed by the analysis of the time series, where the RMSEu values are of the same order of magnitude as those of the other participants and in some cases passed the test (see below in this section).
Most of the overestimated SCIEs are in the industry source category (20 ind) while some others are in the traffic, soil, fuel oil and coal combustion source categories (1 tra, 10 soi, 30 fuel and 31 coa; Fig. 2b). Conversely, underestimated SCIEs are observed mainly in the exhaust, biomass burning and primary organic aerosol source categories (2 exh, 40 bib and 70 poa).
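The z-score test discussed above can be sketched as follows. The sketch assumes that the standard deviation for proficiency assessment equals the stated 50% tolerance times the reference SCIE, and that the acceptance and warning areas follow the conventional proficiency-testing limits of |z| <= 2 and 2 < |z| <= 3. Both are assumptions made for illustration; the exact criteria are those described in Belis et al. (2015a, 2015b). The function names and the example values are hypothetical.

```python
def z_score(candidate, reference, tolerance=0.5):
    """z = (candidate - reference) / sigma_p, with sigma_p = tolerance * reference (assumed)."""
    sigma_p = tolerance * reference
    return (candidate - reference) / sigma_p

def classify(z, accept_limit=2.0, warning_limit=3.0):
    """Classify a z-score using conventional proficiency-testing limits (assumed)."""
    magnitude = abs(z)
    if magnitude <= accept_limit:
        return "accepted"
    if magnitude <= warning_limit:
        return "warning"
    return "rejected"

# Hypothetical average SCIEs (ug/m3) for one source category.
reference_scie = 2.0
for label, candidate_scie in [("A1", 3.0), ("B4", 4.5), ("H5", 5.5)]:
    z = z_score(candidate_scie, reference_scie)
    print(label, round(z, 2), classify(z))
```

A positive z indicates an overestimated SCIE with respect to the reference and a negative z an underestimated one, which is how the two groups of source categories above are distinguished.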
The RMSEu scores of the candidates are displayed in target plots (Fig. 3) where the abscissa represents the uncertainty-weighted centred root mean square error (CRMSE/u) and the ordinate the uncertainty-weighted bias (BIAS/u). The green circle delimits the acceptability area while the dotted and solid lines correspond to the 0.5 and 1.0 levels, respectively. The candidates are indicated with the same codes as used in Fig. 2.
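The coordinates of a candidate in such a target plot can be sketched as below. The sketch relies on the identity RMSE^2 = BIAS^2 + CRMSE^2, so that the distance from the origin equals RMSE/u. Two points are assumptions made for illustration: the reference uncertainty is represented by a single scalar `u` applied by plain division, and the abscissa is given a positive sign when the candidate's standard deviation exceeds that of the reference. The weighting actually used in the study is the one defined in Belis et al. (2015a, 2015b) and may differ.

```python
import math

def target_coordinates(candidate, reference, u):
    """Return (CRMSE/u, BIAS/u, RMSE/u) for paired time series.

    u is a scalar reference uncertainty (assumed); the sign of CRMSE/u is
    positive when the candidate varies more than the reference (assumed).
    """
    n = len(reference)
    c_mean = sum(candidate) / n
    r_mean = sum(reference) / n
    bias = c_mean - r_mean
    # Centred RMSE: RMSE after removing the mean of each series.
    crmse = math.sqrt(sum(((c - c_mean) - (r - r_mean)) ** 2
                          for c, r in zip(candidate, reference)) / n)
    rmse = math.sqrt(sum((c - r) ** 2 for c, r in zip(candidate, reference)) / n)
    c_std = math.sqrt(sum((c - c_mean) ** 2 for c in candidate) / n)
    r_std = math.sqrt(sum((r - r_mean) ** 2 for r in reference) / n)
    sign = 1.0 if c_std >= r_std else -1.0
    return sign * crmse / u, bias / u, rmse / u

# Hypothetical SCIE time series (ug/m3); the candidate doubles the reference.
x, y, distance = target_coordinates([2.0, 4.0, 6.0], [1.0, 2.0, 3.0], u=1.0)
print(round(x, 3), round(y, 3), round(distance, 3))
print("inside acceptability circle:", distance <= 1.0)
```

In this example the candidate falls in the top right quadrant (positive bias and larger amplitude than the reference) and outside the circle of radius 1.0.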
In the RMSEu test, 72% of the candidates fall in the acceptability area (Fig. 3a). The areas with scores of sources showing a high share of rejected candidates are schematically represented in Fig. 3b-i. Those are: industry (3b), fuel oil (3c), soil (3d), road (3e), biomass burning (3f), primary organic aerosol (3g), ship (3h) and coal (3i). The source categories showing the highest percentage of candidates with poor scores in this test are ship (75%), coal burning (71%), and fuel oil (60%). The first two are likely not well identified sources, considering that they were reported in fewer than 25% of the results. The low or non-detectable contribution of coal burning is explained by the low share (<5%) of this fuel in the French energy mix in 2011-2012 (https://www.iea.org), while contributions from primary ship emissions are unlikely, considering the distance of the site from harbours. Although fuel oil was reported in the majority of the results and the average SCIEs are coherent with the reference (as indicated by the z-score test), the time series of this source in the different results are divergent. In this source, the rejected RMSEu scores are distributed in different quadrants of the target plot. However, those with higher deviations are located in the top right quadrant, suggesting overestimation and higher amplitude than the reference. A similar situation is observed in the time series of the industry source category, where 30% of the candidates presented both positive bias and amplitude problems with respect to the reference (i.e. fall in the top right quadrant of the target plot, Fig. 3b). The RMSEu confirms the indications of the z-score test for industry and coal combustion.
Positive bias is also observed in some candidates of soil and traffic while biomass burning and primary organic aerosols (70 poa) present either positive or negative biases. Extreme cases of variance higher than the reference are candidates H5 (industry, fuel oil), *E9 (fuel oil) and B4 (soil). On the contrary, the rejected candidates of the ship source category (37 shi) are mainly located in the lower quadrants, suggesting a tendency to underestimate the reference SCIE time series. In a number of sources, the bad scores are distributed in all the quadrants of the target plot (Fig. 3); therefore, the lack of coherence between them and the reference probably depends on random variability determined by a combination of factors.
Additionally, the frequency of the candidates in every source category (Table 2) is an indicator of robustness of the source identification (Pernigotti and Belis, 2018). The better identified source categories from this point of view are: biomass burning, marine, traffic, soil, primary organic aerosol, fuel oil, industry, aged sea salt, and the secondary inorganic aerosol (SIA, ammonium nitrate and ammonium sulphate).

Conditions influencing the performance of RM
Although RMSEu and z-score (without sign) are correlated to a certain extent (R² = 0.43; Fig. S3), they evaluate different aspects of the SA results. As observed in previous intercomparisons (Belis et al., 2015b), the RMSEu test is stricter because, in addition to the bias, it assesses the phase and amplitude of the time series curves (Jolliff et al., 2009). Moreover, this finding is coherent with the concept that the dispersion of the results' average SCIEs around the overall mean (z-score) is lower than that of the single samples' SCIEs in all the results (RMSEu).
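The difference between the two indicators can be illustrated with a minimal Python sketch. The definitions used here are assumptions, because the exact formulas are given in the Appendix and are not reproduced in this section: the z-score is taken as the difference between the candidate and reference average SCIE divided by a criterion set to 50% of the reference, and RMSEu as the root mean square error divided by twice the reference uncertainty. The split of the error into bias and centred RMSE follows the identity underlying target diagrams (Jolliff et al., 2009). All numbers are invented for illustration.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def z_score(candidate, reference, tolerance=0.5):
    """Bias of the average SCIE, scaled by tolerance * reference mean (assumed criterion)."""
    ref_mean = mean(reference)
    return (mean(candidate) - ref_mean) / (tolerance * ref_mean)

def rmse_u(candidate, reference, ref_uncertainty, coverage=2.0):
    """RMSE of the time series divided by coverage * reference uncertainty (assumed form)."""
    mse = mean([(c - r) ** 2 for c, r in zip(candidate, reference)])
    return math.sqrt(mse) / (coverage * ref_uncertainty)

def bias_and_crmse(candidate, reference):
    """Split the error into mean bias and centred RMSE: RMSE^2 = bias^2 + CRMSE^2."""
    c_mean, r_mean = mean(candidate), mean(reference)
    bias = c_mean - r_mean
    crmse = math.sqrt(mean([((c - c_mean) - (r - r_mean)) ** 2
                            for c, r in zip(candidate, reference)]))
    return bias, crmse

# Hypothetical daily SCIEs (ug/m3): same average, opposite phase.
reference = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
candidate = [3.0, 1.0, 3.0, 1.0, 3.0, 1.0]

z = z_score(candidate, reference)                       # no bias in the average
score = rmse_u(candidate, reference, ref_uncertainty=0.5)
bias, crmse = bias_and_crmse(candidate, reference)      # all error is in the centred term
```

In this toy case the candidate has a z-score of zero but an RMSEu above 1, which is the kind of result that a bias-only test accepts and a time series test rejects.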
The high number of RM results evaluated in this study (38) gives a unique opportunity to analyse the conditions that influence the performance of the results. To that aim, their RMSEu and z-scores were compared with the self-declared experience of the RM practitioners and the number of candidates in the reported results, as shown in Fig. S4 of the supplementary material.
In Fig. S4a, the performance of results delivered by practitioners that have conducted 10 or fewer studies (intermediates) is quite variable, while practitioners declaring to have conducted more than 10 studies (experienced; 24% of the total number of participants) always show good performances (low scores). The Mann-Whitney non-parametric test (Palisade StatTools 7.6) confirms that experienced practitioners have significantly better RMSEu performance than intermediate ones (p = 0.0267), and the Wilcoxon Signed-Rank non-parametric test (Palisade StatTools 7.6) indicates that the experienced practitioners have RMSEu lower than 1 (p = 0.001), while this hypothesis is rejected for intermediate ones (p = 0.7378). The difference between experienced and intermediate practitioners is less evident for the z-scores. Results (solutions) with a number of candidate sources (factors) close to the average (9) present better z-scores than those with a difference of 3 or more sources (Fig. S4b). This relationship is less evident for RMSEu, suggesting that failing to retrieve the optimal number of sources impacts mainly on the bias of the result. In multivariate analysis including redundant variables, chemical species with high covariance may degrade the quality of the results and lead to poor model output diagnostics. For that reason, variables that are highly correlated with others may be removed because they do not add meaningful information to the analysis. This is very often the case for families of organic species (PAHs, alkanes, hopanes) in which many or all of the members show very similar time trends. In such cases, replacing them either by one representative of the family or by the sum of all the members of that family leads to better model diagnostics and more physically meaningful results. In this study, it was up to the practitioners to establish the optimal set-up of chemical species for their runs.
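The comparison between experienced and intermediate practitioners relies on rank-based tests. As an illustration only, the following Python sketch computes the Mann-Whitney U statistic from midranks for two independent groups of scores. It is not the Palisade StatTools implementation, the p-value step is omitted, and the scores are invented rather than taken from the study.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """U statistics for both groups; U_a + U_b equals len(a) * len(b)."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

# Hypothetical RMSEu scores (lower is better); not data from the study.
experienced = [0.4, 0.6, 0.7, 0.8]
intermediate = [0.5, 0.9, 1.2, 1.6, 2.1]

u_exp, u_int = mann_whitney_u(experienced, intermediate)
```

Here U for the experienced group counts the pairs in which an experienced score exceeds an intermediate one, so a small value indicates that the experienced group tends to score lower.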
A variable number of the 98 species present in the input dataset were used by practitioners to obtain the reported results. Only seven of the results were obtained with more than 80 species, while the majority used fewer than 40 species. Such a low number of species is due to the exclusion of some of them (after selecting the most robust ones) or to the practice of pooling the members of some organic species families (PAHs, hopanes, n-alkanes) and treating them as one single species. The comparison between the number of species and the performance indicators (not shown) did not reveal any clear relationship between these two parameters.
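The pooling of family members into a single species can be sketched as follows. This is a minimal Python illustration, not the procedure followed by any specific participant: the species names, family definitions and concentrations are hypothetical, and the combination of the associated uncertainties is not addressed.

```python
def pool_families(sample, families):
    """Replace the members of each family by their sum; other species are kept."""
    pooled = dict(sample)
    for family, members in families.items():
        present = [m for m in members if m in pooled]
        if present:
            pooled[family] = sum(pooled.pop(m) for m in present)
    return pooled

# Hypothetical species names and concentrations (ng/m3), for illustration only.
families = {
    "sum_PAH": ["benzo[a]pyrene", "benzo[e]pyrene", "chrysene"],
    "sum_hopanes": ["hopane_1", "hopane_2"],
}
sample = {
    "benzo[a]pyrene": 0.5, "benzo[e]pyrene": 0.25, "chrysene": 0.75,
    "hopane_1": 0.125, "hopane_2": 0.375, "levoglucosan": 120.0,
}
pooled = pool_families(sample, families)  # six input species become three
```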
Constraints were applied in only eight of the RM results (21%): five of those were obtained with EPA-PMF5, two with ME-2 and one with RCMB. No significant influence of using constraints on the SA performance was observed.

Chemical transport models
A total of eleven CTM results were reported by seven groups using five different models, as described in section 2.2. These results were encoded with the prefix "c" (to distinguish them from the RM results) and a capital letter corresponding to each model: CAMX-PSAT (Yarwood et al., 2004; A), FARM (Gariazzo et al., 2007; B), LOTOS-EUROS (D), EURAD (Strunk et al., 2010; E) and CHIMERE (Menut et al., 2013; F). All the groups reported results apportioning the mandatory set of sources, while the optional set of sources (denoted with the suffix "o") was reported for only three models (A, B, and D). All the models applied specific parameterisation to quantify the PM attributable to dust resuspension, while EURAD and LOTOS-EUROS also estimated the road dust. The latter was the only model which quantified the dust resuspension attributable to agriculture. Parameterisation was also used by CAMX-PSAT, FARM and LOTOS-EUROS to compute the PM deriving from sea salt, while the former two and EURAD modelled the fraction deriving from biogenic VOCs. Different versions of the BF/ERI approach were used in this study. FARM adopted a 20% emission reduction while EURAD implemented a 100% emission variation. In both cases, the impacts were normalised to correct the effects of non-linearity. The base case total PM mass was used for normalisation in EURAD, while FARM adopted the sum of all source deltas for the same purpose. Since nitrate and ammonium were not apportioned in the CHIMERE result, this model result was used to assess whether the testing methodology was able to detect significant differences between the sources where these species are dominant (e.g. road transport and agriculture). In addition to the performance assessment, three sensitivity runs were executed with CAMX-PSAT to analyse the influence of the spatial resolution and the vertical diffusion coefficient on the model SA performance (section 3.2.6).

CTM model performance evaluation
The temporal evolution of the PM10 concentration during the winter period was mainly driven by regional scale processes. Therefore, the selected modelling approach adopting a 7 km horizontal resolution was adequate to reproduce it, as well as to perform the SA analysis. According to the MPE, CTM underestimations were more frequent at high concentrations, suggesting that the overall source contributions deriving from this SA analysis are likely more robust than those of the exceedance days that were poorly reproduced by models. The observed strong underestimation of the organic aerosol may have influenced the reliability of the SCIEs, particularly concerning domestic heating during winter, biogenic sources during summer, and road transport for both seasons. The only exception is EURAD, which presented a moderate overestimation of OA in summer. This model also showed a marked overestimation of EC. In this MPE, it was observed that the FARM, LOTOS-EUROS and EURAD (only summer) models underestimated sulphate, and this could have had an influence on the reconstruction of the source contribution from sources where sulphur is an important component (e.g. energy production and shipping). In summer, the overestimation of nitrate observed in the CAMX-PSAT, EURAD and partially CHIMERE models gave rise to a corresponding overestimation of the ammonium concentration that may have influenced the estimation of contributions from agriculture.

Chemical profiles of the CTM sources
In CTMs the PM total mass is reconstructed by processing a set of simplified chemical components some of which are pure substances (nitrate, sulphate, ammonium) while others encompass a relatively wide range of species (organic carbon, elemental carbon and other primary aerosol). The concentration of these simplified chemical components in the different sources were reported in the results. Such chemical profiles of the CTM sources (CTM chemical profiles) were assessed by comparing them among each other (ff tests; in Fig. S5 of the supplementary material) and with the reference profiles of SPECIATE-SPECIEUROPE (fr tests; in Fig. S5). Fig. S5 shows that the CTM chemical profiles of the sources in the different results were in general quite comparable among each other (ff tests). This holds for most of the anthropogenic sources (such as road transport, industry and energy production), because they were reconstructed by all models on the basis of the same emission inventory and speciation profiles. Lower comparability among results was observed for dust, sea salt (not shown in Fig. S5 because in the optional set only) and agriculture. Dust and sea salt are among the sources not included in the emission inventory that were estimated with a different approach in every model.
The similarity with the SPECIATE/SPECIEUROPE reference profiles (fr tests) was relatively good for biomass burning while it was limited for agriculture, dust, industry and energy production. For the latter three sources, this is probably due to the limited number of species in the CTM chemical profiles and the consequent lack of specific markers. Moreover, for point sources such as power plants or factories, it is unlikely that all the reference profiles are coherent with the specific sources affecting a given receptor site. In Fig. S5, a high variability in agriculture with a considerable share of scores in the rejection area is observed in both fr and ff tests. To explain this behaviour, a more detailed investigation of its source profiles including the six considered chemical species was performed (Fig. 4). In this plot, the differences between the CTM approaches become evident. In the TS approach (CAMX-PSAT, LOTOS-EUROS), the profile is dominated by ammonium with a variable contribution of other organic primary anthropogenic aerosol (OPA). This species consists of trace elements and debris resuspension, the latter being an additional emission term present only in the LOTOS-EUROS model. In the BF/ERI approach (FARM, EURAD), the dominant components are nitrate and ammonium. The differences in the CTM chemical profiles depend on the way the two approaches attribute mass to the sources. The TS approach keeps track of the source from which every chemical component derives. When ammonia from agriculture reacts with nitric acid deriving from NOx emitted by combustion processes (e. g. traffic), the TS approach attributes the mass of ammonium to agriculture and the one of nitrate to the respective combustion source. In contrast, the BF/ERI approach attributes the mass by computing the difference between the base case and a simulation where the emissions of the relevant source are modified. 
Since changing agricultural emissions of ammonia leads to a variation in the concentration of ammonium nitrate with respect to the base case, both ammonium (NH4+) and nitrate (NO3-) are attributed to agriculture with this method.
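The contrast between the two attribution schemes can be made concrete with a toy calculation in Python. This is a deliberately simplified sketch and not a representation of any of the CTMs in this study: ammonium nitrate is assumed to form until the scarcer precursor is exhausted, molar masses are rounded, the precursor amounts are hypothetical, and the normalisation shown (rescaling the impacts so that they add up to the base case mass) is only one possible reading of the normalisation step.

```python
NH4_MASS, NO3_MASS = 18.0, 62.0  # rounded molar masses (g/mol)

def ammonium_nitrate(nh3_mol, hno3_mol):
    """Toy chemistry: NH4NO3 forms until the scarcer precursor is used up."""
    formed = min(nh3_mol, hno3_mol)
    return {"NH4": formed * NH4_MASS, "NO3": formed * NO3_MASS}

def tagged_species(nh3_mol, hno3_mol):
    """TS: each ion is attributed to the source of its precursor."""
    pm = ammonium_nitrate(nh3_mol, hno3_mol)
    return {"agriculture": pm["NH4"], "traffic": pm["NO3"]}

def brute_force(nh3_mol, hno3_mol, reduction=1.0):
    """BF/ERI: impact = base case mass minus the mass with one source reduced."""
    base = sum(ammonium_nitrate(nh3_mol, hno3_mol).values())
    less_agri = sum(ammonium_nitrate(nh3_mol * (1 - reduction), hno3_mol).values())
    less_traffic = sum(ammonium_nitrate(nh3_mol, hno3_mol * (1 - reduction)).values())
    return {"agriculture": base - less_agri, "traffic": base - less_traffic}

def normalise(impacts, total):
    """Rescale the impacts so that they add up to the given total mass."""
    scale = total / sum(impacts.values())
    return {source: value * scale for source, value in impacts.items()}

# Hypothetical precursor amounts (mol), equal so that neither is in excess.
ts = tagged_species(1.0, 1.0)
bf = brute_force(1.0, 1.0, reduction=1.0)
bf_norm = normalise(bf, total=sum(ts.values()))
```

In this toy case the raw brute force impacts add up to twice the base case mass, because removing either precursor removes all the ammonium nitrate. Normalisation restores the total but splits it equally between the two sources, whereas the tagged split follows the masses of the two ions.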
This analysis of the agriculture CTM chemical profiles evidenced the potential differences in the mass attributed to the sources that may derive from the application of different CTM SA approaches. To evaluate the relevance of those differences, the CTM performance tests were performed distinguishing the TS and BF/ERI results.

Performance tests
In order to test the differences between BF/ERI and TS approaches in the apportionment of PM10, the performance tests were calculated using only the results obtained with CTM TS models (i.e. CAMX-PSAT and LOTOS-EUROS) for the construction of the ensemble reference. This approach was chosen in this analysis because its output is comparable with that of RMs (both are classified as MT source apportionment and their outputs are contributions, see section 1). Another advantage of using CTM TS for the reference is that they provide the same values for comparable sources in the optional and mandatory sets. However, the methodological choice of setting one of the CTM approaches as reference is not to be understood as recognition of a higher quality or hierarchy. Therefore, the performance of each model in the different tests is to be understood as a quantitative measure of the distance among the different results. The reference SCIEs for the sources reported in the CTM results for the MDT and OPT sets of sources are shown in Table 6. The time series of the references and the comparison between candidates and references (scatter plots) are shown in Figs. S6 and S7 of the Supplementary Material, respectively.
As shown in Fig. 5a, b for the reference site of Lens, the majority of the candidates (80% and 87%, respectively) passed the z-score test in the mandatory and optional sets, indicating that the average mass attributed to sources by models was in most cases comparable to the TS reference with a tolerance of 50%. A more detailed evaluation highlighted that candidates of the CHIMERE (cF) result in the power plants, agriculture and ship categories were underestimated. This is explained by the non-apportioned nitrate and ammonium in this result, as discussed above. The biomass burning, dust, and biogenic SOA masses obtained with FARM (cB) were in the area of overestimation, while the dust source contribution reported by CAMX-PSAT (cA) and the biogenic SOA produced with LOTOS-EUROS (cD) were underestimated with respect to the reference. The explanation is that dust and biogenic aerosol were not present in the common emission inventory and were, therefore, treated differently by every model. In addition, biogenic SOA was not computed by LOTOS-EUROS.
The RMSEu test criteria at the site of Lens were fulfilled by 51% of the candidates of the MDT set and 77% of the OPT set (Fig. 5c, d). In this test, the distinction between the CTM approaches was very clear. All the candidates obtained with the TS approach were in the area of acceptability and all the candidates outside the target area were produced with the BF/ERI approach. In the MDT set only the candidates for industry and ship of the cB result (FARM) passed the RMSEu test, while the cE (EURAD) and cF (CHIMERE) results presented only one candidate in the target each: industry and biomass burning, respectively (Fig. 5c). A similar situation was observed in the OPT set of sources (Fig. 5d). In this case, the cBo (FARM) candidates were in the target for the ship, industry, diesel and road dust sources.
The results of the performance tests indicate that the differences between the CTM approaches were more evident for the daily SCIE time series (target plots) than for the SCIE averages (z-scores). To explore the results in more detail, Fig. 6 shows only the scores with the highest differences between TS and BF/ERI (scores outside the target), together with the possible causes of the discrepancy.
In Fig. 6a and b the sources with the highest share of z-scores outside the acceptability area are flagged: dust, biogenic aerosol and biomass. The explanation for the first two sources is that they were treated differently by every model because they were not present in the common emission inventory (section 2.2.2). In the case of biomass burning, the overestimation of its contribution could be related to the corresponding EC and OC overestimation shown by EURAD when compared against observations and the other models. The analysis of Fig. 6a and b suggests that the differences between the CTM approaches in the average SCIEs (z-scores) are mainly attributable to the different model parameterisation. The target plots of Fig. 6c and d show only the candidates with scores outside the target. Such plots confirm the bias for dust, biomass burning and biogenic aerosol highlighted by the z-score test. Moreover, in these target plots other (anthropogenic) sources obtained with BF/ERI methods appear outside the target: traffic, energy production and agriculture. These three sources are well-known emitters of gaseous precursors (nitrogen oxides, sulphur dioxide and ammonia, respectively) that react in the atmosphere to produce secondary inorganic aerosol (SIA). As already explained in sections 1 and 3.2.2, the two CTM approaches (TS and BF/ERI) allocate SIA in a different way. The discrepancy between the mass apportioned by TS and BF/ERI to these sources of secondary aerosol suggests that the non-linearities are not negligible. Besides, biomass burning emits, as any combustion process, nitrogen oxides. For that reason, it cannot be excluded that the differences pointed out by the test for this source could be in part attributable to the emission of precursors that are involved in secondary processes. The results of the performance test highlighted in Fig. 6 suggest that the non-linearities causing the differences between TS and BF/ERI are stronger when dealing with SCIEs for short time intervals (daily averages, RMSEu) than with long-term SCIE averages (six months representing the warm and the cold seasons, z-scores). Another possible explanation for the effect of time resolution on the model SA performance could be the underestimation of CTMs observed during the exceedance days (section 3.2.1).

Table 6
Overall SCIEs and uncertainties of the reference sources (μg/m³) in the site of Lens used for the z-score tests of the mandatory (MDT) and optional (OPT) set of sources.

The case of agriculture
The differences in the CTM chemical profiles of the agriculture PM10 sources detected in the preliminary analysis of the data pointed out a clear influence of the adopted CTM SA approach on the mass attributed to this source (section 3.2.2), highlighting the need for a more thorough analysis. The mass attributed to this source and its time trend varied considerably among the different models. The highest estimation (obtained with CAMX-PSAT; 1.8 μg/m³) was three times higher than the lowest one (EURAD; 0.6 μg/m³) and was associated with the overestimation of NH4 by CAMX-PSAT and the effect of normalisation in BF/ERI models. Furthermore, FARM presented the highest levels in the warm period while LOTOS-EUROS and CAMX-PSAT, which presented the most correlated time trends (R = 0.6), showed the highest contributions in the cold period (Fig. 7 left).
In the target plot (Fig. 7 right), the TS models (CAMX-PSAT and LOTOS-EUROS) scored in the acceptability area, while all the BF/ERI models did not. Hence, the RMSEu test confirms that the results obtained with the two CTM approaches for the agriculture source are not comparable, as hypothesised in section 3.2.2.

Inter site variability in the CTM SA
In Fig. 8, the relative average SCIEs for PM10 at every site are displayed for the eight CTM results evaluated in the performance tests. The results of three models (CAMX-PSAT, FARM and LOTOS-EUROS) were available for both the MDT and the OPT sets of sources, while for EURAD and CHIMERE results were available only for MDT.
According to the share of sources four groups of sites were identified: the megacities (London and Paris), the medium-sized cities (Brussels, Ghent, Lens), the coastal sites (Calais and Le Havre), and the background sites (Rural backgd 1 and 2, and suburban bkgd). The only exception is EURAD where all the sites presented exactly the same relative split of sources. Therefore, in this result the difference between sites was only modulated by the different PM concentrations. In the megacities, the predominant source was traffic. At the coastal sites, the dominant source was sea-salt (present only in OPT), which was also significant at the other types of sites. In the medium-sized cities, traffic was sometimes dominant (FARM, LOTOS-EUROS) and sometimes was comparable to the other sources. On the contrary, at the background sites there was no clearly dominant source. Traffic, domestic combustion, biomass burning, shipping and industry all contributed in variable but still relatively comparable proportions. The resemblance in the share of sources between medium-sized cities and background sites suggests that due to the high density of urban areas in the domain, the contribution of sources attributed by the models within the cities and outside did not differ significantly. However, this could be also due to limitations in the emission inventory and to the spatial resolution of the simulations that curbed the identification of hot spots associated with local sources.
In the OPT results, it is possible to observe the split between the different components of the traffic. Diesel exhaust is pointed out as the most important component by FARM and CAMX-PSAT while road dust is put forward by LOTOS-EUROS. In the latter model, however, the OPT traffic source (sum of 7.1, 2, 3, 4 and 5; Table 3) did not match the total traffic obtained by the MDT set. This difference could be due to the specific approach used in this model to estimate the road dust component that may have been set up differently in the OPT and the MDT runs. As mentioned in section 3.2, the low values of CHIMERE are due to the non-apportioned nitrate and ammonium mass. This model attributes the highest share to dust, while CAMX-PSAT and EURAD apportioned very little or no mass to this source.
Figure caption fragment: [...] (Table 2) and SNAP nomenclatures (Table 3)

CTM sensitivity analysis
Three configurations were set up to perform a sensitivity analysis using the TS approach model CAMX-PSAT: the base case and two additional runs. The first sensitivity run (VD) included computations of the vertical diffusion coefficients used in the CMAQ scheme (Byun, 1999) which are different from the YSU scheme (Hong et al., 2006), adopted in the base case. The second sensitivity run (DSR) was performed switching off the Lens inner domain (7 km grid step), thus running CAMX-PSAT only over the EU wide domain at 20 km grid step. The goal of the first run was to assess how vertical diffusion may impact on the ability of the model to quantify different sources (e.g. local sources versus long range transport), while the second run focused on the influence of horizontal resolution on the SCIEs of local sources.
The analysis was performed at three receptor sites: Lens, Paris, and RUR BKGD2. Lens was the reference site, while Paris and RUR BKGD2 were selected because they were influenced by very different emission strengths. Paris is a megacity, where the influence of local emissions (e.g. transport and domestic heating) is higher than in the rural site, which is mainly subject to the influence of agriculture and secondary pollution. The results presented here concern only the OPT set. The VD run was performed only for the winter period because it is characterised by a stronger influence of vertical diffusion processes on pollutant concentrations.
The increased model grid cell dimension in an area close to localised primary emissions (traffic) like the Paris site caused a reduction in the concentrations of primary pollutants associated with that source (Fig. 9, left). A PM10 concentration decrease (-4 μg/m³) for DSR was accompanied by changes of -32% in elemental carbon (EC), -38% in primary organic aerosol (POA), and -29% in other primary anthropogenic aerosol in the PM10 fraction (OPA-10) compared to the base case.
When comparing the performances of CAMX-PSAT using two different grid steps (with all models in the reference), it was also observed that the contribution of traffic was underestimated when using a lower spatial resolution (Fig. 9, right). No significant changes were observed in the other tested sources (industry, energy production, biomass burning and agriculture).
CAMX-PSAT proved to be less sensitive to the variation of vertical diffusion coefficients. The CMAQ algorithm used in the VD run gives rise to stronger vertical diffusivity with respect to the YSU scheme used in the base case, thus inducing a decrease in ground level concentrations. In the VD run, the most relevant variations took place in the primary species at Paris. The concentration reductions were, however, modest: PM10 -0.5 μg/m³, OPA-10 -3%, EC -7%, and POA -4% (Fig. 9, left).

CTMs vs RMs
The comparison between the two families of models (RMs and CTMs) was accomplished only for the reference site of Lens. For the interpretation of the results, it was necessary to consider that the estimation for RMs refers to the specific site where the dataset of measured PM10 was collected, while the CTMs provide an average estimation for the grid cell, roughly corresponding to a 6-7 km grid step, containing the monitoring site. According to our understanding, the representativeness of the site (background) is coherent with the dimension of the grid cell. It is also worth mentioning that the RM output time resolution is that of the samples (daily in our case, with samples collected every third day), while CTMs generate results with an hourly time resolution, which were averaged to match the RM data.
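The time matching between the two families of models can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually applied: samples are assumed to cover calendar days, the dates and hourly values are hypothetical, and the function names are introduced here for the example only.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

def daily_means(hourly):
    """Average (timestamp, value) pairs by calendar day."""
    buckets = defaultdict(list)
    for stamp, value in hourly:
        buckets[stamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in buckets.items()}

def match_sampling_days(daily, sampling_days):
    """Keep only the days on which a filter sample was collected."""
    return {day: daily[day] for day in sampling_days if day in daily}

# Hypothetical hourly CTM contributions (ug/m3) for six days: day k has value k.
start = datetime(2011, 3, 1)
hourly = [(start + timedelta(hours=h), float(h // 24)) for h in range(6 * 24)]

# Assumed sampling calendar: one sample every third day.
sampling_days = [date(2011, 3, 1), date(2011, 3, 4)]
matched = match_sampling_days(daily_means(hourly), sampling_days)
```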
Another aspect to consider when comparing RMs and CTMs is the different level of detail about the chemical composition of the PM managed in the two families of models. While RMs are based on relatively detailed information about the chemical composition (in general, 20 or more species derived from chemical analyses), there is a limited number of chemical families in the CTMs (typically 6 or 7), depending also on the completeness of the emission inventories.
Finally, when matching the two series of sources, it should be considered that most of the inorganic ions (ammonium, sulphate and nitrate) are attributed by RMs to one (SIA) or two secondary aerosol categories (ammonium nitrate and ammonium sulphate), while the CTMs allocate these chemical species to the corresponding precursor sources. To account for the different way in which these inorganic ions are handled by the two families of models, the comparison between them was first accomplished by subtracting these inorganic ions from the CTM categories and allocating them to a fictitious CTM category named SIA. This correction was discarded for two main reasons. Firstly, although RMs allocate most of the ammonium, sulphate and nitrate to the SIA category or subcategories, these ions are also present in many other sources. Secondly, when subtracting these ions from the CTM original sources, the contributions of a number of them fell to values near zero or even negative in some BF/ERI results, as a consequence of the non-linearity problem (see section 1; Table 1). For instance, when applying this correction to the CTM category energy industry, its SCIEs decreased on average by 97% and 100% in the MDT and OPT sets, respectively. Other heavily altered categories were ship (-89%, -93%), exhaust (-87%), agriculture (-79%, -77%) and industry (-58%, -52%). To avoid introducing distortions, the comparison between RM and CTM was then performed on the SCIEs as reported by participants, and the different handling of secondary inorganic ions by the two families of models was considered in the interpretation of the results.
In Fig. 10 the average source contributions estimated by RMs and CTMs using both TS (CAMX-PSAT, LOTOS-EUROS) and BF/ERI (FARM, EURAD) approaches for the MDT and OPT sets of sources are compared. To support the interpretation of results, Fig. 10 also shows sources for which the comparison was not possible: agriculture (only CTMs), primary organic (only RMs) and secondary inorganic aerosol (ammonium nitrate and ammonium sulphate, RMs).
The primary sources with the highest SCIEs in RMs are: exhaust, soil (dust), ship, road dust and biomass burning. The coefficient of variation for the RMs varied between 16% and 77% with the exception of industry, where it reached 107%. This latter source has been identified as the most critical one in the evaluation of RMs due to the dispersion of the reported SCIEs (see section 3.1.1). Traffic, agriculture, industry and biomass burning are the most important sources in the CTM MDT set, while marine, agriculture, exhaust, biomass burning and industry show the highest PM shares in the CTM OPT set.
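For reference, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The Python sketch below assumes the sample standard deviation, which the text does not specify, and uses invented SCIE values.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical SCIEs (ug/m3) reported for one source category by different RM results.
scies = [1.0, 2.0, 3.0]
cv = coefficient_of_variation(scies)
```

A value above 100%, as reported for industry, means that the spread of the reported SCIEs exceeds their average.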
The CTM SCIEs are in the majority of cases lower than the RM averages for the corresponding source (Fig. 10). Nevertheless, there are cases where CTMs are comparable to or even higher than RMs. In the MDT set the SCIE of traffic reported with EURAD was 40% above the one of the RM average while the other models were 40% to 60% below such reference. Similarly, in industry and biomass burning there was one CTM result reporting SCIEs within 10% of the RMs average (CAMX-PSAT and FARM, respectively) while the others reported much lower values. In the OPT set, the marine source is the one where the CTM SCIEs are the closest to the RM average. FARM and CAMX-PSAT were less than 15% from the RM average, likely due to their parameterisation based on observations. On the contrary, LOTOS-EUROS presented values 70% higher than RMs, probably due to a parameterisation based on observations in areas where the contribution of this source is different than in Lens.
Fig. 9. Variation in concentrations of primary chemical species (left) between the base case (BC) and the sensitivity runs VD (vertical diffusion) and DSR (decreased spatial resolution). The impact of the spatial resolution on the model performance is also shown (right). Green background: acceptability area. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
An explanation for the high levels of EURAD compared to the other CTMs and RMs in traffic PM may be the overestimation of EC and organic aerosol in the base case. However, another possible explanation is associated with the BF/ERI approach used. As explained in section 1, the non-linear behaviour of sources leads to inconsistencies between the base case PM mass and the sum of the masses of all the sources. Such inconsistencies increase with the percentage of emission variation used to compute the impact of the sources (Thunis et al., 2018). In the BF/ERI results reported for this study, the impact of the single sources was normalised to reconcile their sum with the total PM10 mass. However, this method does not correct the relative influence of the sources because they are affected by non-linearity to different extents. The high level of emission variation (100%) and the limited effectiveness of the normalisation may explain the high traffic SCIEs of EURAD compared with the other models (also observed in the 99 OTH category). The same could be argued for the biomass burning SCIEs of FARM. In this model, however, the non-linearity is expected to be lower because the emission variation was only 20%. In addition, its average SCIE is the only one comparable with those of RMs. The higher TS shares of agriculture with respect to those of BF/ERI models may be partially explained by the normalisation that did not correct the effects of non-linearity and by the overestimation of NH4 by CAMX-PSAT.
In addition, the TS models have allocated higher mass of OC and in one case (LOTOS-EUROS) of OPA because the dust resuspension deriving from the agricultural activities was allocated to this source (section 3.2.2).
Although most of the main SIA mass is not allocated by RMs to their sources, their SCIEs are higher than those attributed by CTMs even for sources where these ions have a dominant contribution, such as energy production, shipping and industry. A possible explanation is that the CTM underestimation of the gravimetric mass overcompensates the fact that they allocate also the secondary inorganic fraction to these sources. This hypothesis is coherent with the outcome of the MPE suggesting that the PM mass underestimation by CTMs is partly due to problems in reconstructing the particulate organic aerosol (section 3.2.1).
The plots in Fig. 10 provide a preliminary understanding about the bias between RMs and CTMs in the different sources. However, to establish whether two SCIEs are comparable or not, the plots in Fig. 10 are not enough and distance indicators with acceptability criteria are required. To that end, the comparison between RMs and CTMs was performed using the methodology described in sections 2 and A.1 of the Appendix. The performance tests were performed setting the CTM sources as candidates and the SCIE of the corresponding RM source category as reference value. In Fig. S8 of the supplementary material the z-score tests for the MDT and OPT sets are displayed. Even though all the CTMs tend to underestimate the SCIEs when compared with the RM values, the majority of the candidate sources fall in the area of acceptance.
In Fig. S9 of the supplementary material the target plots split by source set (MDT and OPT) and by CTM approach (TS and BF/ERI) are shown. The dispersion of the sources in the target plot is relatively uniform between the two sets of sources (for those present in both of them) and between TS and BF/ERI. This is explained by the fact that the differences between CTMs and RMs are too large compared to those between CTMs. Therefore, in this test where the acceptability criteria are dependent on the reference SCIE (RMs in this case) the differences between CTMs are less evident. Another aspect to take into consideration is the variability between the CTMs within each approach as shown in Fig. 10.
In the z-score tests (both MDT and OPT sets), 83% of the candidates rank in the acceptability area while in the RMSEu test only 34% and 25% of the candidates of the MDT and OPT sets, respectively, pass the test. To summarise the main findings of the performance tests, a synthesis of the comparison between RMs and CTMs for both MDT and OPT sets is presented in Fig. 11.
[…] (Table 2) and SNAP (Table 3) nomenclatures. The error bars indicate the standard deviation of the RM SCIEs.
From the z-score plot, three groups of sources can be distinguished (Fig. 11, left). The first group includes the sources for which the CTM SCIEs are comparable with those of RM (marine and industry) or show little underestimation (traffic). Although their CTM SCIEs are lower than those of RMs, the second group includes sources whose differences fall within the acceptability range of this test (exhaust, energy, ship and biomass burning). The third group includes sources with part of the SCIEs outside the acceptability area (road dust, dust). These conclusions are confirmed and complemented by the RMSE u test, where industry and traffic (first group) fall completely inside the acceptability area (Fig. 11,  right). Despite no observed bias, marine (first group) falls outside the acceptability area due to the lower correlation and amplitude of the time series with respect to the reference. Exhaust and road dust in this test show a lower bias compared to the z-score test, likely due to the effect of the relatively high RMs dispersion (the RMSE u scores are normalised by the uncertainty of the reference). The remaining sources present, in addition to poor correlation with the RM time series, different extents of negative biases as already pointed out in the z-score test. In addition, the time series of road dust, dust, and biomass burning reported by CTMs also present differences of amplitude with respect to the RM.
The good performances of the industry and exhaust sources could be partially due to the relatively high uncertainty of the RM SCIEs for these categories. The poor performance of marine in this test indicates that, despite average SCIEs being quite similar in RMs and CTMs, their time trends are poorly correlated and have different amplitudes. In this exercise, dust was handled differently by every CTM (section 2.2.2) because it is not represented in the emission inventory. This is likely the cause of the sizeable bias component between RMs and CTMs. A possible explanation for the observed bias in road dust could be an underestimation in the input emission inventory. Moreover, in RMs, this source incorporates all the resuspended PM components, including dust deposited on the road and abrasion emissions (pavement, tyre and brake wear) while the CTM source (75 RTW) represents only the latter. The cause of the poor RMSEu values in ship and power plants is likely associated with the strong bias already evidenced between the two families of models in the previous tests (Figs. 10 and 11 left). In the biomass burning source, a problem of amplitude, in addition to a moderate bias, has been identified. It is well known that this source undergoes considerable seasonal variations and that episodes take place during the cold season. The result of the RMSEu test points out that the two families of models reproduce differently both the extent and timing of such variability.
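The bias, correlation and amplitude components discussed above can be separated with the standard decomposition of the mean square error used in target diagrams: RMSE² = bias² + CRMSE², with CRMSE² = σc² + σr² − 2σcσrR. The sketch below applies this identity to a candidate and a reference time series. The function name and the series are hypothetical, and the normalisation by the reference uncertainty used for RMSEu is deliberately left out.

```python
import math
from statistics import fmean, pstdev


def error_components(candidate, reference):
    """Split the RMSE of a time series into bias and centred (CRMSE) parts."""
    n = len(candidate)
    if n != len(reference) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_c, mean_r = fmean(candidate), fmean(reference)
    sd_c, sd_r = pstdev(candidate), pstdev(reference)  # population std. dev.
    if sd_c == 0 or sd_r == 0:
        raise ValueError("correlation undefined for a constant series")
    cov = fmean([(c - mean_c) * (r - mean_r) for c, r in zip(candidate, reference)])
    corr = cov / (sd_c * sd_r)
    # Centred RMSE depends only on amplitudes (sd) and correlation
    crmse = math.sqrt(max(0.0, sd_c**2 + sd_r**2 - 2 * sd_c * sd_r * corr))
    rmse = math.sqrt(fmean([(c - r) ** 2 for c, r in zip(candidate, reference)]))
    return {"bias": mean_c - mean_r, "corr": corr, "sd_candidate": sd_c,
            "sd_reference": sd_r, "crmse": crmse, "rmse": rmse}


# Hypothetical daily SCIEs (ug/m3): the candidate is perfectly correlated with
# the reference but has half its amplitude and a negative bias.
reference = [2.0, 4.0, 6.0, 4.0, 2.0]
candidate = [1.0, 2.0, 3.0, 2.0, 1.0]
parts = error_components(candidate, reference)
```

In this example the correlation is 1, so the whole centred error comes from the difference in amplitude, while the remaining error is bias. A source such as marine, with little bias but poor correlation and amplitude, would show the opposite pattern.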

Influence of the site on the model SA performance
A summary of the SA performance tests for all the sites is given in Fig. 12. In this plot, CTMs are compared with their ensemble reference for each site. At the Lens site, the performance of RMs as well as those of CTMs compared with RMs is also reported (circles).
In general, the variation in the model performance between sites is less than 10% and 15% for the z-scores and RMSEu, respectively. Such homogeneity in the SA model performance suggests that the geographical patterns influencing the allocation of sources depend more on the variations in the input data (emission inventory and meteorological fields) than on the differences between models. The results obtained with the OPT set of sources are always better than those of the MDT set, suggesting that the models are more comparable when accomplishing a more detailed apportionment. The overall performance of the RMs at the site of Lens is comparable with that of the CTMs with the OPT set of sources. As expected, differences are more evident when comparing the two families of models (RMs and CTMs, blue empty circles) than when comparing models within each family, which is reflected by the lower number of successful candidates in the RMSEu tests (Fig. 12 bottom).

Key findings of the intercomparison and discussion
The RMs applications showed a convergence in the use of the EPA-PMF5 tool, in part due to its free availability and user-friendliness and in part due to the good performances of EPA-PMF tools in past exercises, which led to more comparable results with respect to previous intercomparisons (Belis et al., 2015b).
It appears that the industry source in RMs needs a better definition because it encompasses a wide range of different local emissions with different chemical fingerprints that, placed under a common category, led to considerable variability between results. More specific allocation within this category (e.g. in subcategories), by taking more advantage of source profile repositories for instance, would lead to a better identification and consequently to a more accurate quantification of the contributions. In this regard, the characterisation of new markers or chemical fingerprints remains necessary to better take into account the specificity of a wider range of industrial sources.
Fig. 11. z-scores (left) and target plots (right) for the comparison between RMs and CTMs at Lens. In the target plot, the sources are represented schematically using the boundaries of their scores. Green background: acceptability area. The sources are encoded according to both the SPECIEUROPE (Table 2) and SNAP (Table 3) nomenclatures. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
In this exercise, the availability of an extended set of organic markers supported the allocation of primary biogenic organic aerosols, a category for which more scientific evidence is necessary in order to be included in the list of the sources that can be subtracted from the total PM for the calculation of the number of exceedances in the framework of the Air Quality Legislation (Directive 2008/50/EC; European Commission, 2011).
The high number of RM groups involved in the study made it possible to investigate the influence of their experience on the SA performance. RM practitioners declaring to have conducted more than 10 studies (24%) always showed good performances while the less experienced ones presented more variable scores.
Thanks to the set-up adopted in the CTM performance tests it was possible to quantify with real-world data the differences between the CTM TS and BF/ERI approaches in the allocation of SIA under non-linear situations, confirming previous estimations made with theoretical examples. The most emblematic source from this point of view is agriculture, whose ammonia emissions contribute to the formation of the most important components of the SIA, ammonium nitrate and ammonium sulphate. Considerable differences in the total mass attributed to this source and relative CTM chemical profiles were observed between BF/ERI and TS approaches. Similarly, differences were also identified in the performance tests (RMSEu) in other SIA precursor-emitting sources such as road traffic, power plants and to a lesser extent biomass burning.
The abovementioned differences between TS and BF/ERI were evident in the CTM performance tests when analysing the SCIE daily time averages while they were not detected in the tests on the overall SCIE averages. A possible explanation is that the non-linearity that gives rise to discrepancies between models depends on the considered time window (i.e. daily for RMSEu; seasonal or annual for z-scores).
The specific emission inventory, including details on the type of fuel, and the comparability table for the sources prepared for this exercise, made it possible to quantitatively assess the differences between CTMs and RMs SA results. However, it should be recalled that the two types of models use different source definitions and, therefore, only a subset of sources was selected for comparison.
The comparison of CTM source contributions with the measured mass and the RM SCIEs highlights a generalised difficulty of CTMs in apportioning all the PM10 mass. This problem has been associated with the underestimation of this pollutant concentration in the base case model results, especially during high pollution episodes, due to the poor organic PM fraction reconstruction and to problems in reproducing the dust PM fraction.
Since traffic (including both exhaust and resuspension components) and industry passed both performance tests, they were the most comparable sources between CTMs and RMs. Although exhaust (only in OPT) also passed these tests, the bias in the z-score was not negligible. The marine source obtained good z-scores (little bias) but showed differences in the amplitude of the time series between the two families of models in the target plot. The good comparability between RMs and CTMs for industry may look odd when considering the high variability of the RM SCIEs for this source. However, such comparability is clear in both (a): the z-score tests, where the RM uncertainty is not considered, and (b) the RMSE u tests, where the uncertainties of the RM SCIE are accounted for and lead to scores very close to the centre of the target plot. A possible explanation is that despite the considerable dispersion of the RM SCIEs for this source, their average is not biased. On the contrary, the most critical sources in the comparison between CTMs and RMs are dust and road dust, likely due to the different way in which the former was reconstructed by the different models and a possible underestimation of the latter in the emission inventory because only the PM deriving from abrasion is included while the re-suspended dust is not considered. These sources are only partially represented in the emission inventories because their quantification is not formally required by the official reporting schemes in the EU. For example, road dust resuspension is considered a re-emission and not a primary PM emission source (Denier Van Der Gon et al., 2018). Significant underestimation, albeit within the tolerance of the test, is observed for shipping, power plants, biomass burning and to a lesser extent, traffic exhaust.
The differences between TS and BF/ERI were not evident when comparing CTMs and RMs. The reason is that the objective of the performance tests is to compute the distance between the reference and every candidate and not those between candidates. Consequently, the acceptability criteria are set on the basis of the reference SCIE (RMs in this case). As shown in Fig. 10, most of the RM-CTM differences are much higher than those between TS and BF/ERI and, therefore, the latter are less visible in the CTMs-RMs test. It follows that the performance tests among CTMs (section 3.2.3) are the most appropriate to assess the differences between CTM approaches.
Fig. 12. Synthesis of the intercomparison CTM performance tests (expressed as percentage of successful candidates) for all the studied sites (solid markers). The comparison between RMs and between CTMs and RMs, available only for the city of Lens, are also displayed (empty circles).
According to the variation in the CTM share of sources it was possible to identify four groups of sites: megacities, medium-sized cities, coastal sites and background sites. Although differences among these groups were perceptible, the medium-sized cities and background sites presented a relatively homogeneous share of sources. Such similarity suggests that the source contributions attributed by CTMs within the cities and between them do not differ significantly. This could be due either to the high density of urban areas in the studied domain or to the spatial resolution of the simulations that limited the analysis of local hot spots. The abovementioned homogeneity is also reflected in the CTM SA model performance, suggesting that the geographical patterns influencing the allocation of sources depend more on the variations in the input data (emission inventory and meteorological fields) than on the differences between models.
The sensitivity analysis with the CAMX model suggests that adopting spatial resolutions lower than the one used in this study could lead to the concentrations of PM10 and associated primary pollutants (e.g. traffic) being underestimated by 20-30%, particularly in urban areas. This would affect the ability of the models to apportion the mass to this kind of source properly. More work is necessary to better understand the implications of these results.
Although the CHIMERE base case was comparable to the other models, the methodological choice adopted in this particular application (not apportioning nitrate and ammonia to their sources) affected the comparability with the other models' results. As a consequence, the scores of sources for which these chemical compounds represent a high share of the mass (power plants, shipping, agriculture) were identified as outliers in the performance tests.

Conclusions
The high quality of the input data for both RMs and CTMs and the considerable number of SA results (49) provided the basis to build up an unprecedented database, with key information to support an extensive analysis of the RM and CTM methodologies used in Europe for PM10 SA applications. The adopted methodology proved to be efficient to compare and characterise the SA performances of models of the same family (RM or CTM), different approaches within the CTM family, and CTMs vs RMs. The differences highlighted in the tests provided evidence for understanding the implications that the use of different types of models has on the SA output, contributing to a better interpretation of the SA results. Since this is the first application of this methodology to CTMs and RMs-CTMs, further work is needed to create a record of results, including different models and input datasets, to tune up the tests' acceptability criteria.
Despite a considerable dispersion observed in some sources (e.g. industry), RMs presented comparable results likely due to the use of the same model (EPA-PMF5) in the majority of cases. More effort is needed to better define sources by developing the existing repositories of source profiles to be used as reference in SA analyses using either RMs or CTMs. Differences were measurable between the TS and BF/ERI CTM approaches, in particular for secondary inorganic aerosol deriving from precursors emitted from different sources (energy, traffic, agriculture). In general, the RMs presented higher SCIEs than CTMs likely due to the underestimation of the total PM 10 by the latter, especially during high pollution episodes, which has been partly associated with problems in reconstructing the PM 10 organic fraction. In the case of dust and road dust sources, the CTM-RM comparison highlighted areas for improvement in the emission inventories.
The relevant differences in the comparability of SCIEs between RMs and CTMs and between the TS and BF/ERI approaches confirmed that the evaluation of the CTM results should not be limited to the total concentration but should also be extended to the single emission categories. In this regard, more work is needed to establish how this can be achieved in common modelling practice.
The quantitative evaluation of the RM and CTM SA model applications using a common methodology achieved in this study is a first step towards the joint application of the two families of models to take advantage of the strengths of both of them.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table A1
Evaluation methodology for source apportionment model results (Belis et al., 2015a)
The reference SCIE values for the performance tests were obtained from the average of the candidates excluding the outliers identified with the similarity tests. The z-score indicator (ISO 13528, 2005) was applied to the average SCIEs over the entire dataset reported by participants. The uncertainty for proficiency test (σp) was set to 50% of the reference SCIE by analogy with the model quality objectives for PM10 annual mean laid down in Directive 2008/50/EC. The acceptability interval for the z-scores, from -1.96 to 3.99, was obtained using a synthetic dataset where the SCIEs of the sources were known. A kernel distribution (R package ks v. 1.9.2) was fitted to more than 200 unbiased z-scores after removing outliers, and the 0.005 and 0.995 percentiles were extracted to define areas with the same probability density as those used in the abovementioned standard (Belis et al., 2015a). Moreover, the SCIE time series for every candidate were evaluated using the root mean square error (RMSEu) normalised by the uncertainty of the reference (u) at every time step (Jolliff et al., 2009; Thunis et al., 2012). The uncertainty of the reference was estimated as the standard deviation of the SCIE reported by participants. RMSEu values ≤ 1 are considered indicators of good performance.
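The time series test described above can be sketched as follows. The exact RMSEu formula is not given in this section, so the code assumes one plausible reading: the RMSE between candidate and reference divided by the root mean square of the reference uncertainty over the time steps. Any additional scaling applied in the cited methodology (for instance a coverage factor on u) is not reproduced here. The function names and all the numbers are hypothetical.

```python
import math


def rmse_u(candidate, reference, uncertainty):
    """RMSE of the candidate time series, normalised by the RMS of the
    reference uncertainty (assumed form, see the accompanying text)."""
    n = len(candidate)
    if n == 0 or not (n == len(reference) == len(uncertainty)):
        raise ValueError("series must be non-empty and of equal length")
    mse = sum((c - r) ** 2 for c, r in zip(candidate, reference)) / n
    mean_sq_u = sum(u ** 2 for u in uncertainty) / n
    if mean_sq_u == 0:
        raise ValueError("reference uncertainty is zero at every time step")
    return math.sqrt(mse / mean_sq_u)


def share_accepted(scores, threshold=1.0):
    """Percentage of candidates with a score at or below the threshold."""
    if not scores:
        raise ValueError("no scores given")
    return 100.0 * sum(score <= threshold for score in scores) / len(scores)


# Hypothetical reference time series (ug/m3) and its standard deviation
reference = [4.0, 5.0, 6.0]
uncertainty = [1.0, 1.0, 1.0]

close_candidate = [4.5, 5.5, 5.5]  # within the reference uncertainty
far_candidate = [2.0, 3.0, 3.0]    # strongly biased low

scores = [rmse_u(close_candidate, reference, uncertainty),
          rmse_u(far_candidate, reference, uncertainty)]
pass_rate = share_accepted(scores)
```

Because the score is expressed in units of the reference uncertainty, a source with widely dispersed reference values tolerates larger absolute deviations before exceeding the threshold of 1.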

A2 Glossary of source apportionment intercomparison terminology used in this study
Candidate: PM10 source reported in one source apportionment result.
Contribution-to-species: mass of each of the PM10 chemical components attributed to each candidate expressed as percentage of the chemical species total mass.
Participant: is a single practitioner or a group of practitioners who deliver a result in an intercomparison exercise.
Reference: is the average of the SCIE of the candidates reported by participants for the same source category.
Result: is the output of a source apportionment model including a list of candidates with the estimation of their contribution or impact to the total PM10.
Root mean square error uncertainty normalised (RMSE u ): is the RMSE weighted by the uncertainty of the reference and is used to assess the performance of a candidate SCIE time series.
Source profile: is the relative abundance of every PM 10 component of a source expressed as a proportion of the source total mass. It could be measured directly at the source (measured source profile) or be the result of a SA analysis (derived source profile).
Source category: is a set of sources or processes pooled under a single class due to their chemical or temporal affinities.
Source contribution or impact estimate (SCIE): is the amount of mass attributed to one candidate in one source apportionment result either for the single sample (time step) or for the overall average value.
z-score: is the difference between the candidate and the reference SCIE weighted by the uncertainty for proficiency testing, which is set to 50% of the reference. It is used to evaluate the performance of the candidate SCIE overall average.

# | SNAP nomenclature 1 | MACC II nomenclature 2 | Description
… | … | … | Combustion processes aimed at producing heat (heating) for non-industrial activities: commercial and institutional installations, including residential (heating and domestic combustion processes such as fireplaces, stoves, etc.). The macro-sector is subdivided according to the used fuel: 2.1 includes the emissions deriving from combustion of fossil fuels and 2.2 those from the combustion of biomass burning.
3 | Combustion in manufacturing industry | 3 Industry (combustion) | Combustion processes strictly related to industrial activity. Includes all the processes that require locally produced energy through combustion: boilers, furnaces, first melting of metals, production of gypsum, asphalt, cement, etc.
4 | Production processes | 3 Industry (processes) | Industrial emissions originated by all processes for the production of a given good or material different than combustion. Includes all the processes in the iron and steel, mechanics, organic and inorganic chemistry, wood, food production industries, among others.
5 | Extraction and distribution of fossil fuels and geothermal energy | Extraction and distribution of fossil fuels | Land and off-shore emissions deriving from production, distribution, storage of solid, liquid and gaseous fuel. It also includes emissions from geothermal energy extraction processes.
6 | Solvents and other products use | Product use | Emissions from activities that involve the use of products containing solvents, excepting their production (e.g. painting and degreasing operations including the domestic use of such products).
7 | Road transport | Road transport | Emissions due to cars, light and heavy vehicles, motorcycles and other means of transport on the road, including both emissions due to exhaust and wear from brakes, wheels and the road. The macro-sector is subdivided in five subcategories: 7.1 exhaust gasoline, 7.2 exhaust diesel, 7.3 exhaust natural gas/LPG, 7.4 non-exhaust evaporation, and 7.5 non-exhaust wear of brakes etc.
8 | Other mobile sources and machinery | Non-road transport and other mobile sources | Emissions from rail transportation, inland navigation, military vehicles, maritime traffic, air traffic and mobile non-road internal combustion sources, such as agricultural and forestry vehicles (chainsaws, pruning equipment, etc.), those linked to gardening activities (lawn mowers, etc.) and industrial vehicles (bulldozers, caterpillars, etc.). In this study only the international shipping emissions are considered.
9 | Waste treatment and disposal | Waste treatment | Emissions from waste incineration, spreading, and landfill including waste management related activities such as the treatment of waste water, composting, the production of biogas, the spreading of sludge, etc.
10 | Agriculture | Agriculture | Emissions deriving from all agricultural practices with the exception of the thermal heating groups (included in the macro-sector 3) and the motor vehicles (included in the macro-sector 8). Includes emissions from crops with and without fertilizers and/or pesticides, pesticides, herbicides, incineration of residues carried out on site, emissions due to breeding activities (enteric fermentation, production of organic compounds) and nursery production.
11 | Nature and other sources and sinks | | It includes all the non-anthropic activities that generate emissions (activity of plants, shrubs and grass, lightning, spontaneous emissions of gas, emissions from the soil, volcanoes, natural combustion, etc.) and those activities managed by man that are connected to them (managed forests, planting, repopulation, arson combustion of forests). In this study are considered only the following subcategories: 11.1 natural dust, 11.2 sea salt and 11.3 biogenic secondary organic aerosol.

A3 Most common abbreviations used in the text (except intercomparison methodology)