Sources and geographic origin of particulate matter in urban areas of the Danube macro-region: The cases of Zagreb (Croatia), Budapest (Hungary) and Sofia (Bulgaria)

The contributions of the main PM pollution sources and their geographic origin at three urban sites of the Danube macro-region (Zagreb, Budapest and Sofia) were determined by combining receptor and Lagrangian models. The source contribution estimates were obtained with the Positive Matrix Factorization (PMF) receptor model, and the results were further examined using local wind data and backward trajectories obtained with FLEXPART. Potential Source Contribution Function (PSCF) analysis was applied to identify the geographical source areas for the PM sources subject to long-range transport. Gas-to-particle transformation processes and primary emissions from biomass burning are the most important contributors to PM at the studied sites, followed by re-suspension of soil (crustal material) and traffic. These four sources can be considered typical of the Danube macro-region because they were identified at all the studied locations. Long-range transport was observed for: a) sulphate-enriched aged aerosols deriving from SO2 emissions in combustion processes in the Balkans and Eastern Europe, and b) dust from the Saharan and Karakum deserts. The study highlights that PM pollution in the studied urban areas of the Danube macro-region results from both local sources and long-range transport from EU and non-EU areas.


S3. Reconstruction of OC/EC missing data in PM2.5 of Zagreb
OC/EC data for the PM samples of Zagreb were collected throughout the sampling period (all of 2013), but OC/EC data in PM2.5 were missing for technical reasons for two and a half months (from 01 January to 14 March 2013; 73 days, ~20% of the PM sampling data). In Zagreb, OC/EC were analysed on 24 h samples for both PM2.5 and PM10, and there were no missing data for PM10. To make the carbonaceous fraction available for a larger number of PM2.5 samples in the PMF analysis, we estimated the missing OC concentrations in PM2.5 from the OC concentrations in PM10 at the same site on the same days. To that end, we calculated a regression line between OC in PM2.5 and OC in PM10 samples in Zagreb for the days of the year on which OC was analysed in both PM fractions (15 March-31 December, 292 days). The parameters of the regression between OC(PM2.5) and OC(PM10) (R2 = 0.95; slope = 0.86, intercept = 0.06) were used to estimate the missing daily OC concentrations in PM2.5 for the period 01 January-14 March 2013 (Figure S1A). The correlation between EC(PM2.5) and EC(PM10) was weaker (R2 = 0.63) (Figure S1A). Therefore, the missing EC concentrations in PM2.5 were estimated using the EC derived from the light absorption coefficient α, determined on the same days at the same site, which was better correlated with EC (R2 = 0.73; slope = 0.61, intercept = 0.05) (Figure S1B).
Figure S1A. Correlation between OC data in PM2.5 and PM10 samples, and EC data in PM2.5 and PM10 samples of Zagreb (Croatia). Period: 15 March-31 December 2013 (n=292).
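The gap-filling step described above amounts to fitting a straight line on the days with paired measurements and applying it to the days without PM2.5 data. The following is a minimal sketch in plain Python, not the procedure actually used in the study: the function names and the sample values are hypothetical, and the default slope and intercept are the OC regression parameters reported above (concentration units are assumed to be those of the original OC data).

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def estimate_oc_pm25(oc_pm10, slope=0.86, intercept=0.06):
    """Estimate OC in PM2.5 from OC in PM10 using the reported linear fit."""
    return slope * oc_pm10 + intercept


# Hypothetical paired daily values (days with both fractions analysed).
paired_pm10 = [2.1, 3.8, 5.5, 7.9, 10.4]
paired_pm25 = [1.9, 3.3, 4.9, 6.8, 9.0]
slope, intercept, r2 = fit_line(paired_pm10, paired_pm25)

# Hypothetical PM10 values for days on which PM2.5 OC is missing.
gap_pm10 = [6.0, 12.5]
gap_pm25 = [slope * v + intercept for v in gap_pm10]
```

The same pattern would apply to EC, with the light absorption coefficient as the predictor instead of EC in PM10.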

S4.1 Factors associated with BB in the PMF analysis of PM2.5 in BDP
Two factors were associated with the biomass burning (BB) source in the PMF analysis of PM2.5 in Budapest (BDP). Both are characterized by the presence of specific BB source markers: LEVO, which is a unique BB tracer, and other markers for this source such as K, Zn, Cl and OC (Figure 3A). The two factors identified as BB sources differ in their nitrate content, which reaches 35% of the mass in the factor "Nitrate rich + BB" and is absent in the factor "BB". However, the most distinctive feature of these two factors is their time trends. The BB factor shows a seasonal trend rather typical of this source, with higher contributions in winter that gradually decrease to minimum levels in late spring. The Nitrate rich + BB factor, on the other hand, is characterized by a distinct episode in the second half of February followed by minor isolated events in March.
In the BDP solution, ammonium is almost totally (92%) allocated to "Secondary Aerosol (SEC)" as a counterion for secondary sulfate and, to a lesser extent, nitrate. Very little or no ammonium is allocated to the other factors, suggesting that other cations such as K+, Na+, Ca2+ and Mg2+ (not all cations were analysed) are the alkaline species that neutralize the nitrate acidity in the factor considered.

Sofia (SOF) PM10 - Source contribution estimation (SCE), µg m-3
No significant differences were found between the two factors "BB" and "Nitrate rich + BB" in ratios such as OC/LEVO and K/LEVO, and these ratios were comparable to the typical, although highly variable, ratios reported for the BB source, as shown in Table S6.

Table S6. OC/LEVO and K/LEVO ratios in the factors "BB" and "Nitrate rich + BB" in Budapest (BDP) compared with literature data for the BB source profile
The split of biomass burning into two factors is unlikely to be an artifact (an artificial split of one source into two factors) because, in addition to the model diagnostics (Table S2), it is supported by distinct time trends that are also associated with different geographical patterns, as shown by the analysis of backward trajectories (Section 3.4.2 of the paper).
Figure S2. Zagreb, conditional probability function (CPF) (all sources)
Figure S3. Budapest

S6.1 The use of a binomial distribution to reduce the statistical noise
Various versions of PSCF have been proposed in previous publications (Dimitriou and Kassomenos, 2016; Polissar et al., 2001). Vasconcelos et al. (1996) suggested using a binomial distribution to test the significance of PSCF and to reduce the statistical noise. In this study PSCF was applied as follows. Since the PSCF is computed as the ratio of the counts of selected events (m_ij) to the counts of all events (n_ij), relatively small n_ij values, which are often related to sparse trajectory coverage of the more distant grid cells, may yield apparently high PSCF values that carry high uncertainty. We therefore first set a criterion on the residence time that a grid cell must reach in order to be kept in the PSCF analysis with low uncertainty. We set this reference value (p) equal to the average residence time over all cells (n), after excluding the cells with a residence time equal to zero. In other words, the number of seconds in each grid cell (residence time, s) is compared, using a binomial test, with the average residence time in seconds over all cells with s > 0. The result of the binomial test is the probability of observing the value s in a cell when the expected value for the cell is p. For example, if this probability is lower than 1%, it is very unlikely that the cell actually has a residence time equal to p; the cell is therefore not considered significant, and its PSCF value is reduced by a weighting factor. The limit values of the weighting function were obtained empirically, by trial and error over many runs of the PSCF program. Figure S7 shows an example of a 5-day PSCF analysis with and without the binomial filter for sources affected by long-range transport in Budapest. When the filter is not applied to the Nitrate rich source, high PSCF values appear in the Arctic.
For the Biomass Burning source, the very distant sources indicated over Cyprus, Turkey and the Caspian Sea are weighted down to a more reasonable probability (there is no evidence to support a contribution to BB from such distant locations).
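The filtering idea above can be illustrated with a short sketch. It is one possible reading of the description, not the program used in the study. As assumptions, every unit of residence time is treated as a trial that falls into a given visited cell with probability 1/K (K being the number of visited cells), so that the expected count equals the average residence time; the lower-tail probability is used as the test statistic; and the 0.5 penalty is a hypothetical weighting factor. Counts are expressed in trajectory time steps rather than seconds to keep the sums short.

```python
import math


def binom_cdf(k, n, q):
    """P(X <= k) for X ~ Binomial(n, q), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if q >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(q) + (n - i) * math.log1p(-q))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def weighted_pscf(m, n, alpha=0.01, penalty=0.5):
    """Return {cell: PSCF}, down-weighting cells with significantly low coverage.

    m, n: dicts mapping a grid cell to integer counts of selected and all events.
    alpha and penalty are hypothetical values chosen for illustration only.
    """
    visited = {cell: count for cell, count in n.items() if count > 0}
    total = sum(visited.values())
    q = 1.0 / len(visited)  # expected count per cell = total / K = mean coverage
    result = {}
    for cell, n_ij in visited.items():
        pscf = m.get(cell, 0) / n_ij
        if binom_cdf(n_ij, total, q) < alpha:
            pscf *= penalty  # sparse cell: its high PSCF is not trusted
        result[cell] = pscf
    return result


# Hypothetical counts: cell "C" is rarely visited but has a raw PSCF of 1.0.
n_counts = {"A": 50, "B": 48, "C": 2}
m_counts = {"A": 10, "B": 5, "C": 2}
filtered = weighted_pscf(m_counts, n_counts)
```

In this toy case only the sparsely covered cell is penalized, which mirrors the down-weighting of distant cells described above.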

S6.2 Altitude of the trajectories.
There is a limit to the height above a station that can be used when running Lagrangian models, because at night the Boundary Layer Height (BLH) might be very low, leading to PM concentrations at the trajectory height that differ from those measured at the station. For Sofia, during 2012-2015, ECMWF ERA-Interim reanalysis data indicate that the BLH is lower than 300 m in only 5% of the 3 h intervals. FLEXPART is particularly suitable for minimizing the errors deriving from the height because it works with statistical distributions rather than deterministic values. The model was set to release a high number (20,000) of computational particles (finite air masses) every hour, and thus it covers a broad range of probabilities in atmospheric circulation. FLEXPART simulates air mass transport at the mesoscale very well (Brioude et al., 2012).
Finally, it is important to bear in mind that PSCF is a tool that provides information on the impact of distant sources on the area of interest, not on high concentrations induced by local effects. When a long dataset is available, the transport paths during local events are random, leading to statistically non-significant results that are consequently removed by the binomial filter described above.

S6.3 Multi-site PSCF analysis for the Danube region
A multi-site PSCF analysis was applied to the sources that had a significant probability of being transported at the mesoscale. The results indicate two main source areas for the Soil aerosol (North Africa and the Caspian Sea region). A source is also indicated in Asia Minor. The combined Secondary aerosol results indicate that the main source areas are in the European Turkey region and in the St Petersburg region to the north-east.
Combined Soil aerosol PSCF analysis for Sofia and Zagreb at the 90th percentile, 10 days backward run.
Combined Soil aerosol PSCF analysis for Sofia and Zagreb at the 75th percentile, 10 days backward run.
Combined Secondary aerosol PSCF analysis for Sofia, Zagreb and Budapest at the 90th percentile, 5 days backward run.
Combined Secondary aerosol PSCF analysis for Sofia, Zagreb and Budapest at the 75th percentile, 5 days backward run.
Figure S8. Multi-site PSCF analysis for the Danube region based on results from Sofia, Zagreb and Budapest (Secondary aerosol) and Sofia and Zagreb (Soil). 75th and 90th percentiles.
To apply PSCF, for each cell we calculate the ratio PSCF_ij = m_ij / n_ij, where m_ij is the number of seconds in a cell corresponding to the measurements with a concentration higher than the 90th or 75th percentile of the estimated source concentration, and n_ij is the total number of seconds (residence time) in the cell for all measurements.
PSCF_ij measures the probability that a 1° x 1° grid cell contributes to the injected mass that is subsequently transported and is related to the mass concentration measured at the receptor sites considered.
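The ratio defined above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code used in the study: each sample is represented as a pair of a source concentration and a dictionary of residence times per grid cell, cells are keyed by hypothetical (latitude, longitude) integer pairs, and the percentile uses linear interpolation.

```python
from collections import defaultdict


def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def pscf(samples, pct=90):
    """Compute PSCF_ij = m_ij / n_ij for every visited cell.

    samples: list of (concentration, {cell: residence_seconds}) pairs.
    """
    threshold = percentile([conc for conc, _ in samples], pct)
    m = defaultdict(float)  # seconds from samples above the threshold
    n = defaultdict(float)  # seconds from all samples
    for conc, residence in samples:
        for cell, seconds in residence.items():
            n[cell] += seconds
            if conc > threshold:
                m[cell] += seconds
    return {cell: m[cell] / n[cell] for cell in n if n[cell] > 0}


# Hypothetical samples: (source contribution, residence time per 1 x 1 degree cell).
samples = [
    (1.0, {(45, 20): 100.0}),
    (2.0, {(45, 20): 100.0, (46, 21): 50.0}),
    (10.0, {(46, 21): 50.0}),
]
result = pscf(samples, pct=50)
```

Only samples strictly above the threshold contribute to m_ij, following the "higher than" wording of the definition.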
The methodology used to apply the multi-site PSCF was similar to the one reported by Han et al. (2007). The results based on the 90th and 75th percentiles for each city were used in the following equation:
Figure S9. PSCF analysis for the Danube region at the 75th percentile.

S6.4 Influence of a different percentile (75th) threshold for the PSCF analysis in the Danube region
Figure S9 panels (all at the 75th percentile): ZAG Secondary aerosol (5 days); ZAG Soil (10 days); BDP Secondary aerosol (5 days); BDP Biomass burning (5 days); BDP Nitrate rich and BB (5 days); SOF Secondary aerosol (5 days); SOF Soil (10 days).
The methodology followed Uria-Tellaetxe et al. (2014), where the 75th and 90th percentiles are used. In addition, the 90th percentile is the default value for PSCF analysis in the openair project (Carslaw and Ropkins, 2012). Comparing Figure S9 above with Figure 6 of the manuscript, we observe that the 90th-percentile result is much more focused on source areas. For the Budapest Nitrate rich aerosol (aged Biomass Burning), the area between Moscow and Ukraine (further from the measurement site) can be identified at the 90th percentile, while at the 75th percentile areas at smaller distances seem to contribute. For the Budapest fresh Biomass Burning aerosol, at the 90th percentile the source areas are those close to Budapest (Romania, Ukraine), while at the 75th percentile very distant areas seem to contribute to the fresh aerosol (see also Figure S5). Overall, the 75th-percentile result is more diffuse than the 90th-percentile one. Our interpretation is that a lower threshold (the 75th percentile in this case), which encompasses a higher number of samples, leads to the inclusion in the PSCF analysis of many samples in which the contribution of the studied source is not clearly dominant. The results therefore represent a wider range of situations, leading to a wider spread of the source areas that does not necessarily contribute to a better representation of the pollution origin. According to this test, the 90th percentile suggested in the literature seems to be appropriate for the purposes of our study.
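The effect of the threshold on the number of samples entering the analysis can be made concrete with a small sketch. The values below are hypothetical source contributions, not data from the study, and the helper name is illustrative; percentiles are taken from Python's `statistics.quantiles` with the inclusive method.

```python
from statistics import quantiles


def select_above_percentile(concentrations, pct):
    """Return the pct-th percentile (integer 1-99) and the samples strictly above it."""
    cuts = quantiles(concentrations, n=100, method="inclusive")
    threshold = cuts[pct - 1]
    return threshold, [c for c in concentrations if c > threshold]


# Hypothetical daily source contributions (arbitrary units).
contributions = [float(v) for v in range(1, 101)]

thr75, sel75 = select_above_percentile(contributions, 75)
thr90, sel90 = select_above_percentile(contributions, 90)

# The lower threshold admits more samples, including weaker contributions.
print(len(sel75), min(sel75))
print(len(sel90), min(sel90))
```

With the lower threshold, the selected set is larger and its weakest member is a lower contribution, which is consistent with the more diffuse source areas discussed above.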