A decade of monitoring micropollutants in urban wet-weather flows: What did we learn?

driven analysis demonstrates how future wet-weather monitoring programs will be more effective if the consequences of high variability inherent in urban wet-weather discharges are considered.

Introduction
Large efforts have been made in the last decade to shed light on the occurrence of micropollutants in wet-weather flows from urban areas, i.e. discharges from combined sewer overflows (CSO) and separate stormwater outlets (SWO). These monitoring activities are driven by the increasing awareness of the threat that micropollutants pose to urban and peri-urban water bodies (Musolff et al., 2010; Schwarzenbach et al., 2006). Micropollutants are also referred to as chemicals of emerging concern, chemicals of mutual concern, priority substances, xenobiotics, etc. Monitoring of micropollutants in wet-weather flows has multiple aims:
(i) to identify the sources of micropollutants across urban areas, linking release processes to factors such as land use, climate and building materials (Burant et al., 2018; Gooré Bi et al., 2015; Kafi et al., 2008; Mutzner et al., 2020; Rippy et al., 2017; Wicke et al., 2021);
(ii) to assess their threat to the water environment (Masoner et al., 2019; Nickel et al., 2021; Peter et al., 2020; Petrie, 2021);
(iii) to quantify the loads of micropollutants discharged from urban areas at the catchment, city or river basin scale (Gasperi et al., 2012; Launay et al., 2016; Nickel et al., 2021; Wittmer et al., 2010; Zgheib et al., 2012);
(iv) to assess the performance of pollution control strategies, for example by quantifying the removal performance of stormwater control measures (Fairbairn et al., 2018; Sébastian et al., 2015; Zhang et al., 2014); and
(v) to inform the issuing of discharge permits by local authorities (e.g. Jensen et al., 2020; Miljøstyrelsen, 2013, 2017).
Micropollutant concentrations vary across urban areas (Gasperi et al., 2014; Mutzner et al., 2020; Rippy et al., 2017; Wicke et al., 2021). They also vary strongly over time because of release and transport processes, which are mainly driven by rainfall (Gooré Bi et al., 2015).
Hence, a large number of samples must be collected before the pollution level at a specific site can be established with confidence (Bertrand-Krajewski et al., 2002; Burton and Pitt, 2001; McCarthy et al., 2018). Uncertainty in sampling and chemical analysis also affects the results. At the same time, high costs and resource requirements are barriers to extensive data collection.

These practical limitations are reflected in the fact that individual monitoring studies report relatively few data points compared with "traditional" pollutants such as total suspended solids, phosphorus, or nitrogen (Lee et al., 2007; McCarthy et al., 2018; Métadier and Bertrand-Krajewski, 2012). Studies on micropollutants typically sample between 1 and 12 events (based on Spahr et al., 2020 and Table 1). When analyzed individually, single published datasets do not support broader conclusions about micropollutant occurrence in urban areas. Nor do they support the planning of optimal monitoring strategies that target multiple sites. Some review studies have collected and compiled measurements from different studies (Brudler et al., 2019; Göbel et al., 2007; Petrie, 2021; Spahr et al., 2020). However, little attention has been paid to an overarching statistical analysis of these datasets and its implications for future monitoring efforts.
Our study aims to gain new insights from past monitoring and to improve the foundation for planning future monitoring campaigns that target micropollutants in urban wet-weather flows. It is based on a data-driven analysis of monitoring data collected at multiple sites over the last decade. Specifically, we:
(i) investigated the occurrence of micropollutants in wet-weather flows;
(ii) assessed the potential risk they might pose to the water environment; and
(iii) analyzed the variability of the collected data across discharge events and sites, to estimate an indicative number of events and sites needed for reliable information on micropollutant concentrations, both at individual sites and across several urban catchments.
Finally, we summarized these outcomes to (iv) provide recommendations for designing future monitoring programs that maximize the usefulness of the collected information. Overall, this study is intended to create the basis for more efficient data collection on micropollutants in wet-weather flows, and thereby for more effective management of this environmental challenge.

Data on micropollutants in urban wet-weather flows
For our data analysis, we considered data from both types of urban wet-weather discharge: combined sewer overflows (CSO) from combined sewer systems, and stormwater outlets (SWO) from separate sewer systems. We selected only datasets that were collected after 2010 and for which we could access the raw data. The data and code are accessible via https://doi.org/10.5281/zenodo.6808401.
We selected only datasets collected with composite sampling, in which several sub-samples per event are pooled into one composite sample. The results had to be reported as an Event Mean Concentration (EMC), the average concentration in a rainfall-runoff event. The analyzed EMC data were collected with time-, flow- or volume-proportional sampling (Table 1). In addition, we considered data from passive sampling designed for sampling wet-weather events (Mutzner et al., 2019).

Pooling data obtained with different sampling strategies limits comparability among studies and therefore adds uncertainty (e.g. McCarthy and Harmel, 2014). However, because few datasets are available, we assume that the benefit of learning from each collected dataset outweighs this disadvantage. Our study did not consider datasets with limited representativeness (e.g. grab sampling) or with unclear sampling strategies.
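For a flow- or volume-proportional composite, the EMC is the volume-weighted average of the sub-sample concentrations, EMC = Σ C_i V_i / Σ V_i. The minimal Python sketch below computes this quantity. It assumes that the concentration of each sub-sample and the runoff volume it represents are both known. `event_mean_concentration` is a hypothetical helper, not code from the analyzed studies. With equal weights, which corresponds to time-proportional sampling, the same function returns the arithmetic mean. That mean only approximates the EMC when flow varies during the event.

```python
def event_mean_concentration(concentrations, volumes):
    """Volume-weighted mean concentration of one rainfall-runoff event.

    Hypothetical helper for illustration.
    concentrations: sub-sample concentrations (e.g. ng/L)
    volumes: discharged volume represented by each sub-sample (e.g. m3)
    """
    if not concentrations or len(concentrations) != len(volumes):
        raise ValueError("need at least one sub-sample and one volume per sub-sample")
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total event volume must be positive")
    # Pollutant mass divided by total volume: sum(C_i * V_i) / sum(V_i)
    return sum(c * v for c, v in zip(concentrations, volumes)) / total_volume


# Invented example: a high first-flush concentration carried by a small volume
print(event_mean_concentration([120.0, 40.0, 20.0], [10.0, 30.0, 60.0]))
```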
The collected data originate from 77 sites (36 CSO and 41 SWO, Table 1) and include 683 events and 610 micropollutants in total. The SWO data come from three continents (Europe, Australia, USA), whereas the CSO data come only from Europe. Some of the collected datasets also reported other water quality indicators (nutrients, pathogens), which were outside the scope of this study.

Published studies used different names and CAS (Chemical Abstracts Service) numbers for the same compounds. We therefore used the R package webchem (Szöcs et al., 2020) to assign unique CAS numbers to the micropollutants. Nevertheless, some overlap may remain in the full list of 610 micropollutants. The full dataset was then cleaned in two steps. First, we retained only micropollutants found at least once above the limit of quantification (LOQ). Second, we carried out a final manual check. This left 297 micropollutants, including heavy metals.

A total of 34,266 EMC observations, including those below LOQ, were considered in the analysis (Fig. 1 and SI Fig. A 1). The numbers of micropollutants found and of events sampled vary considerably among monitored sites. They range from 13 to 214 micropollutants and from 1 to 35 events per site (SI Fig. A 2).

Sample preparation and chemical analysis
The studies listed in Table 1 used different sample preparation methods and chemical analyses. Detailed procedures are given in the original publications (Table 1). Some micropollutants, such as heavy metals and hydrophobic chemicals, were measured both in filtered (dissolved) and unfiltered (total) samples. For these, the total concentration was used in further calculations. For compounds measured only in dissolved (filtered) form, the dissolved concentration was used. Accordingly, the Environmental Quality Standard (EQS) was also considered for the total concentration; see Section 2.4 for details.
Differences in sample preparation and chemical analysis can affect comparisons among sites. In particular, differences in LOQ can falsely suggest different emission situations. LOQs vary among studies, and measured micropollutant concentrations range from below 1 ng/L to the µg/L range.

A comparison of LOQ with EQS shows that, in seven of the nine considered studies, LOQ was higher than EQS for selected micropollutants. For 40 micropollutants, EQS was lower than the corresponding LOQ at some sites (between 1 and 30 sites, SI Figs. A 3 and A 4). In these cases, low concentrations close to EQS might not be detected. This highlights how difficult it is for current analytical methods to detect relevant micropollutants in urban wet-weather flows.

Data pre-processing: summary statistics including values below the limit of quantification
The compiled datasets contain a large number of observations below LOQ: 51.4% of observations are left-censored (see also Section 3.1). The data were therefore pre-processed to estimate summary statistics that account for left-censored observations (< LOQ). Micropollutants that were not found in any sample (below detection limit) were treated as "not applicable" (NA) when estimating summary statistics. Summary statistics were computed for all sites combined and for individual sites (Fig. 1). This was done only when at least 3 EMC observations were available and fewer than 80% of the data were left-censored.

Summary statistics were computed with regression on order statistics (ROS; Helsel, 2011), using the R package NADA (Lee, 2020). The procedure was as follows:
1. The data were log-transformed.
2. Weibull-type plotting positions were calculated for censored and uncensored observations.
3. A linear regression was fitted to the uncensored observations.
4. The regression model was used to estimate the concentrations of observations below LOQ from their normal quantiles.
ROS was carried out separately for each site and micropollutant. A visual check confirmed a good fit of the linear regression for most sites: R² > 0.5 and p < 0.05 for 85% of all micropollutant-site combinations.

For each micropollutant and site, the ROS mean was used as the Site Mean Concentration, SMC site. The calculated SMC site is therefore the mean of the EMCs, including the ROS estimates for censored values. Unlike in many other studies (e.g. McCarthy et al., 2018), it is not weighted by event volume. The ratio of mean to median per micropollutant and site had an 80% interquantile range of 0.96 to 2.3. This confirms that, for most micropollutants, the data are right-skewed and a lognormal distribution is more suitable than a normal distribution (SI Fig. A 5).
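The ROS steps above can be illustrated with a simplified Python sketch. It assumes a single LOQ per dataset and uses the Weibull-type plotting positions that Helsel (2011) describes for that case. The NADA implementation used in the study also handles multiple detection limits, so this is not a reproduction of that code. `ros_mean` is a hypothetical name. The minimum-observation and censoring thresholds follow the text. The additional requirement of at least two detects is our own assumption, needed to fit a regression line. The R² check is not included.

```python
import math
from statistics import NormalDist, fmean


def ros_mean(detects, n_censored, min_obs=3, max_censored_share=0.8):
    """Simplified regression on order statistics (ROS), single-LOQ case only.

    Hypothetical helper for illustration.
    detects: concentrations measured above the LOQ
    n_censored: number of observations reported as < LOQ
    Returns the mean of detects plus ROS-imputed censored values, or None.
    """
    n_det = len(detects)
    n = n_det + n_censored
    if n < min_obs or n_censored / n >= max_censored_share:
        return None
    if n_det < 2:  # assumption: a regression line needs at least two detects
        return None
    z = NormalDist().inv_cdf
    p_exceed = n_det / n  # share of observations above the LOQ
    # Weibull-type plotting positions: censored values occupy the lower
    # (1 - p_exceed) part of the distribution, detects the upper part.
    pp_cens = [(1 - p_exceed) * j / (n_censored + 1) for j in range(1, n_censored + 1)]
    pp_det = [(1 - p_exceed) + p_exceed * i / (n_det + 1) for i in range(1, n_det + 1)]
    # Least-squares fit of log(concentration) against normal quantiles of detects
    x = [z(p) for p in pp_det]
    y = [math.log(c) for c in sorted(detects)]
    x_mean, y_mean = fmean(x), fmean(y)
    slope = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / sum(
        (a - x_mean) ** 2 for a in x
    )
    intercept = y_mean - slope * x_mean
    # Impute censored values from the fitted line, then back-transform
    imputed = [math.exp(intercept + slope * z(p)) for p in pp_cens]
    return fmean(list(detects) + imputed)
```

Imputing individual values rather than fitting a full distribution keeps the detected values unchanged. This is why ROS is considered robust to misspecification of the distribution for the censored part.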
Overall, there were sufficient data to calculate summary statistics across all sites for 223 micropollutants (30,659 EMC observations). Site-specific summary statistics (SMC site) could be computed for 186 micropollutants (Fig. 1). Because of few observations and high censoring, no SMC site could be computed for 72% of the micropollutant-site combinations. These measurements were used only to assess contaminant occurrence at a site (Section 3.1), without concentration estimates. All calculations and graphics were produced in R 4.0.3 (R Core Team, 2020) using tidyverse (incl. ggplot2) and NADA (Lee, 2020; Wickham, 2016; Wickham et al., 2019).

Micropollutants occurrences and risk quotients
Micropollutants were classified into five chemical classes with diverse urban sources, ranging from household wastewater to street runoff (Table 2). Distinguishing between CSO and SWO, we assessed which micropollutants:
(i) are often found when searched for at a site (occurrence); and
(ii) are found in concentrations above existing EQS (risk quotient).
Occurrence was calculated as the number of sites where a substance was found at least once above LOQ, divided by the number of sites where it was searched for. See Section 2.2 for a discussion of LOQ variability.
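The occurrence metric defined above can be sketched in a few lines of Python. This is an illustrative implementation, not the study's R code. The input format, one `(site_id, above_loq)` pair per EMC observation for a single micropollutant, is an assumption, and `occurrence` is a hypothetical name.

```python
from collections import defaultdict


def occurrence(observations):
    """Share of searched sites where a micropollutant was found at least once > LOQ.

    Hypothetical helper for illustration.
    observations: iterable of (site_id, above_loq) pairs for ONE micropollutant,
    one pair per EMC observation; above_loq is True if the EMC exceeded the LOQ.
    """
    found_at_site = defaultdict(bool)  # every site listed counts as "searched"
    for site, above_loq in observations:
        found_at_site[site] = found_at_site[site] or above_loq
    if not found_at_site:
        return None  # never searched for: occurrence is undefined
    return sum(found_at_site.values()) / len(found_at_site)


# Invented data: found at sites A and C, searched but never found at B and D
print(occurrence([("A", False), ("A", True), ("B", False), ("C", True), ("D", False)]))
```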
The environmental risk was assessed with a risk quotient (RQ). The RQ is a precautionary 90th percentile of the concentration over all sites, estimated from the ROS summary statistics, divided by the chronic EQS for surface freshwaters. If no chronic EQS was available, the acute EQS was used. Critical concentrations in receiving waters are assumed to potentially occur if RQ ≥ 1. In that case, dilution in the receiving water is needed to avoid environmental impacts.

Using chronic EQS makes the assessment more conservative, because chronic EQS are lower than acute EQS. Acute EQS would be more appropriate for judging the risk of a single discharge event. However, it remains unclear how they should be applied to such events, for example how to define an event or which sampling duration to use.

The applied EQS originate from the following sources (EU Directive 2013/39, 2013; Miljøstyrelsen, 2017; The Swiss Federal Council, 1998; US EPA, 2022):
- the EU Directive (annual average EQS);
- the US Environmental Protection Agency (US EPA); and
- national water quality standards from Denmark and Switzerland.
In addition, EQS recommended by the Swiss Ecotox Center were used (Swiss Centre Ecotox, 2021). If more than one EQS value was available, the lowest value was used. EQS could be defined for 117 of the 297 micropollutants (SI Table A 2). For micropollutants without an EQS, no risk quotient was calculated, and these micropollutants were not included in the risk ranking (Section 3.2).
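The EQS selection rule and the RQ definition above are sketched below in Python. Two simplifications are assumptions. First, the empirical 90th percentile of a list of concentrations stands in for the percentile estimated from the ROS summary statistics. Second, `select_eqs` and `risk_quotient` are hypothetical names. Percentiles come from the standard library's `statistics.quantiles`, which requires at least two values.

```python
from statistics import quantiles


def select_eqs(chronic=(), acute=()):
    """Lowest chronic EQS if one exists, otherwise the lowest acute EQS, else None."""
    if chronic:
        return min(chronic)
    if acute:
        return min(acute)
    return None


def risk_quotient(concentrations, chronic_eqs=(), acute_eqs=()):
    """RQ = 90th percentile of concentrations / selected EQS (hypothetical helper).

    Returns None when no EQS exists (the substance is then excluded from the
    ranking) or when fewer than two concentrations are available.
    """
    eqs = select_eqs(chronic_eqs, acute_eqs)
    if eqs is None or len(concentrations) < 2:
        return None
    p90 = quantiles(concentrations, n=10)[-1]  # last of 9 cut points = 90th percentile
    return p90 / eqs
```

An RQ of at least 1 flags a potentially critical concentration in the receiving water.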

Number of events needed for site mean concentration estimate
The number of events needed to reliably estimate a Site Mean Concentration (SMC site) depends on the variability at the specific site, i.e. the inter-event variability of the measured EMCs (EMC site) (Bertrand-Krajewski et al., 2002; Burton and Pitt, 2001; McCarthy et al., 2018). This variability can be described by the coefficient of variation, CV EMC,site. It is calculated as the standard deviation of EMC site divided by the mean of EMC site, i.e. SMC site (Fig. 2A). Differences in coefficients of variation among wet-weather discharge types and chemical classes were evaluated in R (R Core Team, 2020). We used one-way ANOVA after checking for homoscedasticity, followed by Tukey's Honestly Significant Difference test for pairwise comparisons.
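For illustration, the coefficient of variation and the F statistic of a one-way ANOVA can be computed with the Python standard library alone. This sketch makes two assumptions. It uses the sample standard deviation (n - 1 denominator), and it returns only the F statistic, without p-values, the homoscedasticity check, or the Tukey test. In Python, SciPy's `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` provide those tests. Both function names below are hypothetical.

```python
from statistics import fmean, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, e.g. of the EMCs at one site."""
    return stdev(values) / fmean(values)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square.

    groups: list of lists, e.g. CV EMC values per chemical class.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```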
We evaluated the effect of sampling a limited number of events by drawing subsets of events from the site-specific lognormal distributions of EMC site for a given micropollutant. These distributions were obtained from the ROS summary statistics (Section 2.3). The following bootstrapping procedure was applied to each micropollutant and site for the second resampling approach (Fig. 2B):
1. A population of 10,000 EMC site values for a given micropollutant and site j was generated from a lognormal distribution with the R package EnvStats (Millard, 2013). The distribution was parameterized with the mean of EMC site (SMC site) and the coefficient of variation of EMC site (CV EMC,site) from the ROS analysis.
2. A subset of n random values, EMC site,n, was drawn without replacement from this population. The number of drawn events n varied from 1 to 300.
3. For each subset of n events, the mean SMC j,n for site j was calculated and compared with the "true" SMC site.
4. Steps 2 and 3 were repeated 1,000 times. This made it possible to estimate the 95% confidence interval of the ratio SMC j,n/SMC site for each number of events n (grey points in Fig. 2B).
5. The error of the SMC j,n estimate for each number of drawn events was quantified as the width of the 95% confidence interval of SMC j,n/SMC site, referred to as the error band width (red line in Fig. 2B).
6. We identified the number of events needed to estimate SMC site for error band widths of 0.5 and 1 (green arrows in Fig. 2B).
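The event-resampling procedure above can be sketched in Python as follows. Several choices here are assumptions rather than the study's exact method:
- Python's `random` module replaces EnvStats.
- The mean and CV are converted to log-scale parameters with the standard lognormal identities σ² = ln(1 + CV²) and μ = ln(mean) - σ²/2.
- The 95% interval is taken from empirical 2.5% and 97.5% positions.
- The error band width is taken as upper bound minus lower bound.
- The search stops at the first n that meets the threshold, instead of evaluating every n from 1 to 300.
The function names are hypothetical.

```python
import math
import random
from statistics import fmean


def lognormal_population(mean, cv, size=10_000, rng=None):
    """Lognormal values with the given arithmetic mean and coefficient of variation."""
    rng = rng or random.Random(0)
    sigma = math.sqrt(math.log(1 + cv ** 2))  # log-scale standard deviation
    mu = math.log(mean) - sigma ** 2 / 2      # log-scale mean
    return [rng.lognormvariate(mu, sigma) for _ in range(size)]


def events_needed(smc_site, cv_emc, band_width=1.0, max_events=300, repeats=1000, seed=0):
    """Smallest n whose 95% interval of SMC_n / SMC_site is no wider than band_width.

    Hypothetical helper; returns None if max_events events are not enough.
    """
    rng = random.Random(seed)
    population = lognormal_population(smc_site, cv_emc, rng=rng)
    for n in range(1, max_events + 1):
        # Draw n events without replacement, repeated `repeats` times
        ratios = sorted(fmean(rng.sample(population, n)) / smc_site for _ in range(repeats))
        lower = ratios[repeats * 25 // 1000]       # approx. 2.5th percentile
        upper = ratios[repeats * 975 // 1000 - 1]  # approx. 97.5th percentile
        if upper - lower <= band_width:
            return n
    return None


# Invented site: SMC_site = 0.05 ug/L, CV EMC,site = 0.6, error band width 1
print(events_needed(smc_site=0.05, cv_emc=0.6))
```

The ratio SMC_n/SMC_site does not depend on the concentration scale. The result is therefore driven by CV EMC,site and the chosen error band width alone, which is consistent with the dependency shown in Fig. 7A.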

Number of sites needed for micropollutants emission estimate among sites
Urban catchments differ in size, sewer system, topography, land use, and substance use. There is therefore an inherent variability among catchments and wet-weather discharge sites. This variability makes it difficult to estimate micropollutant emissions from wet-weather flows at the city or river catchment scale, where there are often many discharge sites. An indication of the number of sites needed to account for this inter-site variability will help urban water managers to plan future monitoring programs, and will support model-based predictions.

Estimating a "typical" SMC range of a specific micropollutant in CSO and SWO requires a minimum number of monitored sites. For this analysis, the coefficient of variation CV SMC was calculated from the mean (SMC tot) and standard deviation of the available SMC site values (Fig. 2A). Data from SWO and CSO were analyzed separately. Micropollutants with summary statistics for fewer than 3 sites were removed. This resulted in a total of 160 SMC tot values (47 CSO and 113 SWO). Exploring the SMC site values showed that their square roots per micropollutant follow a normal distribution. The square-root transformation also avoids negative simulated values of SMC site. The sites were resampled as follows:
1. For each micropollutant, separately for SWO and CSO, a population of 10,000 normally distributed square-root-transformed SMC site values was generated.
2. A subset of j random SMC site values was drawn without replacement from this population. The number of drawn sites j varied from 1 to 200.
3. For each subset of j sites, SMC tot,j was calculated and compared with SMC tot.
4. Steps 2 and 3 were repeated 1,000 times. This made it possible to estimate the 95% confidence interval of the ratio SMC tot,j/SMC tot for each number of sites j.
5. The error of each SMC tot,j estimate was quantified as the error band width of the 95% confidence interval of SMC tot,j/SMC tot.
6. We identified the number of sites needed to estimate SMC tot for error band widths of 0.5 and 1.
Table 2. Classification of micropollutants into five chemical classes, urban source (wastewater or surface runoff) and exemplary list of micropollutants (full list with at least one value > LOQ in SI Table A).
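The site-resampling procedure can be sketched in the same way as the event resampling. This Python sketch rests on several assumptions:
- "Root-squared" is read as a square-root transformation.
- Normal values are drawn on the square-root scale from the mean and sample standard deviation of the observed roots.
- The values are squared back, so that no simulated SMC site is negative.
- SMC tot is taken as the plain mean of the observed SMC site values.
- The interval bounds and band-width definition follow the same empirical convention as for events.
`sites_needed` is a hypothetical name.

```python
import math
import random
from statistics import fmean, stdev


def sites_needed(smc_site_values, band_width=1.0, max_sites=200, repeats=1000,
                 population_size=10_000, seed=0):
    """Smallest j whose 95% interval of SMC_tot,j / SMC_tot is no wider than band_width.

    Hypothetical helper; smc_site_values holds SMC site values of ONE micropollutant
    for one discharge type (CSO or SWO). Returns None if max_sites is not enough.
    """
    rng = random.Random(seed)
    roots = [math.sqrt(v) for v in smc_site_values]
    mu, sd = fmean(roots), stdev(roots)
    # Normal on the square-root scale; squaring back keeps simulated SMCs non-negative
    population = [rng.gauss(mu, sd) ** 2 for _ in range(population_size)]
    smc_tot = fmean(smc_site_values)
    for j in range(1, max_sites + 1):
        # Draw j sites without replacement, repeated `repeats` times
        ratios = sorted(fmean(rng.sample(population, j)) / smc_tot for _ in range(repeats))
        lower = ratios[repeats * 25 // 1000]       # approx. 2.5th percentile
        upper = ratios[repeats * 975 // 1000 - 1]  # approx. 97.5th percentile
        if upper - lower <= band_width:
            return j
    return None
```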

Micropollutants occurrence in CSO and SWO
The percentage of left-censored values is higher in SWO (57% of EMC observations) than in CSO (29%) (Table 3). This difference could be explained by the larger number of micropollutants searched for in SWO than in CSO (about 24,000 and 10,000 observations, respectively). The SWO searches include micropollutants that are rare in SWO, such as PPCPs.
Chemical classes differ in the number of micropollutants searched for. Pesticides are the largest group searched for in SWO, and household/industrial chemicals are the largest group in CSO. The percentage of left-censored observations ranges from 2% for heavy metals (HM) in SWO to 83% for PPCPs in SWO. In general, the chemical classes expected in wastewater (PPCPs, household/industrial) have clearly fewer left-censored values in CSO than in SWO. HM are almost always found in both system types (SI Fig. A 1).
Some micropollutants were searched for at almost all sites (CSO and SWO), as shown in Fig. 3 (full list in SI B). Diuron, for example, was found at 72 sites (34 CSO and 41 SWO), showing that both CSO and SWO contribute to emissions of this pesticide.

Some micropollutants are found predominantly in CSO, such as pharmaceuticals and personal care products from domestic wastewater (carbamazepine, diclofenac, and triclosan). Nevertheless, micropollutants typically associated with domestic wastewater are also found in SWO (carbamazepine or diclofenac). This suggests that household wastewater reaches storm sewers through wrong connections, leaky sewer systems, or illegal dumping. Some pesticides, such as mecoprop, metolachlor, and terbutryn, are also detected more often in CSO than in SWO.

Detection frequency is linked to country- and catchment-specific use patterns. For example, terbutryn was searched for but never found in SWO in Australia, whereas fipronil was found in SWO only in the United States. Hence, country-specific patterns influence the selection of micropollutants for monitoring.
Fig. 2. (A) Variability of EMC site,event per site (CV EMC, orange) and of SMC site among sites (CV SMC, green) per micropollutant. The mean of all site-specific SMC is SMC tot. (B) The number of events n needed for a 'reliable' estimate of the site mean concentration (SMC site) was assessed from the 95% confidence interval (red line) of the 1,000 SMC j,n/SMC site ratios for each number n of drawn events, using error band widths of 0.5 and 1.
Among the 80 most frequently found micropollutants, 26 have no EQS (marked black in Fig. 3), so their risk quotient could not be evaluated (Section 3.2). In contrast, almost all of the 40 most frequently found micropollutants have an EQS. Overall, 180 micropollutants found at least once above LOQ have no EQS value. Micropollutants with high occurrence may be important to consider in future monitoring and regulation (list of found/searched in SI B). These results depend on which micropollutants were searched for in the first place. For instance, cholesterol has been searched for only in SWO, but it is most likely also present at all CSO sites.

Top list of micropollutants based on occurrences and risk quotient
At many sites, micropollutants were detected in too few events to estimate SMC site, because 51.4% of the more than 34,000 EMC observations were left-censored. Moreover, in some studies, EQS values were below the corresponding LOQ for 40 micropollutants. For estradiol, LOQ exceeded EQS at up to 30 sites (SI Fig. A 4).
Table 3. Overview of analyzed data for CSO and SWO in comparison. Number of pollutants includes micropollutants found at least once > LOQ. Micropollutants never found were not considered in the analysis.
Comparing occurrence with the risk quotient (RQ) makes it possible to prioritize micropollutants for future monitoring campaigns and regulations (Fig. 4). In total, 31 CSO and 31 SWO micropollutants showed both high occurrence and an RQ above one (upper-right grass-green zone in Fig. 4). High occurrence means that they were found at more than 50% of the sites where they were searched for. Among these, the top ten micropollutants, in order of decreasing RQ, are:
- CSO: cypermethrin, ibuprofen, benzo[a]pyrene, estradiol, copper, zinc, pyrene, fluoranthene, estrone, and PFOS;
- SWO: benzo[a]pyrene, fluoranthene, pyrene, copper, zinc, mercury, benzo[b]fluoranthene, chrysene, benzo[ghi]perylene, and dibenz[a,h]anthracene.
Thus, the highest RQ are found for metals and PAHs. In CSO, selected pesticides, PPCPs, and household/industrial chemicals also rank high.
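The prioritization rule above, high occurrence combined with a high RQ, amounts to a simple filter and sort. The Python sketch below is illustrative only. `priority_list` is a hypothetical name, and the input values in the test are invented, not results of this study. Substances without an EQS carry `None` as RQ and are skipped, as in the ranking described in Section 2.4.

```python
def priority_list(substances, min_occurrence=0.5, min_rq=1.0):
    """Names with occurrence > min_occurrence and RQ > min_rq, by decreasing RQ.

    Hypothetical helper; substances maps name -> (occurrence, rq),
    where rq is None when no EQS is available.
    """
    selected = [
        (name, rq)
        for name, (occ, rq) in substances.items()
        if rq is not None and occ > min_occurrence and rq > min_rq
    ]
    return [name for name, _ in sorted(selected, key=lambda item: item[1], reverse=True)]
```

The thresholds are parameters, so a campaign can tighten or relax them, for example to extend the list with frequently found substances that lack an EQS.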
Our ranking, which is based on measurements at the discharge point, overlaps with studies of measured concentrations and environmental risk in receiving waters. Johnson et al. (2017) identified the following micropollutants as posing the highest risk to aquatic organisms in UK rivers: copper, aluminium, zinc, ethinylestradiol, linear alkylbenzene sulfonate, triclosan, manganese, iron, methomyl, and chlorpyrifos. These data are limited to one country (the UK, which is not included in our analysis) and were not necessarily collected during wet weather. Nevertheless, they confirm that some critical micropollutants (e.g. copper, zinc, and chlorpyrifos) have a relevant source in urban wet-weather flows.
Our data-based ranking also overlaps with a previous assessment based on micropollutant sources and inherent properties. Eriksson et al. (2007) recommended the following for SWO monitoring:
- metals: cadmium, chromium, copper, nickel, lead, platinum, and zinc;
- PAHs: naphthalene, pyrene, and benzo[a]pyrene;
- pesticides: pendimethalin, phenmedipham, pentachlorophenol, glyphosate, and terbuthylazine;
- household/industrial chemicals: nonylphenol ethoxylates, DEHP, PCB-28, and methyl tert-butyl ether.
This list agrees well with our ranking for metals, some PAHs, and household/industrial chemicals (Fig. 4). However, we calculated RQ < 0.1 for glyphosate and naphthalene. Phenmedipham, PCB-28, and methyl tert-butyl ether were not searched for in the analyzed studies. No EQS values were available for pendimethalin, which was found at 46% of SWO sites.
The highlighted micropollutants with RQ > 1 cover different chemical classes: pesticides, household/industrial chemicals, PAHs, heavy metals, and PPCPs (Fig. 5). This shows how various urban sources, from household and industrial wastewater to street runoff, potentially contribute to environmental impacts. Based on life-cycle assessment, a previous study indicated that 97% of the ecotoxicity impacts from SWO discharges can be attributed to heavy metals (Brudler et al., 2019). Our comparison against EQS confirms that heavy metals are relevant. It also shows that other micropollutants matter when all potential environmental impacts of wet-weather flows are considered.
Overall, when monitoring CSO and SWO to assess environmental risks, a list of 31 micropollutants each for CSO and SWO can be prioritized (RQ > 1 and occurrence > 50%, Fig. 4). Country- and site-specific contaminant bans, or the detection of illicit discharges, should be taken into account when applying this list. Comparing micropollutants with RQ > 1 in CSO and SWO (Fig. 5) shows that 22 micropollutants are relevant for both systems. Some differences between the systems stand out:
- PPCPs are mainly relevant in CSO: diclofenac, ibuprofen, estradiol, and estrone.
- Several pesticides have RQ > 1 only in SWO: carbendazim, imidacloprid, chlorpyrifos, and pentachlorophenol. However, these pesticides also occur in CSO, and at selected sites their concentrations can exceed EQS.
Most of the highlighted micropollutants were found at a large number of SWO and CSO sites (Fig. 3). Others, such as the pesticides diflufenican and cypermethrin, were measured and found at only two CSO sites. Further data collection is therefore needed to confirm the relevance of these micropollutants.
Nevertheless, some highly relevant micropollutants may be overlooked because they do not yet have an EQS. This may apply to micropollutants with high occurrence in Fig. 3 (e.g. some PFAS, organophosphates, or nicotine). It may also apply to micropollutants that have not yet been reported in stormwater. In addition, the risk quotient does not account for transformation products, which in some cases could be more toxic than the parent compounds. Therefore, if a monitoring campaign focuses on identifying urban sources, the above top list can be extended with micropollutants that show high occurrence.

Variability among events and sites
Variability among events. The measured EMCs are highly variable among events. The 80% interquantile range of CV EMC,site is 0.3 to 1.3 for SWO and 0.3 to 1.4 for CSO (median SWO: 0.6, median CSO: 0.7). This variability among events is associated with the inherent variability of rainfall, substance use patterns, sources, and urban land use (Gasperi et al., 2014; Mutzner et al., 2020; Rippy et al., 2017; Wicke et al., 2021). Our analysis used datasets obtained with experimentally reliable sampling and chemical analysis procedures. Even so, an influence of data quality on the observed variability cannot be fully excluded.

Our estimates of CV EMC for micropollutants in SWO are lower than those reported for traditional pollutants by Lee et al. (2007): total suspended solids 1.8, conductivity 1.6, COD 1.1, and total nitrogen 1.3. McCarthy et al. (2018) also found higher variability, with CV EMC values of 0.7 to 1.3 for total nitrogen and 1.4 to 2.4 for total suspended solids. The CV EMC for total suspended solids is higher than that for total nitrogen and for the micropollutants in our study. This indicates that variability is pollutant-specific. However, we cannot exclude that the lower CV EMC in our study results from the lower number of monitored events, and more events per site would need to be collected to clarify this. No clear trend between the number of measured events and CV EMC was found (SI Fig. A 6), so it is difficult to estimate the "true" CV EMC.
A one-way ANOVA of CV EMC for SWO versus CSO sites indicates significant differences (p = 0.003, SI A Section 2.1). The median CV EMC is also slightly higher for CSO. This could be explained by the mixing of stormwater and wastewater and the temporary storage of flows, both of which add variability. Hence, the minimum number of events to sample for reliable results differs slightly between SWO and CSO (Section 3.3.2).

Comparing CV EMC among chemical classes shows slight differences for both SWO and CSO (Fig. 6A; p < 0.001 in one-way ANOVA, SI A Section 2.2):
- SWO: A Tukey comparison shows that the differences are mainly due to pesticides versus heavy metals and versus household/industrial chemicals. Heavy metals (0.54) and household/industrial chemicals (0.59) have a lower CV EMC than pesticides (0.79). The higher CV EMC for pesticides may be related to local variability in usage, sources, rainfall, and seasonality. The Tukey comparison also shows differences between PAHs and heavy metals.
- CSO: There is no difference between PAHs and heavy metals. The differences in CV EMC are due to pesticides versus heavy metals, household/industrial chemicals, and PAHs. Pesticides have a higher CV EMC (0.80) than the other chemical classes, whose CV EMC ranges from 0.62 (household/industrial) and 0.66 (PPCPs, PAHs) to 0.67 (heavy metals).
Variability among sites. Sufficient information (more than two sites with SMC site) was available to calculate SMC tot for 48 micropollutants for CSO and 117 for SWO (Fig. 6B). The median CV SMC was higher for SWO (0.9) than for CSO (0.6) (SI Fig. A 13). An ANOVA shows a significant difference between CV SMC for CSO and SWO (p = 0.01, SI A Section 3.1). The larger variability for SWO might be explained by the higher variability of storm-driven micropollutants, due to differences in catchment size, land use, and substance use. These factors were not evaluated in this study. Future research could include them, together with other watershed characteristics, to determine their influence, if any, on pollutant concentrations.

Other factors also contribute to the variability among sites. The studies differ systematically in sampling strategies and chemical analysis. In addition, only a few events were collected at some sites.

Among continents, CV SMC for SWO differed only slightly (median CV SMC Europe: 0.69, Australia: 0.64, United States: 0.74, SI Fig. A 18), and ANOVA revealed no significant differences. The continent-specific CV SMC values are lower than the overall CV SMC of 0.9 for SWO, showing that regional differences contribute to CV SMC. Hence, lower variability can be expected in a catchment- or country-specific study.

The comparison of CV SMC also reveals distinct differences among chemical classes, with larger variability among sites for certain classes (Fig. 6B). For SWO, the one-way ANOVA across chemical classes (p = 0.0001, SI A Section 3.2) shows that PAHs have a large influence on the observed differences, with a high median CV SMC (PAHs: 1.6 and HM: 1.2).
A previous study in the US showed that, although the range of concentrations can vary, PAH concentrations in runoff from commercial parking lots and residential driveways exceed those from all other source areas, often by one or two orders of magnitude (Selbig, 2009). For CSO, pesticides have a higher median CV SMC (1.05), and the differences in the one-way ANOVA (p = 0.006) are attributed to pesticides versus PAHs. The observed differences in CV SMC could also be related to the smaller sample size for pesticides. More data would be needed to clarify these patterns.

How many events should be sampled?
The resampling results show how the number of events to be sampled depends on CV EMC (Fig. 7A): the higher the variability in EMC, the more events are needed. As CV EMC depends on the chemical class (Fig. 6), the number of events to be sampled differs among micropollutants (SI A Section 2.3). For an error band width of 1, our resampling procedure yields a median of 7 events for CSO (80%-interquantile range of 2 to 31 events over all analyzed CV EMC, Fig. 7A1) and 6 for SWO (80%-interquantile range of 1 to 25 events over all analyzed CV EMC, Fig. 7A1). If the error band width is reduced to 0.5, the median number of events increases to 28 for CSO and 25 for SWO (Fig. 7A2). However, the number of required events varies widely among sites, so Fig. 7 can be used to determine it for local conditions and the selected substances.
In comparison, McCarthy et al. (2018) estimated a higher average number of required events for traditional pollutants in SWO: 27 for total suspended solids, 12 for E. coli, and 11 for total nitrogen (error band width of 1). Assuming normally distributed data, a theoretical calculation by Burton and Pitt (2001) estimated 20 events for a CV EMC of 0.4 (error band width of 0.5), which is higher than our estimate of a median of 10 events for the same CV EMC. A possible explanation for the lower number of events in our study is the difference in methodology, as we assume that the data are log-normally distributed. Moreover, even though the studies considered in our analysis are among the most comprehensive in the literature, the number of measured events was lower than the estimated number of required events for 40% of the SMC site observations (SI Fig. A 10). Thus, our estimate of the number of events to be monitored may be biased. Our analysis also used only data from sites with sufficient data points for statistical inference; where a large share of data points is censored, more events would be needed to determine a reliable SMC site.
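The two approaches compared above can be contrasted in a short sketch. `events_normal_theory` applies a normal-theory sample-size formula, n = ((z(1 - α/2) + z(power)) · CV / e)², of the kind used for stormwater sampling plans; with the assumed α = 0.05 (two-sided), a power of 0.8 and a relative allowable error e = 0.25 (taken here as half of an error band width of 0.5), it gives roughly the 20 events quoted for Burton and Pitt (2001), but this parameterization is our assumption. `events_lognormal_mc` is a hypothetical Monte Carlo resampling under a log-normal assumption: it draws n EMCs with a given CV, takes their arithmetic mean as the SMC site estimate, and returns the smallest n for which at least 95% of the estimates lie within ±(band width / 2) of the true mean. The band definition, the estimator, and the 95% criterion are illustrative assumptions, not the procedure used in this study.

```python
import math
from statistics import NormalDist

import numpy as np


def events_normal_theory(cv, rel_error, alpha=0.05, power=0.8):
    """Normal-theory sample size n = ((z_{1-alpha/2} + z_{power}) * cv / rel_error) ** 2.

    Assumption: two-sided alpha and a relative allowable error around the mean.
    """
    z = NormalDist().inv_cdf
    return ((z(1 - alpha / 2) + z(power)) * cv / rel_error) ** 2


def events_lognormal_mc(cv, band_width, prob=0.95, reps=20_000, max_events=200, seed=1):
    """Hypothetical helper: smallest number of events n whose mean EMC lies within
    +/- band_width/2 (relative to the true mean) in at least `prob` of the draws."""
    rng = np.random.default_rng(seed)
    # Log-normal parameters that give a true mean of 1 and the requested CV.
    sigma = math.sqrt(math.log(1 + cv**2))
    mu = -0.5 * sigma**2
    for n in range(1, max_events + 1):
        emc = rng.lognormal(mean=mu, sigma=sigma, size=(reps, n))
        smc_hat = emc.mean(axis=1)  # assumed estimator: arithmetic mean of n EMCs
        within = np.abs(smc_hat - 1.0) <= band_width / 2
        if within.mean() >= prob:
            return n
    return None  # more than max_events would be needed


print(round(events_normal_theory(cv=0.4, rel_error=0.25), 1))
print(events_lognormal_mc(cv=0.4, band_width=0.5))
```

The first call gives about 20 events. The Monte Carlo function can be rerun for other CV values and band widths to see how quickly the required number of events grows with variability, which is the dependency summarized in Fig. 7A.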

How many sites should be sampled?
Considering the full dataset used in this study, the median number of sites required to obtain a reliable estimate of the range of SMC site is 4 for CSO (80%-interquantile range of 2 to 16) and 10 for SWO (80%-interquantile range of 3 to 22), for an error band width of 1 around SMC tot (Fig. 7B1). The number of required sites increases considerably for an error band width of 0.5 around SMC tot, to 18 CSO and 41 SWO sites (SI A Section 3.3). The difference between CSO and SWO reflects the higher CV SMC among SWO sites (Section 3.3.1). When only one continent is considered, however, the median number of required sites is lower, because the CV SMC per continent (0.64 to 0.74) is lower than the overall CV SMC of 0.9. Moreover, the required number of sites depends on the catchment of interest, which has a limited number of discharge sites and its own environmental regulations, climatic conditions, use patterns, and design criteria.
These results have to be interpreted carefully, as 43% of all micropollutants were monitored at fewer sites than our analysis indicates are necessary for a reliable estimate of SMC tot (SI Fig. A 17). This is especially the case for SWO, where 66 of 117 micropollutants (56%) were monitored at too few sites, compared with only 6 of 48 micropollutants (13%) for CSO. In general, more sites per micropollutant have been monitored for CSO than for SWO (median of 12 CSO sites versus 8 SWO sites). Consequently, the results for CSO appear more reliable than those for SWO, and fewer sites are required for CSO than for SWO. This underlines the importance of monitoring a sufficient number of sites, which is a prerequisite for model-based estimates of pollution levels for an urban catchment or river stretch. However, there are substance-specific considerations: micropollutants with high CV SMC require a much larger number of sites, sometimes more for CSO than for SWO.
For instance, estimating SMC tot for diuron requires monitoring 25 CSO and 14 SWO sites, which shows that, for selected contaminants, the number of sites to monitor is considerably higher than the median values reported above. Such high site numbers may also point to micropollutants whose concentrations vary strongly with regional or land-use patterns. For example, Wicke et al. (2021) found large differences in the building-related biocides diuron and mecoprop between French and German sites, probably due to regional product preferences. In such cases, estimating a minimum number of sites from the currently available data is very challenging.
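The number of sites can be explored with an analogous resampling over sites instead of events. The hypothetical sketch below treats the mean of all observed site-level SMCs as the reference SMC tot, draws k sites with replacement, and returns the smallest k for which at least 95% of the resampled means lie within ±(band width / 2) of that reference, in relative terms. The empirical bootstrap, the band definition, and the 95% criterion are assumptions for illustration, not the study's procedure, and `sites_bootstrap` is a hypothetical name.

```python
import numpy as np


def sites_bootstrap(smc_site, band_width=1.0, prob=0.95, reps=10_000, max_sites=100, seed=1):
    """Hypothetical helper: smallest number of sites k such that the mean of k
    resampled SMC site values lies within +/- band_width/2 (relative) of the
    reference SMC tot in at least `prob` of the bootstrap replicates.

    Assumptions (not from the study): the reference SMC tot is the arithmetic
    mean of all observed sites, the band is relative, and prob = 0.95.
    """
    values = np.asarray(smc_site, dtype=float)
    smc_tot = values.mean()
    rng = np.random.default_rng(seed)
    for k in range(1, max_sites + 1):
        # Each row is one replicate: k sites drawn with replacement.
        draws = rng.choice(values, size=(reps, k), replace=True)
        rel_err = np.abs(draws.mean(axis=1) - smc_tot) / smc_tot
        if np.mean(rel_err <= band_width / 2) >= prob:
            return k
    return None  # more than max_sites would be needed


# Hypothetical SMC site values (µg/L) of one micropollutant at five sites.
print(sites_bootstrap([0.1, 0.5, 1.0, 2.0, 3.4]))
```

Because the bootstrap only resamples the sites that were actually monitored, it cannot capture variability from land uses or regions that were never sampled, which is the limitation noted above for substances such as diuron.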

Discussing the future of monitoring micropollutants in wet-weather flows
Based on our survey of the monitoring campaigns published in the last decade and their outcomes, we identified several lessons for future monitoring campaigns and for the regulation of wet-weather flows:
Which substances should we measure and regulate?
1. Our data-driven analysis highlights micropollutants with high occurrence and high risk quotients as the focus for regulatory and research monitoring campaigns, so that monitoring efforts and resources are used where they matter most. Clearly, this general list will need to be adapted to local conditions, as the prioritization could be affected by site-specific factors such as industrial production, land use, use habits, and country-specific regulations. The analyzed data come from a selection of developed countries, and the transferability of the results to other countries depends on local product preferences and use patterns. Also, the top-listed micropollutants are subject to change over time as new products are brought to the market while others are substituted or banned. This is especially relevant for micropollutants used in household products and found in domestic wastewater, where product composition and usage patterns can change over time. For micropollutants conveyed mainly by stormwater, changes over time are buffered, as building materials and vehicles create a stock of micropollutants.
2. Selected non-target screening in future stormwater quality monitoring is needed to update the top list with new micropollutants of emerging concern, such as the tire-rubber chemical 6-PPD-quinone (Tian et al., 2021). Moreover, access to consumer data and communication with product regulators could lead to targeted monitoring of micropollutants of potential concern.
3. Micropollutants are often grouped into chemical classes. However, the top lists from our study include micropollutants from several classes. This suggests that monitoring efforts should target different classes, as a single class does not cover the whole toxicity spectrum and/or the diversity of urban sources. Where resources are limited, analytical efforts can be reduced by identifying "indicator micropollutants" (i.e. proxies of other substances) that are representative of specific sources, transport mechanisms, environmental behavior and/or land uses (Launay et al., 2016; Mutzner et al., 2020). Such an analysis was not performed in this study, but the collected data provide an information basis for it.
How much should we measure?
1. Micropollutants at concentrations of potential risk for aquatic organisms are found in both CSO and SWO. Thus, assessing surface water status requires considering both types of wet-weather discharge.
2. The analysis indicates the number of events that need to be monitored to draw reliable conclusions on SMC site (error band width of 1): 2 to 31 events for CSO (80%-interquantile range, median 7) and 1 to 25 events for SWO (80%-interquantile range, median 6). For selected chemical classes (pesticides for CSO and SWO, PAHs for SWO), or where high variability (CV EMC) is expected, more events should be sampled. The number of events required for individual micropollutants can also differ considerably from the estimated median (SI Fig. A 11), and Fig. 7 can be used to estimate it.
3. The high share of censored values in the collected data (>50%) stresses the need to redesign future wet-weather monitoring programs by improving current sample preparation and chemical analysis. This is underlined by the considerable number of micropollutants (40 of 117) for which the EQS was below the LOQ in some studies.
4. Reliable sampling and chemical analysis are essential for robust statistical inference. For example, the variability among events estimated in our work (expressed as CV EMC) was lower than previously reported for traditional pollutants (Lee et al., 2007; McCarthy et al., 2018). While there is some indication that the variability is pollutant-specific, we do not know whether the lower variability in our study is also related to the small number of events monitored at some sites.
5. Most urban catchments and surface waters have a large number of wet-weather discharge sites, so we need to know how many sites must be monitored to allow model predictions for all discharge sites. Based on our full dataset covering different countries, our analysis indicates that a minimum of 4 CSO (80%-interquantile range of 2 to 16) and 10 SWO (80%-interquantile range of 3 to 22) sites need to be monitored to estimate a typical range of SMC tot (error band width of 1). However, the required number of sites depends on the catchment of interest, which typically has a limited number of sites and more uniform local conditions regarding environmental regulations, climatic conditions, use patterns, and design criteria. Hence, the required minimum number of sites gives a first indication of whether local monitoring data are lacking. Generally, a monitoring program has to be planned carefully to cover a diversity of typical land-use types, especially for chemical classes with high variability among sites (PAHs, heavy metals, pesticides).
6. The concept of SMC implies the hypothesis that pollutant release processes are stationary throughout the monitoring period. However, pollutant sources in the catchment can change, e.g. due to changes in human activities or industries in the catchment or the banning of specific substances. This stresses the importance of adapting the monitoring program to current knowledge of the pollutant sources in the catchment.
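Points 3 and 4 above can be checked routinely when screening collected data. The sketch below computes, per micropollutant, the share of censored (below-LOQ) results and flags substances for which an LOQ exceeds the EQS, i.e. where a non-detect cannot demonstrate compliance. The record layout, the function name `screen_censoring`, and the example values are hypothetical.

```python
from collections import defaultdict


def screen_censoring(records, eqs):
    """Per micropollutant: share of results below the LOQ and whether any LOQ exceeds the EQS.

    records: iterable of (substance, value, loq) tuples; value is None for results < LOQ.
    eqs: dict mapping substance -> EQS, in the same unit as value and loq.
    """
    n_total = defaultdict(int)
    n_censored = defaultdict(int)
    loq_above_eqs = defaultdict(bool)  # defaults to False for substances never flagged
    for substance, value, loq in records:
        n_total[substance] += 1
        if value is None or value < loq:
            n_censored[substance] += 1
        if substance in eqs and loq > eqs[substance]:
            loq_above_eqs[substance] = True
    return {
        s: {"censored_share": n_censored[s] / n_total[s], "loq_above_eqs": loq_above_eqs[s]}
        for s in n_total
    }


# Hypothetical results (µg/L): (substance, measured value or None, LOQ).
records = [
    ("substance_a", None, 0.05),
    ("substance_a", 0.12, 0.05),
    ("substance_b", None, 0.10),
    ("substance_b", None, 0.10),
]
print(screen_censoring(records, eqs={"substance_a": 0.2, "substance_b": 0.02}))
```

Substances with a high censored share or an LOQ above the EQS are candidates for improved sample preparation and analysis before further events are sampled.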
What else do we need to take into account?
1. In our risk evaluation, we compared Event Mean Concentrations (EMC) with EQS for surface waters. However, most EQS values are not designed for short-term (minutes to hours) wet-weather flows, and not all micropollutants with high occurrence have EQS values. Therefore, new risk assessment tools specifically targeting wet-weather flows are needed to evaluate the results of monitoring efforts. In addition, EMC values lump intra-event variability and thus neglect possible ecotoxicological impacts of peak concentrations; evaluating peak concentrations would require monitoring data at high temporal resolution. Our analysis is based on EQS values; future work could consider other approaches that complement or replace the EQS approach. Moreover, the risk quotient indicates the dilution factor required in the receiving water, not the actual ecotoxicological risk.
2. There is a clear need to organize available and future data according to the FAIR data principles (Findable, Accessible, Interoperable, Reusable, as outlined in Wilkinson et al., 2016 and www.go-fair.org/fair-principles/). Our analysis thoroughly scanned the literature for suitable datasets. However, for some published datasets, the raw data were not findable and/or not accessible (e.g. corresponding authors did not reply to our inquiries). There are therefore potentially more datasets that could help refine our conclusions. Moreover, not all datasets included in this study were immediately accessible, and some required extensive pre-processing before our analysis could be performed. We also did not report or analyze catchment characteristics, because the reported land use differed widely and was partly subjective, which hampers a systematic comparison among sites.
3. Careful assessment of available resources is essential before undertaking new monitoring campaigns, to ensure that the collected data are sufficient to draw reliable conclusions.
Given these considerable challenges in monitoring micropollutants in wet-weather flows, involving experienced observatories and research teams in monitoring programs is beneficial.
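Point 1 of the list above can be illustrated with a short calculation. The sketch computes a flow-weighted EMC for an event sampled at equal time steps (EMC = Σ C_i Q_i / Σ Q_i), then expresses both the EMC and the peak concentration as risk quotients, RQ = concentration / EQS, which can be read as the dilution factor required in the receiving water. The function names, concentrations, flows, and EQS value are hypothetical.

```python
def event_mean_concentration(conc, flow):
    """Flow-weighted EMC for equally spaced samples: sum(C_i * Q_i) / sum(Q_i)."""
    if len(conc) != len(flow):
        raise ValueError("conc and flow must have the same length")
    return sum(c * q for c, q in zip(conc, flow)) / sum(flow)


def risk_quotient(concentration, eqs):
    """RQ = concentration / EQS; RQ > 1 means a dilution by a factor RQ is needed to meet the EQS."""
    return concentration / eqs


# Hypothetical event: concentrations (µg/L) and flows (L/s) at equal time steps.
conc = [4.0, 2.0, 1.0, 0.5]
flow = [10.0, 40.0, 30.0, 20.0]
eqs = 0.5  # hypothetical EQS (µg/L)

emc = event_mean_concentration(conc, flow)
print("EMC:", emc, "RQ(EMC):", risk_quotient(emc, eqs))
print("peak:", max(conc), "RQ(peak):", risk_quotient(max(conc), eqs))
```

In this example the EMC is 1.6 µg/L (RQ 3.2), whereas the first-flush peak of 4.0 µg/L gives an RQ of 8, i.e. 2.5 times higher. This is the kind of short-term exceedance that an EMC-based comparison with EQS values does not capture.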

Conclusions
Our analysis of data from past monitoring efforts, covering 77 wet-weather discharge sites, showed that:
• The collected data contain a large share of observations below the LOQ (>50%), and for 40 micropollutants the EQS was below the LOQ in some cases. Future monitoring outcomes can be greatly improved by defining minimum standards for monitoring wet-weather discharges (sampling technique, number of events) and by chemical analysis of micropollutants suited to the expected concentrations. Moreover, selecting top-listed micropollutants found at higher concentrations (e.g. in the µg/L range) will enable a reliable assessment if chemical analysis resources are not sufficient for lower concentration ranges.
• Micropollutants from all considered chemical classes (heavy metals, household/industrial, PAHs, pesticides, pharmaceuticals, and personal care products) and from both types of wet-weather discharge (CSO and SWO) were measured at concentrations above the EQS for surface waters. Hence, a thorough monitoring campaign covering different chemical classes and both wet-weather discharge types (if relevant in the specific catchment) would allow the environmental impact on receiving waters to be assessed. If resources are limited, the number of monitored micropollutants could be reduced to those identified in the top list (after consideration of local, regional, and national use patterns).
• The number of events (and sites) needed to draw reliable conclusions depends on the variability of micropollutant concentrations, with more events needed if a micropollutant is expected to show highly variable concentrations. We therefore propose that future monitoring campaigns carefully consider whether available resources are sufficient to draw reliable conclusions. In case of limited resources, one option would be to select indicator micropollutants with high occurrence. Further analysis is also needed to determine whether, for a fixed budget, monitoring more sites or more events per site produces more reliable results for an urban catchment; measuring more sites leads to higher costs because monitoring equipment is needed at each site.
• Our analysis highlights the high variability of micropollutant levels in wet-weather flows from urban areas. Larger datasets would enable future analyses, such as exploring the factors behind this variability (e.g. climatic and catchment-specific characteristics) and potentially predicting the expected pollution levels and the corresponding environmental impacts in the receiving water bodies. Collecting sufficient data requires considerable resources, beyond what single monitoring activities can typically afford. Hence, there is a global need for minimum reporting standards and for broader sharing of past and future measurements according to FAIR principles. The data and code used in this work can be accessed at https://doi.org/10.5281/zenodo.6808401.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.