Stochastic Method for Evaluating Removal, Fate and Associated Uncertainties of Micropollutants in a Stormwater Biofilter at an Annual Scale

A stochastic method for evaluating the in situ mass balance of micropollutants in a stormwater biofilter, accounting for inlet and outlet loads and the evolution of pollutant mass in the filter media (ΔMsoil) at an annual scale, is proposed. In the field context, this type of calculation presents a number of methodological challenges, associated with estimating water quality for unsampled rain events, reconstituting missing or invalidated flow data and accounting for significant uncertainties associated with these estimations and experimental measurements. The method is applied to a biofiltration swale treating road runoff for two trace metals, Cu and Zn and six organic micropollutants: pyrene (Pyr), phenanthrene (Phen), bisphenol-A (BPA), octylphenol (OP), nonylphenol (NP) and bis(2-ethylhexyl) phthalate (DEHP). Pollutant loads were reduced by 27–72%. While organic micropollutants are likely to be lost to degradation or volatilization processes in such systems, dissipation could not be demonstrated for any of the organic micropollutants studied due to emissions from construction materials (case of BPA, OP, NP and DEHP) or high uncertainties in ΔMsoil (case of Pyr and Phen). The necessary conditions for establishing an in situ mass balance demonstrating dissipation, which include acquisition of data associated with all terms over a period long enough that uncertainty propagation is limited and the absence of additional sources of pollutants in the field, are discussed.


Introduction
In order to minimize the effects of urbanization on the water cycle and aquatic environments, new paradigms for urban stormwater management have emerged and gained in popularity. Known under the terms low impact development (LID), sustainable drainage systems (SuDS) or water sensitive urban design (WSUD), these approaches advocate managing urban runoff close to the source, often through soil-based, vegetated devices, designed to improve water quality and meet hydrologic objectives while Table 1. Filter media and geometric properties of the biofiltration swale.

Properties Biofiltration Swale
Filter media texture Sandy loam Filter media pH in water 8.73 Initial organic carbon content (by mass, %) 0.9 Hydraulic conductivity (mm/h) 21 Filter media depth (cm) 50 Ponding depth (cm) 17 Width (perpendicular to road, m) 0.5 Length (parallel to road, m) 32 Surface area ratio (A device /A catchment , %) 4.5 Untreated road runoff (RR), considered to be representative of the quality of water entering the BFS, was collected from a reference catchment at a nearby section of the same road ( Figure 1). parameters [19,20]; in this case, medium-or long-term load estimations require a reconstitution of this data. These reconstitutions, as well the measurements, sampling and analyses of any stormwater monitoring campaign, entail uncertainties, which must be accounted for in order to draw pertinent conclusions [21]. The objective of the present work is to present and test a stochastic method for evaluating such a mass balance, accounting for its uncertainties. The method is applied to a selection of micropollutants with varying properties, which are typical of road runoff.

Site Description
The study site includes a biofiltration swale (BFS), which drains water from the RD 212, a highway with a daily traffic of 11,000 vehicles/day, located in Compans, France in the Paris region. The BFS is separated from the road surface by a galvanized steel safety barrier and a thin strip of sloped asphalt, added at the time of construction to guide runoff into the swale. The characteristics of this system, which began operation on 18 March 2016, are described in Table 1. It is lined and water was collected from a drain beneath the filter media at a depth of 50 cm. Its filter media was prepared by mixing 40% (by volume) silt loam topsoil with 60% 0-4 mm limestone sand using a rototiller. The mixed filter media includes a significant proportion of grains greater than 2 mm (27 ± 4.5% relative standard deviation); the fraction below 2 mm has a sandy loam texture. The media was found to be less dense at the surface (1.7 kg/L in the first 5 cm) than at greater depths (1.9 kg/L), with a deviation of as much as ±20% among measurements at the same depth. The system was designed to treat small, frequent rain events. As the terrain had a 1.3% slope parallel to the road, the BFS was divided into three sections with concrete check dams (installed in June 2016, after 3 months of operation) to increase surface storage volume, yielding an average ponding depth of 17 cm.
Untreated road runoff (RR), considered to be representative of the quality of water entering the BFS, was collected from a reference catchment at a nearby section of the same road ( Figure 1).

Continuous Measurements
Rainfall was measured on site with a tipping bucket rain gauge. RR and BFS drain flows were continuously measured (from November 2015-April 2017 for RR and from April 2016-April 2017 for BFS) using tipping bucket flow meters (17 L bucket volume at RR and 1 L volume at BFS). The BFS overflow was measured using a V-notch weir beginning in October 2016. Rainfall measurements were validated by comparison with external rain gauges from the Paris region and comparison with flow measurements. Flow measurements were compared with rainfall for validation and tipping bucket volumes were verified in situ.
RR was also equipped with a multiparameter probe, which continuously measured both electrical conductivity and turbidity in the untreated runoff. Flow-weighted mean turbidity measurements for sampled events were cross-validated with laboratory-measured turbidity and total suspended solids (TSS) measurements for sampled events. In addition, potentially faulty turbidity measurements were identified using several criteria: (minimal turbidity < 0 Formazin Nephelometric Units (FNU), maximal turbidity > 800 FNU, changes in turbidity greater than ± 200 FNU/min, variation of turbidity over an event of less than 200 FNU over a full event or event mean turbidity less than 70 FNU) [22]. For events containing at least one of these criteria, turbidity was visualized along with the hydrograph before a decision as to its validation was made.

Water Sampling and Pre-Treatment
Water sampling is detailed elsewhere [23]. Briefly, water was collected in proportion to flow volume in order to estimate event mean concentrations. Sampling began in February 2016 for RR and in May 2016 for BFS; a total of 14 and 15 events were sampled, respectively. 11 of these events were sampled simultaneously at both locations, while the remaining events were sampled at only one point due to different starting dates and technical difficulties. Sampling covered all four seasons and a variety of hydrologic conditions. Figure A1 in Appendix A presents a timeline of sampled events along with daily rainfall.
Water samples were collected within 24 h of each rain event and sent to partner laboratories within 24 h of collection where they are immediately filtered. Water was analysed for a large number of water quality parameters and micropollutants [23], which were analysed in the particulate and dissolved phases. The present work focuses on a subset of these parameters, chosen to represent the pollutants from different families and with varying properties: total suspended solids (TSS), organic carbon (OC), two trace metals (Cu and Zn), two PAHs (pyrene-Pyr and phenanthrene-Phen), bisphenol A (BPA), two alkylphenols (octylphenol-OP and nonylphenol-NP) and one phthalate (bis(2-ethylhexyl) phthalate-DEHP).

Soil Sampling and Preparation
An initial sample of filter media (soil) was taken from the previously mixed filter media before its installation in the BFS in March 2016.
The filter media in the BFS was sampled in April 2017 according to the method developed by Tedoldi et al. [24], which relies on a sampling procedure in two phases. First, the surface of the BFS was sampled using an identical high-resolution grid in each section of the BFS (Figure 2a); each sample was dried, sieved to 2 mm, homogenized and analysed for Cu, Zn and Pb using portable X-ray fluorescence spectrometry. Resulting soil concentrations were interpolated to establish a cartography of trace metal surface contamination, which was then used to choose 2-3 zones of similar contamination in each section. In each zone, 2-4 cores were collected ( Figure 2b) using a stainless steel gouge auger (3 cm inner diameter), which was cleaned and rinsed twice with ultrapure water between samples. Each core was divided into three segments: 0-5 cm, 5-15 cm and 15-50 cm. Samples from each depth and each zone were combined to form a composite sample. Each composite sample was quartered and separated into subsamples, which were stored and prepared according to the intended analysis. For trace metal analysis, samples were stored in plastic bags, sieved to 2 mm and acid digested according to the standard NF X31-147 (total digestion). Samples analysed for organic micropollutants were stored in grilled, amber glass jars, immediately frozen upon arrival at the laboratory, then freeze-dried, ground and sieved to 2 mm before extraction and analysis.
The initial sample and the soil cores were analysed for organic carbon and the same micropollutants as were analysed in water [23]; among these, the present work focuses on OC, Cu, Zn, Pyr, Phen, BPA, OP and NP.

Analytical Methods
The analytical methods, limits of quantification (LoQ) and analytical uncertainties for the pollutants detailed in the present article are summarized in Table 2. Each composite sample was quartered and separated into subsamples, which were stored and prepared according to the intended analysis. For trace metal analysis, samples were stored in plastic bags, sieved to 2 mm and acid digested according to the standard NF X31-147 (total digestion). Samples analysed for organic micropollutants were stored in grilled, amber glass jars, immediately frozen upon arrival at the laboratory, then freeze-dried, ground and sieved to 2 mm before extraction and analysis.
The initial sample and the soil cores were analysed for organic carbon and the same micropollutants as were analysed in water [23]; among these, the present work focuses on OC, Cu, Zn, Pyr, Phen, BPA, OP and NP.

Analytical Methods
The analytical methods, limits of quantification (LoQ) and analytical uncertainties for the pollutants detailed in the present article are summarized in Table 2. Table 2. Summary of employed analytical methods. M is the mass of TSS extracted in mg, which varied according to the concentration of TSS in the water sample and the filtered volume. Abbreviations: ICP-OES (inductively coupled plasma optical emission spectroscopy), ICP-MS (inductively coupled plasma mass spectrometry), GC-MS (gas chromatography coupled with mass spectrometry), UPLC-MSMS (ultra performance liquid chromatography with tandem mass spectrometry).

Mass Balance and Uncertainty Calculations
A mass balance was evaluated for the selected pollutants in the BFS according to Equation (1) over a period of the first 13 months of operation (18 March 2016 through 18 April 2017). Available experimental data were combined with modelling or interpolation approaches in order to calculate each term, the details of which are described in the following paragraphs. In order to account for uncertainty, a Monte Carlo approach was used, wherein uncertain values used in the calculation of these terms were varied stochastically (see Appendix B for lists of all stochastically varied parameters and distributions used). All calculations were repeated 10,000 times.
where M soil,initial is the mass of pollutant present in the soil at the beginning of the period, M soil, f inal is the mass of pollutant in the soil at the end of the period, M in,RR is the pollutant load from road runoff, M out,drain is the pollutant load in the outflow from the drain, M out,over is the pollutant load in the system overflow and ∆M is the difference in the characterized masses-a positive ∆M indicates that some dissipation of a pollutant has occurred and a negative ∆M corresponds to a pollutant for which an additional source is present. It is to be noted that this formulation of the mass balance assumes that runoff water is the only pertinent source of pollution, neglecting other potential sources of pollution, including direct atmospheric deposition. The validity of this assumption will be evaluated in the discussion section.
Terms of the mass balance were used to calculate the efficiency of the system at intercepting pollutant loads (E int , Equation (2)) and to compare accumulation in the soil (∆M soil = M soil,final − M soil,initial ) with the intercepted mass (M intercepted = M in,RR − M out,drain − M out,over ).
where E int is the percent mass of pollutant retained by the system, M out,drain is pollutant load in the outflow from the drain, M out,over is the pollutant load in the system overflow and M in,RR is the pollutant load from road runoff.

Loads Associated with Water Flows
The overall loads associated with each water flow term (M in,RR , M out,drain and M out,over ) were evaluated by summing the loads associated with each rain event (Equation (3)) occurring during the period.
where M is the load over the full period, V ev is the flow volume for a given event, C ev is the event mean concentration for the event and n ev is the total number of events during the period. The generalized methods used for determining the volume and concentration associated with each event for the different flow terms are summarized by Figures 3 and 4 respectively; specific methods used for each flow term are detailed in the sections that follow.

Separation of Rain Events
Events were defined as beginning with the first recorded rainfall at an on-site rain gauge with a resolution of 0.2 mm. A minimal period of 60 min (the average time required for flow to begin in the BFS drain following the beginning of rainfall) was defined as separating two events. The event was considered to be finished when no rainfall was recorded over this hour and the outflow from the BFS drain was equal to at most 0.5% of the recorded volume since the beginning of the event. It was not possible to wait for flow to stop entirely in the BFS drain to define the end of events, as flow continued at low rates for many hours following the end of rainfall. The fraction of 0.5% was chosen, following a sensitivity analysis, as a compromise which attributed most BFS drain flow to events (91%) without

Separation of Rain Events
Events were defined as beginning with the first recorded rainfall at an on-site rain gauge with a resolution of 0.2 mm. A minimal period of 60 min (the average time required for flow to begin in the BFS drain following the beginning of rainfall) was defined as separating two events. The event was considered to be finished when no rainfall was recorded over this hour and the outflow from the BFS drain was equal to at most 0.5% of the recorded volume since the beginning of the event. It was not possible to wait for flow to stop entirely in the BFS drain to define the end of events, as flow continued at low rates for many hours following the end of rainfall. The fraction of 0.5% was chosen, following a sensitivity analysis, as a compromise which attributed most BFS drain flow to events (91%) without

Separation of Rain Events
Events were defined as beginning with the first recorded rainfall at an on-site rain gauge with a resolution of 0.2 mm. A minimal period of 60 min (the average time required for flow to begin in the BFS drain following the beginning of rainfall) was defined as separating two events. The event was considered to be finished when no rainfall was recorded over this hour and the outflow from the BFS drain was equal to at most 0.5% of the recorded volume since the beginning of the event. It was not possible to wait for flow to stop entirely in the BFS drain to define the end of events, as flow continued at low rates for many hours following the end of rainfall. The fraction of 0.5% was chosen, following a sensitivity analysis, as a compromise which attributed most BFS drain flow to events (91%) without creating excessively long events. In order to ensure that the period's full BFS drain volume was accounted for, the series was multiplied by the inverse of the fraction counted within events (×1.1).

M in,RR
Comparison of rainfall with RR runoff volumes showed a highly variable runoff coefficient, often exceeding one, indicating that the road surface catchment was not clearly divided and the contributing surface varied between rain events (most likely as a function of rainfall condition). These data were thus judged not to be representative of flows entering the BFS, nor sufficiently reliable to calibrate a model. They were only used to evaluate event mean turbidity concentrations.
Therefore, for each event, the inflow volume (V in ) was evaluated using a simple initial loss reservoir model [25], with a depth SI, that empties by evaporation and generates runoff by overflow. It was applied to the impermeable road surface catchment using on-site rain data and local evapotranspiration measurements from Météo France. As the initial losses could not be calibrated, its depth SI was varied stochastically between 0.4 and 1 mm, values typical of a new road surface such as that in Compans [26,27].
For sampled events, which covered 124 mm of the 610 mm of rainfall in the 13 month period considered, measured total concentrations were applied, varying analytical errors stochastically, assuming normal distributions (Appendix B, Table A1).
According to the non-parametric Spearman correlation test at a 95% confidence level, no correlations were observed between total event mean concentrations of any of the selected pollutants and rain event characteristics (rainfall depth, maximum rainfall intensity, mean rainfall intensity, antecedent dry period). Therefore, concentrations for events outside of these periods were estimated using a stochastic method developed by Hannouche et al. [28], which makes use of continuous turbidity measurements in order to minimize uncertainty associated with the stochastic approach. A linear regression was established between TSS measurements from sampled events and corresponding volume-weighted event mean turbidities calculated from continuous measurements (Appendix C). When valid turbidity measurements were available, a TSS concentration was calculated from the volume-weighted event mean turbidity, including a randomly selected error assuming a normal distribution of residuals from the regression (Table A2). As it is likely that this relationship is no longer linear for very small turbidity values, when turbidity was below the lowest event mean turbidity (50 FNU) measured for sampled events, TSS values were selected in a uniform distribution between zero and 70 mg/L (the corresponding lowest event mean TSS concentration). For pollutants for which particulate concentrations in water are correlated with TSS (P < 0.05 according to the non-parametric Spearman test, a hypothesis which was retained for all pollutants except NP), total concentrations were estimated according to Equation (4). When valid turbidity data was unavailable or when particulate concentrations in water did not correlate with TSS concentrations, C ev was directly sampled from the pollutant's total concentration distribution.
where C ev is the estimated event mean concentration (g/L), C D,ev is a randomly sampled value in the pollutant's dissolved concentration distribution (g/L), S P,ev is a randomly sampled value in the pollutant's solid particulate concentration distribution (g/g) and C TSS,ev is the TSS concentration estimated from the turbidity measurement (g/L). As pollutant concentrations in stormwater tend to follow log-normal distributions [29], experimental measurements of C ev , C D,ev and S P,ev were first tested for log-normality by a Shapiro-Wilk test on log transforms using a limit of P < 0.05. If the hypothesis of log-normality was rejected, the data set was tested for normality. The hypothesis of log-normality was only rejected for dissolved concentrations of OP, for which the hypothesis of normality was not rejected. The distributions employed for each pollutant are described in Appendix B (Table A3).
An additional source of uncertainty related to inlet mass calculation is the representativity of water samples. This uncertainty is associated with both the efficiency of sampling (whether the event mean sample accounts for the full event) and with sampling from a reference catchment rather than sampling the real inlet water. As it is difficult to quantify the effects of this latter uncertainty, which is likely dominant, these sources of uncertainty were not accounted for. As such uncertainties associated with inlet mass calculations may be slightly underestimated.

M out,drain
As with RR, concentrations were measured in the BFS drain for some events; in this case, the sampled events represented 145 mm out of the 610 mm of rainfall over the studied period. Again, measured concentrations were used where possible, stochastically accounting for analytical uncertainty. When either the dissolved or particulate concentrations were below the LoQ, the concentration in the given phase was selected in a uniform distribution between 0 and the LoQ and the total concentration was calculated as the sum of total and dissolved concentrations.
At this location, turbidity was not measured continuously as the particulate fraction of pollutants was expected to be less significant, an assumption that was confirmed through the measurement campaigns [23]. Therefore, for most pollutants, outlet concentrations were randomly sampled within the log-normal (P < 0.05) total outlet concentration distributions (Table A4).
A different approach was used for Phen and Pyr, which were frequently found to be below the limit of quantification (LoQ) in the dissolved and/or particulate phases. In this case, particulate concentrations were treated separately from dissolved concentrations. For each event, a choice as to whether a concentration was below the LoQ was made randomly, assuming the probability of a dissolved or particulate concentration being beneath the LoQ to be equal to the proportion of samples with measurements below the LoQ for the parameter. If the parameter was chosen to be below the LoQ for a given event in a given phase, its concentration was selected in a uniform distribution between zero and the LoQ. Otherwise, its concentration in the given phase was selected from the distribution of measurements above the LoQ. A log-normal distribution was used in the particulate phase for both pollutants and for the dissolved phase of Pyr; as Phen was only measured in the dissolved phase in 4 samples, a uniform distribution was used (Table A4). The total concentration is taken to be the sum of the dissolved and particulate phase concentrations.
For most of the period, drain outlet volumes were evaluated using continuous flow measurements; the only source of uncertainty was that associated with this measurement, which was assumed to be normally distributed. However, data was not available between the beginning of operation of the BFS (18 March 2016) and the installation of the flow meter (6 April 2016). Outflow volumes for this period were reconstructed by multiplying the event input volume by a randomly selected value from the log-normal (Shapiro-Wilk P < 0.05) distribution of the ratio of drained volume to input volume (runoff and direct rainfall) from the period with available measurements before the installation of the check dams.

M out,over
Concentrations in the system overflow were not measured experimentally. It was therefore assumed that overflow concentrations for each event were related to the corresponding inflow concentrations according to the Equation (5), which assumes that the BFS acts as a well-mixed reservoir over the event duration, that the rainfall itself is not a source of pollutant mass and that no sedimentation or erosion occurs. Both because sedimentation may occur in a system of this type and because most overflow will occur later in rain events after the first flush of pollutants [30], this hypothesis likely leads to an overestimation of overflow mass, the effects of which will be discussed later.
where C over,ev is the total pollutant concentration for a given event (g/L) and C in,RR,ev , V RR,ev and h rain,ev are the total concentration in road runoff (g/L), the volume of road runoff (L) and the rainfall depth associated with the same event (mm), respectively and A BFS is the biofiltration swale surface area (m 2 ).
Experimental measurements of overflow were available from November 2016-April 2017. For this period, uncertainty in the overflow volume was considered to be due to the uncertainty in the measurement of water depth at the V-notch weir. An error in this depth was selected randomly assuming its relative error to be normally distributed and then used to calculate a volume error (Table A2).
Overflow for the period from March-November 2016 had to be estimated. To do so, the event inflow volume (V RR,ev + h rain,ev A BFS ) was compared to the storage volume available at the surface of the BFS. When the event volume was smaller than the storage volume, V over was assumed to be null. When this was not the case, V over was calculated according to Equation (6). This model was validated for the period during which overflow data was available; measured overflow volumes were within the predicted range for 134 of the 139 events for which data was available.
where V over,ev is the volume of overflow (L), V RR,ev is the volume of RR (L), h rain,ev is the total rainfall depth (mm), A BFS is the biofiltration swale surface area (m 2 ), V drain,ev is the drain outflow volume (L) and ∆V media,ev is the evolution in the water stored in the filter media (L). For each event, ∆V media,ev was randomly selected in a uniform distribution between 0 and field capacity of the full BFS.

Mass Stored in Soil
The variation over the 13 month period of the mass of each pollutant pol stored in soil was calculated according to Equation (7).
where M soil,final and M soil,initial are the masses of the pollutant stored in the soil at the end and at the beginning of the study, respectively (g), the volume of the soil is divided into 24 different sub-volumes corresponding to i = 8 zones (2 to 3 zones in each of the 3 BFS sections) and j = 3 depth layers (0-5 cm, 5-15 cm, 15-50 cm), S soil,i,j is the final concentration of the pollutant pol in the <2 mm fraction attributed to zone i and depth j (g/g), S soil,init,i,j is the initial concentration of the pollutant pol in the <2 mm fraction attributed to zone i and depth j (g/g), V soil,i,j is the volume of soil in the sub-volume i,j of the BFS (L), ρ soil,i,j is the bulk density of the filter media (g/L) and f soil,<2mm,i,j is the mass fraction of the soil with particles less than 2 mm.

Definition of Soil Properties
The BFS sub-volumes i,j were defined according to the soil sampling strategy applied in April 2017. As previously mentioned, at the end of the study, the filter media was sampled by first characterizing surface contamination and defining 2-3 zones of similar surface contamination (from a most contaminated to a least contaminated zone) in each of the three BFS sections separated by check dams (Figure 2). A composite core was than sampled for each zone i and separated in 3 depth layers j.
An analytical error in the final concentration of each pollutant (S soil,pol,i,j ) was randomly selected for each iteration. The resulting concentration was then attributed to the soil mass associated with a corresponding sub-volume of soil (V soil,i,j , Equation (8)).
where V soil,i,j is the sub-volume assigned to sample i,j in the soil pollutant mass calculation (L), V section is the volume of soil in the section (L), f i is the fraction of the surface area of the section applied to zone i, h j is the soil depth of the layer j of the core (cm) and h core is the soil depth of the core [cm], equal to the depth of the biofilter's filter media.
This sampling strategy relies on the hypothesis that surface contamination from metals is correlated with contamination from each pollutant at greater depths. For each section, pollutant and depth, this hypothesis was evaluated by comparing the ranks of surface concentrations with the ranks of composite core concentrations for the same zone. Figure 5 demonstrates the distribution of a pollutant for which the zones were often representative (Cu) and a pollutant for which the zones were not representative (BPA).
pollutant for which the zones were often representative (Cu) and a pollutant for which the zones were not representative (BPA).
The method for assigning a mass of soil to each sample is described in Figure 6. When ranks between surface contamination and composite core contamination were identical, fj was assumed to be equal to the proportion of the section surface area covered by the zone. In this case, uncertainty due to the subjectivity of identifying the boundaries of each zone was accounted for by randomly selecting an error for each zone's surface area in a uniform distribution between ±10% of the total section surface area, such that the sum of zone areas in each section remained constant (Table A5).
When the ranks of concentrations at a given depth were different from those at the surface, the zones were assumed not to be representative of the distribution of contamination. Therefore, the composite cores were treated as random samples, each equally representative of the entire surface in the section. In this case, fi was selected in a uniform distribution between 0 and 1, such that the sum of fractions for each zone were equal to 1.  The method for assigning a mass of soil to each sample is described in Figure 6. When ranks between surface contamination and composite core contamination were identical, f j was assumed to be equal to the proportion of the section surface area covered by the zone. In this case, uncertainty due to the subjectivity of identifying the boundaries of each zone was accounted for by randomly selecting an error for each zone's surface area in a uniform distribution between ±10% of the total section surface area, such that the sum of zone areas in each section remained constant (Table A5).

Definition of Ssoil,init,i,j
The initial concentration of each pollutant in the soil was estimated by analysing all pollutants in a sample of filter media taken before its installation in the BFS. In the mass balance calculation, an analytical error in the initial concentration of each pollutant was randomly selected for each iteration. When initial concentrations were below the LoQ, initial concentrations were randomly selected in a uniform distribution between 0 and the LoQ. An additional source of uncertainty is the representativeness of the initial filter media sample for each portion of the BFS due to imperfect mixing of topsoil and sand to produce the initial filter media at the time of construction. As the filter media in this system is composed of topsoil with low calcium content (0.3%) and lime sand containing 38% calcium, calcium may be considered to be a tracer of the sand, which can be used to evaluate the heterogeneity of mixing of the filter media. Calcium content varied by a relative standard deviation of 13.8%, indicating an imperfect homogenization of the two materials during media fabrication (see Appendix D for the calcium distribution). Therefore, a factor of variability fvar, equal to the variability of calcium in the media cores sampled in the BFS, was applied to the initial concentration of each pollutant in order to estimate the uncertainty in the initial pollutant mass due to the variability in the filter media's composition. As the hypothesis of the normality of the calcium content distribution was not rejected (Shapiro-Wilk P < 0.05), this factor was randomly selected in a normal distribution with the relative standard deviation of calcium measured in all soil samples.
A different factor of variability was applied to each sub-volume i,j of the BFS (Figure 7). When the ranks of concentrations at a given depth were different from those at the surface, the zones were assumed not to be representative of the distribution of contamination. Therefore, the composite cores were treated as random samples, each equally representative of the entire surface in the section. In this case, f i was selected in a uniform distribution between 0 and 1, such that the sum of fractions for each zone were equal to 1.
For each V soil,i,j thus defined, a density was selected using mean densities for each depth, choosing an error within a uniform distribution of ±20%, as the hypothesis of a normal or log-normal density distribution was rejected.
Because pollutant analyses were carried out on the fraction <2 mm and the filter media contained a non-negligible fraction of >2 mm particles, which were included in bulk density measurements but unlikely to contain as many pollutants as smaller particles due to their mineral character and low surface area. The mass of soil was therefore corrected by this factor. As the hypothesis of normality was not rejected for this fraction, its value was selected from a normal distribution for each sample (Table A5).
The initial concentration of each pollutant in the soil was estimated by analysing all pollutants in a sample of filter media taken before its installation in the BFS. In the mass balance calculation, an analytical error in the initial concentration of each pollutant was randomly selected for each iteration. When initial concentrations were below the LoQ, initial concentrations were randomly selected in a uniform distribution between 0 and the LoQ.
An additional source of uncertainty is the representativeness of the initial filter media sample for each portion of the BFS due to imperfect mixing of topsoil and sand to produce the initial filter media at the time of construction. As the filter media in this system is composed of topsoil with low calcium content (0.3%) and lime sand containing 38% calcium, calcium may be considered to be a tracer of the sand, which can be used to evaluate the heterogeneity of mixing of the filter media. Calcium content varied by a relative standard deviation of 13.8%, indicating an imperfect homogenization of the two materials during media fabrication (see Appendix D for the calcium distribution). Therefore, a factor of variability f var , equal to the variability of calcium in the media cores sampled in the BFS, was applied to the initial concentration of each pollutant in order to estimate the uncertainty in the initial pollutant mass due to the variability in the filter media's composition. As the hypothesis of the normality of the calcium content distribution was not rejected (Shapiro-Wilk P < 0.05), this factor was randomly selected in a normal distribution with the relative standard deviation of calcium measured in all soil samples.

Micropollutant Load in Road Runoff
The calculated micropollutant loads in untreated road runoff are summarized in Table 3. The median annual Cu load per road surface area (851 g/ha/year) is high compared to most values issued from a review of annual traffic area runoff loads (median 355, with values ranging from 34-3780 g/ha/year), while that of Zn (2360 g/ha/year) is slightly below the median (2600 g/ha/year) issued from the same data set [31].
Annual NP and DEHP loads (7.87 and 77.3 g/ha/year, respectively) are lower than those of 22 and 350 g/ha/year, respectively, calculated for a highway with much higher daily traffic by Björklund et al. [32]. This may be due to traffic differences as well as to a decrease in the use of these compounds following their identification as priority pollutants by the European Union [2]. Both NP and OP (2.15 g/ha/year) annual loads exceed those evaluated for runoff from a residential urban catchment (1.9 and 0.12 g/ha/year, respectively) [33].
The BPA load evaluated per active surface and mm of rainfall (4.92 mg/act.ha/mm) is similar to values calculated by Hannouche et al. [28] for runoff from three different urban catchments in separate storm sewers (4.2-5.9 mg/act.ha/mm), while those of NP and especially OP (18.6 and 5.06 mg/act.ha/mm, respectively) exceed corresponding loads from the same study (12-13 and 0.87-1.0 mg/act.ha/mm). These results are coherent with previous research which has shown tires to be a major source of OP and several automobile components to be sources of NP [34].
Distributions of calculated loads are slightly skewed to the right, with confidence intervals reaching between 13-22% in the negative direction and to 17-41% in the positive direction. Uncertainties in water volume calculations account for about 4% in the negative direction and 6% in the positive direction, indicating that pollutant concentration estimations are the main source of uncertainty. The DEHP load has the highest relative uncertainty, followed by Phen, Pyr, Zn, NP, OP, Cu and BPA.

Micropollutant Load in Road Runoff
The calculated micropollutant loads in untreated road runoff are summarized in Table 3. The median annual Cu load per road surface area (851 g/ha/year) is high compared to most values issued from a review of annual traffic area runoff loads (median 355, with values ranging from 34-3780 g/ha/year), while that of Zn (2360 g/ha/year) is slightly below the median (2600 g/ha/year) issued from the same data set [31].  Annual NP and DEHP loads (7.87 and 77.3 g/ha/year, respectively) are lower than those of 22 and 350 g/ha/year, respectively, calculated for a highway with much higher daily traffic by Björklund et al. [32]. This may be due to traffic differences as well as to a decrease in the use of these compounds following their identification as priority pollutants by the European Union [2]. Both NP and OP (2.15 g/ha/year) annual loads exceed those evaluated for runoff from a residential urban catchment (1.9 and 0.12 g/ha/year, respectively) [33].
The BPA load evaluated per active surface and mm of rainfall (4.92 mg/act.ha/mm) is similar to values calculated by Hannouche et al. [28] for runoff from three different urban catchments in separate storm sewers (4.2-5.9 mg/act.ha/mm), while those of NP and especially OP (18.6 and 5.06 mg/act.ha/mm, respectively) exceed corresponding loads from the same study (12-13 and 0.87-1.0 mg/act.ha/mm). These results are coherent with previous research which has shown tires to be a major source of OP and several automobile components to be sources of NP [34].
Distributions of calculated loads are slightly skewed to the right, with confidence intervals reaching between 13-22% in the negative direction and to 17-41% in the positive direction. Uncertainties in water volume calculations account for about 4% in the negative direction and 6% in the positive direction, indicating that pollutant concentration estimations are the main source of uncertainty. The DEHP load has the highest relative uncertainty, followed by Phen, Pyr, Zn, NP, OP, Cu and BPA.

Micropollutant Load Reduction
Median estimations of the integrated pollutant load reduction by the BFS system over the mass balance period vary between 27-72% depending on the pollutant (Table 4). Load reductions in biofiltration systems are typically higher than concentration reductions due to water volume reduction [6]. In this case, however, load reductions of the full system tend to be lower than median concentration reductions of drained water, which are summarized in Table 4 and have been reported in detail elsewhere [23]. The main reason for this is that even though about 21% (15-24%, 95% confidence interval) of runoff volume is dissipated by the system (likely due to evaporation/evapotranspiration), around 35% of this volume (31-37%) overflows. This high proportion of overflow volume indicates that water quality performance could be improved by increasing the system's water treatment capacity, either by increasing its surface storage volume or by increasing the rate of treatment by increasing its surface area or using a filter media with a higher hydraulic conductivity. Indeed, the hydraulic conductivity of the present system (21 mm/h) is relatively low compared values typically recommended in biofiltration design guidelines (for example, 100-300 mm/h in Australian guidelines [35]).
We also note that much of the overflow occurred during the first three months of operation, when the check dams were not yet installed and exceptional rainfall occurred. Indeed, 253 mm of the 610 mm of rainfall in the full period occurred during these three months (see Figure A1, Appendix A). Rainfall during the first period included two events with 2-year return periods, while no events outside of the initial period exceeded a return period of six months. Measurements made after check dam installation show a lower proportion of overflow, due both to the absence of particularly intense events and to the increased surface storage volume.
As the calculation assumes that the only improvement to overflowed water quality is due to its dilution by rain falling directly on the BFS, the overflow mass accounts for between 20-29% of input mass. As some sedimentation is likely to occur in the BFS before overflow, this term may be overestimated, leading to an underestimation of load efficiency and intercepted mass.
The lowest overflow proportions are observed for TSS; this is likely due to the fact that RR turbidity and therefore calculated TSS, tended to be lower for the high-volume events generating overflow, particularly during May 2016 (before check dam installation, see Figure A1 in Appendix A) when the stock of mobilizable particles on the road surface seems to have been depleted. The highest overflow proportions are observed for pollutants that were less strongly associated with the particulate phase (BPA and OP) and for NP, for which turbidity was not used to estimate unmeasured concentrations as its particulate concentrations were not found to be correlated with TSS. This may be indicative of a bias in these calculations. Indeed, for sampled events during this period, dissolved pollutants were also observed at low concentrations; however, without a relationship to a continuous water quality measurement, concentrations of unsampled events were selected randomly from the total concentration distribution.
Outlet loads from the drain account for between 7 and 51% of inlet loads to the BFS, depending on pollutants. For example, drained TSS accounts for 13% (9-16%, 95% confidence interval) of inlet TSS. At first glance, this value appears high, given that only 45% (41-52%) of runoff was drained from the BFS and the median TSS concentration reduction was found to be 92%. Indeed, event pollutant load reductions would be higher for events with high concentration reductions; however, higher outlet TSS concentrations, corresponding to lower concentration reductions, were observed for some events during a winter period [23]. This inter-event variability in outlet water quality is responsible for the lower load reduction over the full period than those that may be observed for most events. Poor performance was observed for all pollutants during some events in the winter period and/or during the beginning of operation (see E c values in Table 4). As these values are accounted for in the outlet pollutant concentration distributions used to estimate loads, the over or underrepresentation of events with poor performance in the sampled events could be a source of bias in pollutant load calculations. Continuous monitoring of drained water turbidity would be one method for reducing this potential bias, at least for pollutants found to be mainly present in the particulate phase.

Integrated Water and Filter Media Mass Balance
All mass terms associated with water fluxes (M in , M out,drain and M out,over ) and soil (M soil,initial and M soil,final ) are presented in Figure 8. They are also provided numerically in Appendix E. A first observation may be that the soil mass terms, even at the beginning of the study, are much greater than the load terms for Cu, Zn, Phen, Pyr and NP. An important implication of this fact is that the absolute uncertainties in calculated soil masses are large compared to the load terms, sometimes even exceeding the loads entering the system. This may limit the possible interpretations of the mass balance and will be discussed in detail later.
It can also be observed that relative uncertainties in the soil mass balance terms are highly variable between different pollutants. The highest relative uncertainties are observed for pollutants with concentrations in the initial filter media below the LoQ (BPA, OP and DEHP), for which the 95% confidence interval extended (−66, 70%), (−65, 69%) and (−46, 50%) from the medians, respectively. However, as the initial masses of these pollutants were low, associated absolute uncertainties are small compared to other pollutants. Beyond this, relative initial mass uncertainties increased with increasing analytical uncertainties from as little as (−12, 14%) for copper to (−34, 37%) for NP.
Relative uncertainties in the final soil mass vary between pollutants due not only to analytical uncertainty but also to the distribution of each pollutant throughout the filter media (see Figure 5 and Appendix D) compared to that of trace metals at the surface, which determined the algorithm used for evaluating M soil,final . As the distribution of all organic micropollutants often differed from that of trace metals at the surface, the second method, which involves more uncertainty, was frequently employed. Cu and Zn have the lowest relative uncertainties in final mass (−8, 9%) and (−9, 9%), respectively, while the highest are observed for OP and DEHP (−41, 71%) and (−43, 68%). As previously discussed, M in exceeds the sum of M out,drain and M out,over for all pollutants, indicating that a certain mass has been intercepted by the filter media. At the same time, the mass stored in the soil increased for most pollutants over the study period. The intercepted mass of each pollutant (M intercepted ) is compared to the evolution in the mass of pollutant in the soil (∆M soil ) in Figure 9. In a perfectly conservative mass balance, in which all important sources and sinks are correctly represented, these two terms are expected to be equivalent. ∆M soil < M intercepted would correspond to an unrepresented sink for pollutant mass (dissipation), whereas ∆M soil > M intercepted indicates that there is a source of the pollutant that was not accounted for in the mass balance. Results in this comparison vary greatly between pollutants.
Water 2019, 11, x FOR PEER REVIEW 19 of 37

Limitations of the Proposed Mass Balance Approach
When ΔMsoil significantly exceeds Mintercepted, at least one of the following must be true: (a) Msoil,initial and/or Min has been underestimated, (b) Msoil,final, Mout,drain and/or Mout,over has been overestimated and/or (c) important source terms have been ignored. The purpose of this section is to evaluate possible explanations for the ΔMsoil > Mintercepted observations for Zn, BPA, OP, NP and DEHP.

Questionable Representativity of the Initial Soil Sample
For all pollutants, possible errors in soil mass calculations (including analytical uncertainties but also variability in the soil properties throughout the system and the representativity of each sample) have been accounted for in the calculation.
The representativity of the initial soil sample, which was collected in the pile of prepared filter media during construction, might be questionable due to construction methods leading to variations in the filter media composition throughout the BFS. Imperfect topsoil and sand mixing were taken into account by varying initial concentrations throughout the filter according to the relative While the median value of M intercepted is lower than that of ∆M soil for Cu, the confidence interval of M intercepted is completely contained within that of ∆M soil , which is notably larger. These results are thus compatible with the hypothesis that the important terms in the Cu mass balance are accurately represented.
The confidence interval of ∆M soil for Phen and Pyr also encompasses that of M intercepted . However, in this case, ∆M soil is not significantly different from zero, with confidence intervals for both pollutants spanning from negative to positive values. As such, given the level of uncertainty in ∆M soil , it is impossible to draw any meaningful conclusions as to the fate of these pollutants in the soil from the mass balance.
∆M soil is higher than M intercepted for Zn, DEHP, BPA, OP and NP with no overlap of confidence intervals. In this case, although the ∆M soil interval is wider than that of M intercepted , the differences between the masses are so large that they are significantly different. Possible explanations for these observations will be explored in the following section.

Limitations of the Proposed Mass Balance Approach
When ∆M soil significantly exceeds M intercepted , at least one of the following must be true: (a) M soil,initial and/or M in has been underestimated, (b) M soil,final , M out,drain and/or M out,over has been overestimated and/or (c) important source terms have been ignored. The purpose of this section is to evaluate possible explanations for the ∆M soil > M intercepted observations for Zn, BPA, OP, NP and DEHP.

Questionable Representativity of the Initial Soil Sample
For all pollutants, possible errors in soil mass calculations (including analytical uncertainties but also variability in the soil properties throughout the system and the representativity of each sample) have been accounted for in the calculation.
The representativity of the initial soil sample, which was collected in the pile of prepared filter media during construction, might be questionable due to construction methods leading to variations in the filter media composition throughout the BFS. Imperfect topsoil and sand mixing were taken into account by varying initial concentrations throughout the filter according to the relative variability of Ca, a tracer of the lime sand. Another source of uncertainty is the quality of the initial topsoil itself, which may vary within or between truckloads. While it appears unlikely that M soil,initial is biased to such an extent that it exceeds the already large variability allowed for in the mass balance calculation, it is highly recommended that future studies account for the fact that filter media is likely to be heterogeneous. Initial samples should be taken at various locations in the filter media immediately after construction or taken at different moments during filling. Samples could either be analysed separately, which would allow for a characterization of spatial variability or used to form a composite sample.

Overestimation of Outlet Pollutant Loads
One potential bias in the load calculation comes from the estimation of the concentrations in the overflow. As this concentration was not measured directly, it was assumed to be equal to the inlet concentration, accounting for a dilution by direct rainfall. As previously mentioned, the assumptions that no sedimentation occurred at the surface of the biofilter and that water quality was constant throughout the rain event, both used in this estimation, likely lead to an overestimation of M out,drain .
Another source of bias is the representativity of sampled rain events for the full period. While sampled rain events covered various seasons and hydrologic conditions, they also tended to correspond to rather large events. The use of a continuous turbidity measurement in calculating inlet pollutant loads not only decreases uncertainty but also allows the adjustment of inlet concentrations to real observations for each event, minimizing this bias. Indeed, the median inlet masses calculated using the turbidity measurement are slightly lower than those calculated using random sampling from the total concentration distribution (Appendix F). This indicates that the distribution of inlet concentrations for sampled events tends to be higher than that of all events in the period. While this was corrected for inlet loads, a similar continuous measurement was unavailable for outlet concentrations from the drain, which may be a potential source of bias. Although this bias could lead to either an overestimation or an underestimation of pollutant load, in the present case, an overestimation is suspected.
In order to test whether an overestimation of M out,over and/or M out,drain might explain the significant differences between M intercepted and ∆M soil for Zn, BPA, OP, NP and DEHP, we consider the limiting hypothesis that both outlet terms are equal to zero by comparing ∆M soil with M in in Appendix G.
It can be seen that ∆M soil is significantly larger than M in for BPA, OP, NP and DEHP. Therefore, an overestimation of outlet terms cannot explain the fact that ∆M soil > M intercepted for these pollutants; there must be a source of these pollutants which was not accounted for in the mass balance. In the case of Zn, the M in confidence interval intercepts that of ∆M soil , albeit at the lower end, meaning that an extreme overestimation in the outlet terms could explain the difference between M intercepted and ∆M soil . However, as neither M out,over nor M out,drain is likely to be null, it remains probable that another source of Zn is present.

Atmospheric Deposition
Direct atmospheric deposition on the BFS surface was not accounted for in the mass balance calculations and is another possible source of pollutants in the soil. However, as the surface of the BFS is only 4.5% that of the surface which it drains, the mass associated with direct atmospheric deposition would be much smaller than those associated with runoff even if atmospheric deposition was the main source of pollution in runoff. In addition, a parallel study of the same site which characterized the contamination of atmospheric deposition for Cu, Zn, Phen and Pyr showed it to have concentrations an order of magnitude below those found in road runoff (see Appendix H for more details) [36]. Therefore, accounting for atmospheric deposition would not significantly modify the results of the mass balance in the present case.

Galvanized Steel Barrier
A potential source of Zn, which was not accounted for in the mass balance, is a galvanized steel safety barrier that separates the BFS from the road (see Appendix I for a photograph of the barrier). As rain falls on this barrier, it is likely to emit zinc; in addition, a portion of the runoff from the road is in contact with its posts as it enters the BFS. Although a similar barrier is also in place along the roadside where untreated runoff is collected, it is located on the opposite side of the curb and does not have the same amount of contact with runoff. Therefore, inlet loads based on data from the reference catchment do not account for this source, which probably explains at least part of the excess zinc found in the soil.

Degradation of Ethoxylated Alkylphenols
One potential source of alkylphenols (OP and NP) in the BFS is the degradation of alkylphenol ethoxylates (APEO). Indeed, alkylphenols and short-chained APEO (OP 1 EO, OP 2 EO, NP 1 EO and NP 2 EO) are the stable by-products of the degradation of industrially-produced, longer-chained APEO [37], for which tires, plastic vehicle components, paints, lubricant oils and asphalt are all potential sources in road runoff [38]. Therefore, if these longer-chained APEO, which were not analysed in the present study, are present in runoff, their degradation after retention by the BFS is likely to produce NP and OP in the BFS, which may at least partially explain the excess of these contaminants found in the soil.

Biofiltration Swale Construction Materials
The significant excesses of BPA, NP, OP and DEHP found in the BFS soil may also be the result of emissions from various materials used to construct the BFS, including a geomembrane liner, a drain, a fabric surrounding the drain and asphalt. Emissions tests of these materials were conducted in batch experiments and have been detailed elsewhere; the tests showed that all four of these pollutants could be emitted one or more of these materials [12]. Because the geomembrane, drain fabric and drain were located below or surrounding the filter media, these sources may also explain the high concentrations observed in the lower portions of the filter for these pollutants ( Figure 5 and Appendix D).
Insofar as additional pollutant sources come from materials used in the device itself, the high ∆M soil compared to M intercepted indicates that the removal of these pollutants from runoff comes at a high environmental cost in terms of soil pollution. For example, the BFS retains 39% of incoming loads of BPA at the cost of emitting nearly 9 times this amount into the soil. If biofiltration devices are to be part of an integrated strategy for limiting pollution, it is important to limit the use of materials likely to emit the very pollutants being treated. As many organic micropollutants are ubiquitous in industrially-produced, synthetic materials, it is likely to be difficult to find drainage or impermeabilization materials that do not include them.

Necessary Conditions for a Mass Balance Significantly Demonstrating Dissipation
The previous discussion highlights many of the methodological difficulties associated with establishing an in situ mass balance, including high uncertainties, potential biases and uncharacterized sources. The purpose of the current section is to establish a list of necessary conditions for establishing a mass balance capable of demonstrating dissipation with certainty. For the purposes of this discussion, we define an indicator of the proportion of intercepted pollutant mass dissipated in the system, E soil (Equation (9)).
where E soil is the percent difference between the mass of pollutant recovered in the system at the end of the period compared to the mass of pollutant intercepted, ∆M soil is the change in soil mass over the study period and M intercepted is the mass of pollutant intercepted by the biofilter over the study period. A significant dissipation may be observed when E soil is significantly greater than zero. This is not the case for any of the organic micropollutants considered in the present study (see Appendix J), despite the fact that many of the studied species are expected to be degraded in soil environments. Indeed, for all pollutants, E soil is either negative or the uncertainties so large that it is not significant. The necessary conditions for demonstrating a significant dissipation are listed and discussed below, given the lessons from the current study.

Pollutant Loads in and out of the System Have Been Evaluated, Accounting for Uncertainties, for all Important Pollutant Flux Terms
The evaluation of pollutant loads associated with flows in and out of a stormwater biofilter generally requires characterization of both pollutant concentrations and water flows. When not all events in the study period are sampled, pollutant concentrations for unsampled events can be evaluated stochastically. Continuous measurements of turbidity can improve load estimates by correcting for biases in sampled events, at least for particulate pollutants. It is to be noted that continuous measurements require substantial effort for equipment maintenance and data validation.

Both Initial and Final Concentrations in Soil Have Been Evaluated in a Representative Fashion; Soil Properties and Their Variability Have Been Characterized
Obtaining representative soil samples at the beginning and the end of a study presents several challenges. In the initial phase, it should not be assumed that a filter media is perfectly homogenous as it may vary in composition due to imperfect mixing of initial materials (such as topsoil and sand). Soil properties, such as density, may also vary spatially and over time; it is recommended that several measurements be taken at the beginning and at the end of the study period in order to account for potential variability.

No Pollutant Sources are Present within the System or the Dissipation of a Given Pollutant Exceeds Its Production/Emission (in the Second Case, Dissipation Will Be Underestimated but May Still Be Identified)
A major result of the present study is that emissions by synthetic materials within or around a biofiltration device (such as asphalt, drains, drain fabric or geomembranes) may exceed loads associated with runoff. Therefore, ideally, no such materials should be used in systems treating organic micropollutants, especially where a mass balance of organic micropollutants is to be carried out. However, in order to evaluate outlet concentrations, it is necessary to drain water from the filter and in order to measure the full outlet loads from a system, it must be lined to prevent exfiltration. It is therefore advisable to test various materials in order to choose the least polluting one. A longer study duration may also minimize the significance of these emissions in the mass balance, both because emissions are likely to be highest from new materials and because emitted pollutants may degrade over time.
In addition, some pollutants may be the products of the degradation of others. When this is the case, as for alkylphenols, it is unlikely that dissipation can be demonstrated unless the mass balance is expanded to include possible parent compounds, of which there may be many.

Intercepted Pollutant Masses Are Significant Compared to Initial Soil Pollutant Mass
In order for E soil to be significantly greater than zero (indicating dissipation), its relative uncertainty must be less than 100%, which was rarely the case in the present study (see Appendix J for all relative uncertainties). A better comprehension of the propagation of uncertainties between terms is therefore important for understanding how this may be achieved.
Assuming the errors in load calculations to be independent from those in the soil calculations, the Equation (10), which relates relative errors in E soil to those in ∆M soil and M intercepted , can be derived from the rules of uncertainty propagation [39].
when E soil is positive (as is the case if dissipation occurs), the condition δ rel,E soil < 1 may be applied to this equation to find the necessary conditions for demonstrating E soil to be significantly greater than zero. Substituting the definition of E soil , in terms of ∆M soil and M intercepted , which are also assumed to be positive, Equation (11) may be obtained. This inequality demonstrates that the larger the relative uncertainties in ∆M soil and M intercepted , the larger the difference between M intercepted and ∆M soil must be to be significant. Therefore, dissipation will be more difficult to prove when these uncertainties are large.
Because M intercepted and ∆M soil are both differences between other mass balance terms, their absolute uncertainties are related to those of the terms used to calculate them. Their relative uncertainties become large when they are small compared the terms used to calculate them (when there is a small difference between two large terms). Therefore, relative uncertainty in ∆M soil will become large when M soil,initial is much larger than ∆M soil . As by definition ∆M soil must be smaller than M intercepted for dissipation to occur, we note that the condition in Equation (11) is unlikely to be fulfilled when M intercepted is much smaller than M soil,initial due to divergence in ∆M soil 's relative uncertainty. Uncertainties in M intercepted itself will also become large if it is small compared to M in due to poor retention, making Equation (11) even harder to fulfil. Besides the efficiency of retention, the magnitude of M intercepted compared to M soil,initial also depends on the level of contamination initially in the filter medium, the annual loads of pollutants arriving in the system and the duration of the study.
In the present study, it would have taken 2 years for M intercepted to approach M soil,initial for BPA and OP, 6 to 10 years for Cu, Zn and DEHP and at least 19 years for Pyr, Phen and NP. The long duration required for the latter molecules indicates that the soil was initially quite contaminated. Indeed, the initial concentration of NP (0.159 µg/g) was higher than most measurements from a study of 32 soil samples from Ile-de-France, while those of Pyr and Phen (0.146 and 0.072 µg/g, respectively) were in the typical range for the region [40].
The divergence of relative uncertainties is best illustrated by Pyr and Phen (Appendix K), pollutants for which high concentrations were observed in the initial soil sample and for which no sources besides runoff were apparent. In this case, although initial and final soil mass uncertainties were at most ±35%, uncertainties in ∆M soil reached (−145, 147%) for Pyr and (−269, 310%) for Phen, leading to uncertainties in E soil of (−116, 119%] and (−457, 382%), making it impossible to make any conclusions as to the fate of these pollutants in the soil.
This example underlines the importance of adapting the duration of a study to initial pollutant mass and observed pollutant loads when a mass balance is to be undertaken. It should be noted that this duration can be minimized by using a filter media prepared from materials with low initial concentrations of all studied pollutants (which is also good practice for system design) and by locating the study in a catchment where high runoff pollution is expected. However, it must be noted that it is difficult to evaluate exact duration necessary a priori as estimations of M soil,initial and annual M intercepted require knowledge of the system and that, in practice, extending the duration of a study may not always be feasible.

Conclusions
A stochastic method for estimating the in situ mass balance of micropollutants in a stormwater biofilter was developed and applied. This method is useful for demonstrating integrated pollutant load reductions over a given period and has the potential for improving understanding of pollutant fate in the filter media under real conditions. In the present case, it has shown the significance of micropollutant emissions from construction materials.
However, demonstrating the dissipation of micropollutants in stormwater biofilters using a mass balance presents various methodological difficulties which may be difficult to resolve in a field context. Beyond the challenges associated with acquiring representative water and soil samples and reliable hydrologic measurements, studies must account for high uncertainties associated with all mass terms. Uncertainty may be limited by ensuring that initial pollutant concentrations in the filter media are low (this is possible only when the system is designed for the study) and by extending the duration of the study to span several years; however, such a study would be costly as at least continuous measurements and ideally water sampling should be continued throughout the period to reliably characterize pollutant loads. In addition, in an uncontrolled field context, studies may be confronted with unexpected sources of pollutants which might falsify the mass balance. These sources could include construction materials, as observed in the currently study or pollution from isolated events such as car accidents or illicit waste dumping. In some situations, the objective of demonstrating dissipation in situ through a mass balance may prove to be something of a white whale, requiring a long and intensive study with a high risk of failure due to unrepresented inputs.
Besides the in situ mass balance approach, the extent of dissipation of organic micropollutants may be studied under controlled experiments. However, the representativity of such controlled experiments has to be ensured, for example by regularly applying real runoff water to outdoor mesocosms. In addition, the ability of a particular site to degrade pollutants might be characterized by studying the presence of known metabolites or by characterizing the strains of microorganisms present. Evaluating the evolution of the signature of pollutants in the filter media over time may also be a method for demonstrating the dissipation of some species relative to others.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript or in the decision to publish the results.  Table A1. Summary of distributions used to account for analytical errors.   Table A3. Summary of event mean concentration (C ev ), event mean dissolved concentration (C D,ev ) and event mean particle concentration (S ev ) distributions used to estimate inlet concentrations for unsampled events for each pollutant. C D,ev,in and S ev,in distributions are not given for NP as turbidity was not used for this pollutant.  Table A4. Summary of event mean concentration (C ev ), event mean dissolved concentration (C D,ev ) and event mean particulate concentrations (C P,ev ) distributions used to estimate concentrations in drained water for each pollutant.

Appendix D: Distributions of species in composite cores
The distribution of different species among composite cores may be observed in Figure D1-7. The Ca distribution ( Figure D1) is mainly representative of the proportion of sand in the filter media. One may observe that calcium concentrations near the surface are closest to those observed in the initial soil sample. Samples closer to the bottom of the filter contain more calcium. This may be due to imperfections in the filter media production process, which was carried out in batches, such that different depths of the filter correspond to different batches and thus slightly different compositions.
For Cu (Figure 5a), Zn, Phen and Pyr ( Figures D2-4), the highest concentrations tend to be observed at the surface of each zone, as would be expected for pollutants for which the main source is infiltrating water. The one exception to this is the sample from middle depth of the C2 zone for Phen and Pyr ( Figures D3,4), which had the highest concentration despite not being at the surface. When it was constructed, a strip of asphalt was installed to guide runoff into the BFS. At the time of sampling, some of the asphalt located near the C2 zone had crumbled into the soil along the BFS. It is possible that some pieces of asphalt were included in this sample, which would explain this higher concentration, though it remains surprising that it was not located at the surface.
Distributions of BPA, OP, NP and DEHP (Figure 5b and Figures D5-7) were less predictable, with higher concentrations often observed at greater depths, indicating that infiltrating water is not

Appendix D Distributions of species in composite cores
The distribution of different species among composite cores may be observed in Figures A3-A9. The Ca distribution ( Figure A3) is mainly representative of the proportion of sand in the filter media. One may observe that calcium concentrations near the surface are closest to those observed in the initial soil sample. Samples closer to the bottom of the filter contain more calcium. This may be due to imperfections in the filter media production process, which was carried out in batches, such that different depths of the filter correspond to different batches and thus slightly different compositions.
For Cu (Figure 5a), Zn, Phen and Pyr ( Figures A4-A6), the highest concentrations tend to be observed at the surface of each zone, as would be expected for pollutants for which the main source is infiltrating water. The one exception to this is the sample from middle depth of the C2 zone for Phen and Pyr ( Figures A5 and A6), which had the highest concentration despite not being at the surface. When it was constructed, a strip of asphalt was installed to guide runoff into the BFS. At the time of sampling, some of the asphalt located near the C2 zone had crumbled into the soil along the BFS. It is possible that some pieces of asphalt were included in this sample, which would explain this higher concentration, though it remains surprising that it was not located at the surface.
Distributions of BPA, OP, NP and DEHP (Figures 5b and A7, Figures A8 and A9) were less predictable, with higher concentrations often observed at greater depths, indicating that infiltrating water is not the only source of these pollutants. Besides the previously mentioned asphalt, other potential sources of micropollutants which may explain an accumulation of these molecules at greater depths are the geomembrane, the drain and the fabric surrounding the drain. The middle sample in the C2 zone also demonstrated high concentrations of OP, NP and DEHP, indicating that the source of PAHs present in this sample was also a source of these pollutants.
In each section (A, B and C), the only pollutants which were nearly always present at the highest concentrations in the most polluted zones from the surface cartography were Cu and Zn. Even for these pollutants, A3 was found to be slightly more contaminated than A2. This indicates that the sources, removal mechanisms or fate processes of the other pollutants differed sufficiently from those of trace metals to lead not only to different vertical distributions but also to different horizontal distributions in the soil. It also means that the choice of sampling zones was not ideal for these pollutants and that the second algorithm for calculating soil concentrations was often used.
It may also be observed that while the initial concentration of most pollutants aligns with the lowest measured concentrations in the composite cores, this was not the case for Phen and Pyr. One explanation for this is that the PAHs initially present have been partially degraded in soil. However, this hypothesis is not supported by the fact that the deficit of Pyr at the end of the study is greater than that of Phen, despite the fact that Pyr is generally less easily biodegraded (the Q 20 -Q 80 of biodegradation half-lives range from 11-64 days for Phen and 210-1900 days for Pyr [41]). Another possible explanation is a variability in initial concentrations throughout the filter media; in this case, the initial PAH concentrations in most of the BFS may have actually been different than that in the initial soil sample. While possible variability in the initial concentration due to poor mixing top-soil and sand to produce the filter media was taken into account in the calculation, potential variations in the quality of the top-soil were not. Two composite core samples of NP presented concentrations below that in the initial soil sample. It should be noted that BPA, OP and DEHP were present at concentrations below the LoQ in the initial soil sample, so no lower concentrations could be observed. possible explanation is a variability in initial concentrations throughout the filter media; in this case, the initial PAH concentrations in most of the BFS may have actually been different than that in the initial soil sample. While possible variability in the initial concentration due to poor mixing top-soil and sand to produce the filter media was taken into account in the calculation, potential variations in the quality of the top-soil were not. Two composite core samples of NP presented concentrations below that in the initial soil sample. It should be noted that BPA, OP and DEHP were present at concentrations below the LoQ in the initial soil sample, so no lower concentrations could be observed.
.  . Figure D1. Concentrations of calcium measured in composite cores of each sampling zone. Within each zone, concentrations from cores taken at depths of 0-5 cm, 5-15 cm and 15-50 cm are represented to scale from left to right. The dotted line represents the concentration measured in the initial soil sample.              Figure A10 shows the difference between inlet masses calculated with and without turbidity data. The use of turbidity tends to slightly decrease uncertainties, reducing the confidence interval, for example from (−15, 23%) to (−13, 22%) for copper. In addition, masses calculated using turbidity data were lower than those evaluated without turbidity data. This is likely due to the fact that the distribution of event mean turbidity measurements for sampled events were slightly higher than those of the full period. This indicates that particulate concentrations and total concentrations of mainly particulate pollutants, were also likely higher for sampled events than for the full period. The use of turbidity data allows for the correction of this bias.

Appendix F Comparison of Inlet Masses with and without Turbidity Data
As turbidity improves estimations of particulate concentrations and mass, it does little to change masses calculated for more dissolved pollutants, like BPA ( Figure A10e). There is no change in inlet mass calculations with and without turbidity data for NP, as particulate concentrations were not correlated with TSS concentrations; therefore, turbidity was never used in the algorithm for calculating NP input mass.
Water 2019, 11, x FOR PEER REVIEW 33 of 37 Figure F1 shows the difference between inlet masses calculated with and without turbidity data. The use of turbidity tends to slightly decrease uncertainties, reducing the confidence interval, for example from (−15, 23%) to (−13, 22%) for copper. In addition, masses calculated using turbidity data were lower than those evaluated without turbidity data. This is likely due to the fact that the distribution of event mean turbidity measurements for sampled events were slightly higher than those of the full period. This indicates that particulate concentrations and total concentrations of mainly particulate pollutants, were also likely higher for sampled events than for the full period. The use of turbidity data allows for the correction of this bias.

Appendix F: Comparison of inlet masses with and without turbidity data
As turbidity improves estimations of particulate concentrations and mass, it does little to change masses calculated for more dissolved pollutants, like BPA ( Figure F1e). There is no change in inlet mass calculations with and without turbidity data for NP, as particulate concentrations were not correlated with TSS concentrations; therefore, turbidity was never used in the algorithm for calculating NP input mass.