SIPIBEL observatory: Data on usual pollutants (solids, organic matter, nutrients, ions) and micropollutants (pharmaceuticals, surfactants, metals), biological and ecotoxicity indicators in hospital and urban wastewater, in treated effluent and sludge from wastewater treatment plant, and in surface and groundwater

The Bellecombe pilot site (SIPIBEL) was created in 2010 to study the characterisation, treatability and impacts of hospital effluents in an urban wastewater treatment plant. The pilot site comprises: i) the Alpes Léman hospital (CHAL), opened in February 2012, ii) the Bellecombe wastewater treatment plant, with two separate treatment lines that allow hospital wastewater and urban wastewater to be treated fully separately, and iii) the Arve River as the receiving water body, which is a tributary of the Rhône River and a source of recharge for the Geneva aquifer. The database includes a total of 48 439 values measured on 961 samples (raw and treated hospital and urban wastewater, activated sludge in aeration tanks, dried sludge after dewatering, river water and groundwater, and a few additional campaigns in aerobic and anaerobic sewers), with 44 455 physico-chemistry values (including 15 pharmaceuticals and 14 related transformation products, biocides, metals and organic micropollutants), 2 193 bioassay values (ecotoxicity), 1 679 microbiology values (including microorganisms and antibioresistance indicators) and 112 hydrobiology values.


Related publications
Selected publications presenting and/or using the SIPIBEL data:
In addition, all publications and SIPIBEL project deliverables, in French or English, are listed on the SIPIBEL website at http://www.graie.org/Sipibel/publications.html (accessed 05 July 2021). Note that the SIPIBEL website is in French, except for the welcome page at http://www.graie.org/Sipibel/anglais.html.

Value of the Data
• The very large SIPIBEL data set contains values of numerous indicators (usual physicochemical indicators of water pollution, metals and chemical elements, pharmaceuticals, PAHs, biocides, bioassays, integrons and antibiotic resistance, etc.) measured, in a coordinated way, in (i) hospital and urban wastewater collection systems, (ii) a wastewater treatment plant (both water and sludge treatment lines), and (iii) receiving water bodies at local and regional scales, allowing a global analysis of concentrations and loads from emissions to discharges into the environment. The interest of this data set also lies in its spatio-temporal dynamics, with successive configurations of the wastewater treatment plant (Fig. 2) evaluated over 4 years of monitoring.
• In particular, these data contribute to a better assessment of (i) concentrations and loads of pharmaceuticals and biocides in urban and hospital wastewater, (ii) their removal by wastewater treatment plants, (iii) the validation of indicators such as integrons or antibioresistance in the environment, and (iv) the influence of the exposome on the resistome.
• These data can be used by researchers working on pharmaceuticals and biocides in wastewater, in treatment plants and in receiving surface water bodies. They can also be used by wastewater treatment plant staff, consultants or water utilities to design future treatment processes, and by regulators to inform their decisions about new policies and regulations related to pharmaceuticals and biocides in wastewater and in the environment. For example, these data have been used to establish correlations by machine learning or to model the dissemination of antibioresistance.
• The data can also be used in modelling work and in international reviews and comparisons.
The SIPIBEL observatory provides data on (i) conventional water quality parameters, (ii) the presence of pharmaceuticals, transformation products and surfactants, and (iii) biological indicators that enable the long-term evaluation of risks for the environment and health:
• Physico-chemistry (PC):
• Each value in the dataset is associated with metadata: sampling point, analytical or measurement method, LoD (limit of detection), LoQ (limit of quantification), validation mark, date and time of sampling campaigns, etc. Discharges at some sampling points are also provided to calculate loads.
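Because discharges are provided alongside concentrations, a daily load can be derived as concentration times volume. The following is a minimal sketch, assuming a concentration expressed in µg/L and a daily volume in m³/d; the function name and the example concentration are hypothetical, and the actual units should be checked against the metadata of each value.

```python
def daily_load_g_per_d(concentration_ug_per_l: float, daily_volume_m3: float) -> float:
    """Return a daily load in g/d (assumed input units: ug/L and m3/d)."""
    # 1 m3 = 1000 L, so (ug/L) * (m3/d) = 1000 ug/d = 1e-3 g/d
    return concentration_ug_per_l * daily_volume_m3 / 1000.0


# Hypothetical concentration of 10 ug/L combined with a daily volume of 176 m3/d
load = daily_load_g_per_d(10.0, 176.0)
print(f"{load:.2f} g/d")  # prints "1.76 g/d"
```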
For substances detected at concentrations lower than their limit of quantification (< LoQ) or lower than their limit of detection (< LoD), the following substitutions were applied, according to the frequency of quantification (FQ) of the substance [3]:
• If FQ < 25%, < LoQ was substituted with 0 and < LoD was substituted with 0.
• If 25% < FQ < 50%, < LoQ was substituted with LoQ/4 and < LoD was substituted with LoD/4.
• If 50% < FQ < 75%, < LoQ was substituted with LoQ/2 and < LoD was substituted with LoD/2.
• If FQ > 75%, < LoQ was substituted with LoQ and < LoD was substituted with LoD.
In addition, for very rare detections, if fewer than 5 values have been detected and measured, < LoQ was substituted with 0 and < LoD was substituted with 0.
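The substitution rules above can be sketched as a small function. This is an illustrative sketch only, not the code used by the project: the function names are hypothetical, and because the text does not say how the boundary values (exactly 25%, 50% or 75%) are handled, the sketch assigns them to the upper class as an assumption.

```python
def substitution_factor(fq_percent: float, n_quantified: int) -> float:
    """Fraction of the limit (LoQ or LoD) used to replace a censored value."""
    # Very rare detections (fewer than 5 quantified values) are replaced with 0
    if n_quantified < 5 or fq_percent < 25:
        return 0.0
    if fq_percent < 50:   # assumption: exactly 25% falls in this class
        return 0.25
    if fq_percent < 75:   # assumption: exactly 50% falls in this class
        return 0.5
    return 1.0            # assumption: exactly 75% falls in this class


def substitute_censored(limit: float, fq_percent: float, n_quantified: int) -> float:
    """Replace a '< LoQ' or '< LoD' result by a numeric value.

    `limit` is the LoQ for a '< LoQ' result and the LoD for a '< LoD' result.
    """
    return limit * substitution_factor(fq_percent, n_quantified)


# Hypothetical substance: LoQ = 0.02, quantified in 40% of samples (12 values)
print(substitute_censored(0.02, fq_percent=40.0, n_quantified=12))
```

The same function serves both cases because the rules apply the same factor to the LoQ and to the LoD.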

General description
SIPIBEL is located in the Arve River basin, in France, close to the French-Swiss border (Fig. 1). The pilot site includes the following elements:
• The CHAL hospital, opened in February 2012, with 450 beds, 8 surgery rooms, and various departments, including emergency, oncology, nuclear medicine diagnosis, internal medical labs, pharmacy, and kitchen.
• The Bellecombe activated sludge WWTP, with the possibility to treat either separately or jointly the CHAL wastewater and the urban wastewater from the Bellecombe urban catchment (approx. 21 000 inhabitants).
• The Arve River, downstream of the Bellecombe WWTP, flowing from the French Alps into the Rhône River. The Geneva aquifer is used for drinking water production by both France and Switzerland. As the groundwater intake is greater than the natural recharge, the Vessy station injects Arve River water after river bank filtration to recharge the aquifer (the mean annual injected volume is approximately 9 million m³, which is significant as it amounts to 60% of the annual groundwater intake for drinking water production).
The initial capacity of the Bellecombe activated sludge WWTP was 5 400 PE (aeration tank 1 280 m³, treatment line F2). It was enlarged for the first time in 1995 with a second basin (10 600 PE, 2 720 m³, treatment line F1). In 2009, a third basin was constructed (16 000 PE, 4 000 m³, treatment line F3), which led to a total capacity of 32 000 PE. This last extension was required by the connection of the CHAL hospital. The urban combined sewer network connected to the WWTP collects the urban wastewater (UWW) of approximately 21 000 inhabitants. The hospital wastewater (HWW) was initially estimated at 2 000 PE. It is transferred to the WWTP by a separate sewer system without specific pretreatment (except iodine decay tanks located in the CHAL basement for the treatment of radioactive urine from a few rooms with patients treated for cancer). The Bellecombe WWTP is equipped with pre-treatment bar screens and aerated grit chambers. The wastewater then enters basins with activated sludge operating sequentially under aerobic and anoxic conditions. Subsequently, the treated wastewater is pumped into a final clarifier for sludge separation before discharge into the Arve River. See Table 1 for detailed GPS coordinates.
The unique configuration of the Bellecombe WWTP with two independent parallel treatment lines provides appropriate conditions for treating and studying HWW and UWW separately. HWW can be treated either mixed with UWW in all three basins, or separately in the dedicated smallest basin (5 400 PE) while UWW is treated in the two other basins. Similarly, the sludge treatment can be carried out separately for the HWW or mixed with the sludge from the UWW (see the successive configurations of the WWTP in Fig. 2 ).
During the days of experimental campaigns:
• The measured HWW daily volume ranged from 48 to 408 m³/d, with a mean value of 176 m³/d: when treated separately in the 1 280 m³ aeration tank, this corresponds on average to a 13.75% hydraulic load and to a 7.3 day residence time.
• The UWW daily volume measured during dry and low-rain days of experiments ranged from 2 897 to 13 504 m³/d, with a mean value of 5 322 m³/d, corresponding to a 79% hydraulic load and a 1.3 day hydraulic residence time.
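The reported figures can be reproduced with simple arithmetic. The sketch below assumes that the hydraulic load is the ratio of the mean daily volume to the aeration tank volume and that the residence time is the inverse ratio; this interpretation is an assumption, chosen because it matches the published percentages. It also assumes that UWW is treated in the two larger basins (2 720 m³ and 4 000 m³). Function and constant names are hypothetical.

```python
def hydraulic_load_pct(tank_volume_m3: float, daily_volume_m3: float) -> float:
    """Mean daily volume expressed as a percentage of the tank volume (assumed definition)."""
    return 100.0 * daily_volume_m3 / tank_volume_m3


def residence_time_d(tank_volume_m3: float, daily_volume_m3: float) -> float:
    """Hydraulic residence time in days: tank volume divided by daily volume."""
    return tank_volume_m3 / daily_volume_m3


HWW_TANK_M3 = 1280.0            # treatment line F2
UWW_TANK_M3 = 2720.0 + 4000.0   # treatment lines F1 and F3 (assumption)

for label, tank, volume in (("HWW", HWW_TANK_M3, 176.0), ("UWW", UWW_TANK_M3, 5322.0)):
    load = hydraulic_load_pct(tank, volume)
    hrt = residence_time_d(tank, volume)
    print(f"{label}: hydraulic load {load:.2f}%, residence time {hrt:.1f} d")
```

With these assumptions, the HWW line gives 13.75% and 7.3 d, and the UWW lines give about 79% and 1.3 d, in agreement with the values above.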
For the year 2013, according to internal data provided by the Syndicat des eaux des Rocailles et de Bellecombe (operator of the wastewater treatment plant and partner of SIPIBEL), the hydraulic load was 81.5% of the dry weather design capacity and the pollutant loads were as follows: 63% for COD, 56% for BOD5, 54% for TSS and 51% for TKN.
In addition, supplementary campaigns were carried out in sewer systems to detect and quantify the possible degradation of pharmaceuticals in wastewater during its transfer to the downstream WWTP:
• In the gravity aerobic sewer Choully Galery (UWW), upstream of the Geneva WWTP.
• In the anaerobic pressure main from the Pole de Santé hospital (HWW) in Arcachon, France, upstream of the Teste-de-Buch WWTP.

Configuration of sampling points
The configuration of sampling points is shown in Fig. 1, Fig. 2 and Table 1.
The construction of the CHAL hospital (built to replace the former Annemasse hospital located in another catchment) and its separate connection to the Bellecombe WWTP made it possible to sample effluents in different configurations, according to the WWTP evolution and adaptation during the sampling period (Fig. 2). Four configurations of the WWTP were used during the sampling period:
• Configuration 1 (from 2011 to February 2012, before opening of the CHAL hospital): the Bellecombe WWTP treated only urban wastewater. In addition to samples collected at the WWTP and in the Arve River, wastewater samples were also taken at the former Annemasse hospital (to get prior reference values). This configuration is referred to as the "zero state".

Sampling campaigns
Numerous campaigns were carried out at the different sites and sampling points indicated in Figs. 1 and 2 (and also in Table 1):
• Urban and hospital wastewater at Bellecombe WWTP:
  • 40 campaigns on raw wastewater, treated wastewater, and activated sludge from February 2011 to December 2016 (usually one 24 h campaign per month from the opening of the CHAL hospital in February 2012).
The data uploaded to Zenodo (48 439 values in total) are detailed in Table 2.

Sampling protocol
Because the project aimed to measure trace concentrations of micropollutants, special attention was paid to sampling and sample handling to ensure the best possible representativeness of every sample analysed in the laboratory. To benefit from the experience of previous similar experiments, the SIPIBEL sampling protocol followed the recommendations of the Aquaref Operational Technical Guide [2]:
• 24 h mean samples:
  • WWTP: volume-proportional sampling (150 to 200 elementary samples).
  • Surface freshwater: mean sample reconstituted from hourly sub-samples, mixed proportionally to hourly river discharge measurements supplied by EDF (the electricity provider which regulates the flow in the river) and by Etat de Genève (144 elementary samples: 6/hour).
• Homogenisation and distribution of a sample in flasks: stirring paddle, distribution pump, flasks filled in three successive partial fillings of 1/3 each.
• All equipment in contact with samples is made of glass and Teflon.
• Rinsing of the equipment between every campaign, following a rigorous protocol.
• Blanks to control the reliability of the protocol and validate or correct the measured data.
A log-book was used to report all details and possible incidents during the campaigns and the sampling periods.
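The reconstitution of a flow-proportional mean sample for surface freshwater, as described in the protocol above, can be illustrated as follows. This is a minimal sketch under stated assumptions: one aliquot is taken from each hourly sub-sample, its volume is proportional to the hourly river discharge, and the target composite volume and the discharge values are hypothetical. The function name is also hypothetical.

```python
from typing import List, Sequence


def aliquot_volumes_ml(hourly_discharge: Sequence[float],
                       composite_volume_ml: float) -> List[float]:
    """Volume to take from each hourly sub-sample, proportional to discharge."""
    total = sum(hourly_discharge)
    if total <= 0:
        raise ValueError("total discharge must be positive")
    # Each hour contributes in proportion to its share of the total discharge
    return [composite_volume_ml * q / total for q in hourly_discharge]


# Hypothetical hourly discharges (m3/s) for four hours and a 2 L composite sample
discharges = [40.0, 55.0, 70.0, 35.0]
for hour, volume in enumerate(aliquot_volumes_ml(discharges, 2000.0)):
    print(f"hour {hour}: {volume:.0f} mL")
```

Only the relative discharges matter, so any consistent discharge unit can be used.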

Assessment of data quality
Data quality is assessed for each measured value according to the quality of i) the sample, ii) the analysis, and iii) the results of blank samples.
The sample quality is declared:
• Correct, if all seven above indicators are correct.
• Incorrect, if at least one of the above indicators is incorrect.
• Uncertain, in any other case.
Analytical quality is assessed based on the comments given by the laboratories in their analysis reports. For example, the quality is declared "Uncertain" in case of a possible contamination of the sample or too long a period between sampling and analysis (possibly unsatisfactory preservation of the sample before analysis).
Blank samples make it possible to assess the reliability of the measurement (for example, adsorption or desorption phenomena leading to overestimation or underestimation of concentration): the protocol compares the values of blank samples "before" and "after" the sampling and analytical chain, in order to detect and quantify any significant difference beyond measurement uncertainties. Corrections were made if necessary. Let $x_1$ and $x_2$ be the values "before" and "after" the sampling chain, and $u(x_1)$ and $u(x_2)$ their respective standard uncertainties, as given by the laboratory. It is assumed that the true value of $x_i$ has an approximately 95% probability of lying between $x_i - 2u(x_i)$ and $x_i + 2u(x_i)$ when the $x_i$ values are normally distributed. One calculates the gap $E$ between the two values, and its standard uncertainty $u(E)$:
$$E = |x_1 - x_2|, \qquad u(E) = \sqrt{u(x_1)^2 + u(x_2)^2}$$
One concludes that:
• If $E \le 2u(E)$: the two values are not significantly different and can be considered as equivalent.
• If $E > 2u(E)$: the two values are significantly different.
Blank checking quality is declared:
• Correct, if there is no significant difference between the values as explained above, or if the difference is significant but lower than 15% of the measured value of the sample.
• Uncertain, if the difference is significant and in the range of 15% to 25% of the measured value of the sample.
• Incorrect, if the difference is significant and larger than 25% of the measured value of the sample.
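The blank checking rules can be sketched in a few lines. The sketch makes several assumptions that are not stated explicitly in the text: the gap is taken as E = |x1 - x2|, its standard uncertainty as u(E) = sqrt(u(x1)² + u(x2)²) (uncorrelated values), the "difference" compared with the 15% and 25% thresholds is E divided by the measured value of the sample, and a ratio of exactly 15% or 25% is classed as "Uncertain". The function name is hypothetical.

```python
import math


def blank_check_quality(x1: float, u1: float, x2: float, u2: float,
                        sample_value: float) -> str:
    """Classify a blank check as 'Correct', 'Uncertain' or 'Incorrect'."""
    gap = abs(x1 - x2)                      # E, assumed to be the absolute gap
    u_gap = math.sqrt(u1 ** 2 + u2 ** 2)    # u(E), assuming uncorrelated values
    if gap <= 2 * u_gap:
        return "Correct"                    # no significant difference
    ratio = gap / sample_value              # significant gap relative to the sample
    if ratio < 0.15:
        return "Correct"
    if ratio <= 0.25:
        return "Uncertain"
    return "Incorrect"


# Hypothetical blanks (before, after) compared with a measured sample value of 10
print(blank_check_quality(1.0, 0.1, 1.1, 0.1, 10.0))    # Correct (not significant)
print(blank_check_quality(1.0, 0.05, 3.0, 0.05, 10.0))  # Uncertain (20% of sample)
print(blank_check_quality(1.0, 0.05, 4.0, 0.05, 10.0))  # Incorrect (30% of sample)
```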
Finally, data quality is declared:
• Correct, if sample quality, analysis quality and blank checking quality are all correct.
• Incorrect, if at least one of sample quality, analysis quality and blank checking quality is incorrect.
• Uncertain, in any other case.
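The combination of the three quality marks follows directly from the rules above. The sketch below is illustrative only; the function name and the string labels are hypothetical choices.

```python
def overall_quality(sample: str, analysis: str, blank: str) -> str:
    """Combine the three quality marks into the final data quality mark."""
    marks = (sample, analysis, blank)
    if "Incorrect" in marks:
        return "Incorrect"      # one incorrect mark is enough
    if all(mark == "Correct" for mark in marks):
        return "Correct"        # all three marks must be correct
    return "Uncertain"          # any other case


print(overall_quality("Correct", "Correct", "Correct"))      # Correct
print(overall_quality("Correct", "Uncertain", "Correct"))    # Uncertain
print(overall_quality("Uncertain", "Correct", "Incorrect"))  # Incorrect
```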
During the project, a dedicated database, named DoMinEau and built with Excel files, was established to collect and centralise all data and make them available to all partners of the project. The final values of correct and uncertain data are made publicly available on Zenodo (see Data accessibility in Section 1). Incorrect data have been removed: they correspond to approximately 5% of all data stored in DoMinEau.