Summary of data from the UKWIR chemical investigations programme and a comparison of data from the past ten years' monitoring of effluent quality



Introduction
Under the Water Framework Directive (WFD), according to Article 5 of Directive 2008/105/EC on Environmental Quality Standards (as amended by 2013/39/EU), Member States (MS) are required to report an inventory of annual emissions, discharges and losses of priority substances. The inventories should give information on the relevance of priority substances at the spatial scale of the River Basin District (RBD) and on the loads discharged to the aquatic environment. This provides information on the success of measures to reduce emissions and meet environmental quality standards (EQS), and indicates whether further efforts may be needed to deliver good chemical status of surface waters. Such emission inventories can only be generated if sufficient data exist for major sources of priority chemicals to water. EQS have been derived and implemented at European Union (EU) and MS level for over 50 priority chemicals, including trace metals, pesticides, solvents and numerous persistent organic pollutants. Results of the first reporting exercises (2nd River Basin Management Plan (RBMP) cycle) indicated difficulties associated with the consistency, completeness and quality of reported emission data (Giakoumis and Voulvoulis, 2018). The first inventory was not comparable between MS. For most substances, MS did not report point source emissions for the following reasons:
• substances were identified as not relevant, or of only minor relevance, at RBD level; in that case, according to the recommendations of the guidance, only river loads at the RBD level are required,
• there is still a lack of reliable point source data and emission factors.
Point sources such as urban wastewater treatment works (WwTW) and industrial dischargers can be important sources for emissions to water. In particular, the urban wastewater system collects a variety of chemicals coming from many different sources in urban areas (detergents, personal care products, pharmaceuticals and plumbing materials from domestic sources; road, tyre and brake abrasion and combustion products from traffic; leaching from facade coatings etc.). For quantifying input concentrations and loads, reliable monitoring data are needed, but there is still a lack of data and information for many substances.
The main reasons are:
• most chemicals are not included in national routine monitoring programmes,
• environmental concentrations, and concentrations in wastewater (effluent), are often very low,
• sensitive analytical methods are needed, with low limits of detection (LoD) and quantification (LoQ).
The need for data to inform the regulatory decision-making process is usually the spur for increased attention to monitoring (Hope et al., 2012; Kolpin et al., 2002; Martin Ruel et al., 2012). However, the wide range of chemicals involved (González et al., 2011; Kolpin et al., 2002) and the analytical difficulty of working at ng/l levels in a complex matrix such as wastewater make this a challenging proposition.
The Chemical Investigations Programme (CIP) has been established by the UK Water Industry in response to emerging legislation on surface water quality. This major (£200 million) programme is intended as a means of gaining a better understanding of the occurrence, behaviour and management of trace contaminants in the wastewater treatment process and in effluents. An initial phase (CIP1), carried out principally between 2010 and 2011, made it possible to prioritise substances for which regulation had been introduced. The second phase (CIP2, 2016-2019) aimed to quantify compliance risk at a site-specific level in order that approaches to suitable remedial action could then be considered.
The sponsors of the CIP have already published a comprehensive set of reports dealing with the main aims of the project and have provided a detailed interpretation of its results. These outputs are understandably directed towards the presentation of overall project outcomes, their implications for the water sector and the issue of compliance with water quality standards (UKWIR, 2014, 2018; Gardner et al., 2012, 2013; Comber et al., 2019). An archive of CIP2 results (as reported by the participating organisations, containing effluent, influent and environmental data) has also recently been released into the public domain (UKWIR, 2021). This archive represents a major information resource consisting, as it does, of several million data points, dealing with over 70 substances, on a national basis, over a period of five years. Whilst comprehensive and transparent, this "raw data" archive might appear somewhat daunting to the more casual inquirer.
In order to facilitate access to more concise information, we consider it to be of value to future users of the CIP dataset to publish in this paper an overall summary reference dataset of effluent data quality from CIP1 and CIP2. We also provide a unique comparison of effluent quality at a set of specific wastewater treatment works' (WwTW) sites and discuss various factors that might contribute to changes that are evident in the data.

The CIP programmes
The core objective of the CIP programmes was to determine concentrations of priority chemicals entering and leaving WwTW (with upstream and downstream river samples included in CIP2). WwTW were not selected at random; because this was a risk-based exercise, WwTW with the least dilution (and therefore likely to pose the highest risk to receiving waters) were selected. However, comparing profiles of size of works, type of treatment and geographic distribution showed that they were representative of the typical WwTW found in the UK, and further afield. The 750 WwTW effluents characterised across the two programmes (with 75 WwTW duplicated across the two programmes) amounted to around 10% of all WwTW in the UK. The CIP was split into phases relating to funding and to developing objectives based on knowledge generated by ongoing data. The CIP1 programme ran from 2010 to 2015, monitoring 162 WwTW effluents from England, Scotland and Wales over a period of approximately 18 months, with the collection and analysis of either 14 or 28 samples per site over this period. This split in sampling rates was approximately equal (i.e., approximately 80 sites for the lower rate and 80 for the higher). The reason for this was largely to achieve a compromise between dealing with a larger number of sites and avoiding an unsustainable analytical workload. Analytical targets for limit of detection, precision of analysis and spiking recovery were specified to meet the main project aim of providing an accurate picture of effluent quality in relation to current river quality standards. Six laboratories took part, serving the needs of different water utilities (monitoring was organised by water utilities, each operating within its own operational region); these laboratories were required to provide evidence that they could meet the specified requirements.
CIP2 was set up slightly differently, owing to the need to expand the number of sites investigated in order to produce a more robust regulatory assessment. Issues of resources available for sampling and analysis led to the programme being scheduled over four tranches of work that were undertaken primarily in successive years from 2016 to 2019, each tranche involving approximately 150 WwTW sites across England and Wales (605 WwTW sampled in total). Sampling of WwTW effluents involved 20 samples taken at approximately fortnightly intervals. Minor adjustments were made to the analytical programme developed from CIP1. These involved the requirement of improved limits of detection for metals including cadmium and mercury and the addition to the programme of further trace substances of more recent interest (e.g., two fluorocarbons, hexabromocyclododecane and cypermethrin). Changes in the subprogramme relating to pharmaceuticals were also made. In spite of this, there was a good deal of overlap between the two programmes, which offers the potential for a worthwhile comparison between the resulting two data sets. Details of the project design are provided in the Electronic Supplementary Information (ESI) S1.

Sampling, analysis and quality control
For both programmes, a stratified/random spot sampling basis was employed (i.e. grab samples taken at relatively evenly spaced times rather than multiple integrated sampling). A minimum of 15% of sampling was undertaken in non-working hours (evenings and weekends).
Samples for the determination of metals were collected with polyethylene samplers, filtered (0.45 μm) on-site, then acidified and stored in polyethylene (samples for mercury determinations were stored in glass or PTFE and preserved with acid dichromate (Feldman, 1974)). Samples for the determination of trace organic substances were collected with stainless steel samplers, stored in glass and transported at 4 °C to the laboratories.
All data in the tables have been subjected to rejection of statistical outliers using the median absolute deviation z-score method as described in the NIST engineering handbook (NIST, 2021). Individual results reported as less than the required limit of detection (LOD) were substituted with a value of half the reporting limit, as specified in EU reporting regulations (EC, 2009).
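The two data-handling steps described above can be sketched as follows. This is a minimal illustration rather than the project's actual procedure: the function names are hypothetical, and the constant 0.6745 and the cut-off of 3.5 are the values commonly quoted for the modified z-score, assumed here because the text does not state which threshold was applied or in which order the two steps were carried out.

```python
from statistics import median


def substitute_nondetects(results):
    """Replace less-than results with half the reporting limit.

    `results` is a list of (value, is_less_than) pairs; for a less-than
    result, `value` holds the reporting limit. (Hypothetical data layout.)
    """
    return [value / 2 if is_less_than else value
            for value, is_less_than in results]


def mad_outlier_filter(values, threshold=3.5):
    """Keep values whose modified z-score is within +/- threshold.

    Modified z-score: M = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. The threshold is an assumption.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Score is undefined when MAD is zero; reject nothing in this sketch.
        return list(values)
    return [v for v in values
            if abs(0.6745 * (v - med) / mad) <= threshold]


# One clearly anomalous result among otherwise similar concentrations.
sample = [1.0, 1.2, 0.9, 1.1, 1.0, 25.0]
cleaned = mad_outlier_filter(sample)
```

Because the score is built from medians rather than the mean and standard deviation, a single extreme result does not inflate the scale against which it is judged.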
Further details on sampling, analytical performance and quality control are provided in the ESI (S1).

Results
Tables 1 and 2 present sets of summary data for the two phases of the CIP (CIP1 and CIP2). The summary statistics relating to effluent quality (mean, standard deviation and percentiles from the 5th to the 95th) are presented to record the principal indications of location and range for each of the CIP determinands. Table 2 lists the relevant statistics for all four tranches of CIP2.
The tables show the proportions of <LOD values as shaded rows. A high proportion does not, in itself, necessarily indicate poor quality data, provided (as was the case for the great majority of results) that the LODs achieved were in accord with the specified use-based requirements. This is because the required LOD was set at a level consistent with the assessment of likely compliance risk associated with the relevant discharge. However, the ability to estimate actual concentrations at lower percentiles, and variance values, becomes more limited the greater the proportion of non-detect values. A threshold of >40% less-than values is indicated in the tables as a limit at which caution might need to be applied, in line with previous recommendations on this topic (Gardner, 2012).
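As a small illustration of the 40% criterion described above, the check below flags a determinand when more than 40% of its results are less-than values. The function names and the boolean-flag data layout are assumptions made for this sketch.

```python
def less_than_fraction(less_than_flags):
    """Fraction of results reported as less than the LOD.

    `less_than_flags` is a list of booleans, True for a <LOD result.
    """
    return sum(less_than_flags) / len(less_than_flags)


def needs_caution(less_than_flags, threshold=0.40):
    """True when the proportion of <LOD results exceeds the threshold."""
    return less_than_fraction(less_than_flags) > threshold


# 9 of 20 results below the LOD (45%): flagged for caution.
flags = [True] * 9 + [False] * 11
flagged = needs_caution(flags)
```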

Comparisons between the CIP1 and CIP2 data sets
A primary area of interest within a programme of the size and scope of the CIP is the possibility that there might have been a real change in effluent concentration, perhaps in response to remedial controls with respect to sewer inputs, improvements in treatment technology or wider control of substance use. However, there are a number of extraneous, potentially confounding factors which can also come into play, thereby confusing the issues of primary interest. These factors include:
I. Selection bias, initially and later in the programme. This refers to the possibility of important differences in effluent quality at the sites selected to be monitored in the first place and then subsequently. The question of the extent to which the initial sites chosen are representative of all WwTW sites has been addressed in the main part of the programme and shown to be satisfactory. However, in the case of the CIP an important step change between CIP1 and CIP2 was driven by the need to increase the number of WwTW sites monitored (from around 160 to over 600) in order to give wider coverage and a clearer demonstration of the pollution control issues that might have to be faced. This poses potential problems in any attempt to evaluate changes in effluent quality, since it means comparing two very differently sized sets of data that might not involve directly compatible sets of sites.
II. Sampling and analytical biases caused by changes in the detailed approaches to sampling and analysis. These biases are almost inevitable since, over an eight-year period, the available laboratory facilities and the analytical state of the art itself are likely to have improved. Consequently, a complex proficiency testing regime was put in place to demonstrate data quality and fitness for purpose. There were cases (cadmium and mercury) where improvements in limits of detection would have provided higher quality data (and some improvements were completed in CIP2 as a result). In other cases (notably silver in CIP1) the widespread lack of data greater than the LOD merely demonstrated that water quality compliance was very unlikely to be a problem.
III. Temporal bias, caused by possible variations in sewer flow owing to differences in rainfall and run-off (for instance, the winter of 2016 was markedly wetter than average, by a factor of 1.6). Although weather data are available, the extent to which rainfall affects each individual works was not considered to bear on the main aims of the project, so it was not pursued. It is worth noting, however, that changes in effluent dilution would tend to affect many substances in the same way (greater dilution gives a correspondingly lower effluent concentration for all of them).
Direct comparison between the same set of 75 effluents that were investigated in both CIP1 and CIP2 programmes
Although the scale and aims of the two phases of the CIP programme were different, for reasons of continuity and as a check on developing changes, 75 WwTW sites were monitored in both CIP1 and CIP2. This offers the interesting possibility of a direct comparison of data for the same sites monitored in essentially the same way on two separate occasions separated by between four and seven years. As noted earlier, the CIP2 programme was undertaken for over 600 WwTW sites, which meant that it had to be split into four phases or "tranches" in the years 2016 to 2019. In the first part of the assessment below, comparisons have been made of individual differences between mean concentration values at each of the "duplicated" sites, regardless of the CIP2 tranche in which the duplicate analysis was undertaken. In the second part, a graphical approach has been used for certain determinands of interest, illustrating the data according to the tranche in which the second set of analyses was carried out. In other words, CIP2 tranche 1 comparisons relate to the time period between 2010/11 and 2016, whereas CIP2 tranche 4 comparisons are for 2010/11 to 2019. It is important to note for clarity that for any given site in the 75 WwTW of interest only one pair of data sets was generated (i.e., there was no duplication within CIP2 from one tranche to another).
Table 3 summarises the observed differences between mean effluent concentrations determined in CIP1 and CIP2. The differences have been tested for statistical significance using two methods. Firstly, a paired Student's t-test was carried out on log-transformed data in order to account for instances of non-Normality.
Secondly, as a comparison, a non-parametric test (Mann-Whitney U test) based on ranking of the data was also undertaken. The Mann-Whitney U test is less likely than the t-test to indicate spurious significance because of the presence of outliers. Overall, allowing for this, there appears to be reasonable agreement between the verdicts on statistical significance delivered by the two different statistical assessments.
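The two tests described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the authors' code: the function names and example concentrations are hypothetical, the paired test returns only the t statistic and degrees of freedom (a p value would be read from a Student's t distribution with those degrees of freedom), and the Mann-Whitney p value uses the large-sample normal approximation without tie or continuity corrections, which is crude for small samples.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_log_t(cip1_means, cip2_means):
    """Paired t statistic on log-transformed site means.

    Returns (t, degrees_of_freedom). Concentrations must be positive.
    """
    diffs = [math.log(b) - math.log(a)
             for a, b in zip(cip1_means, cip2_means)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x against y, with an approximate p value.

    U counts the pairs in which x exceeds y (ties count one half).
    The two-sided p value uses the normal approximation (an assumption
    of this sketch; exact tables are preferable for small samples).
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in x for b in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical site means (ug/l) for three duplicated sites.
cip1 = [2.0, 4.0, 8.0]
cip2 = [1.0, 2.0, 3.0]
t_stat, dof = paired_log_t(cip1, cip2)
u_stat, p_value = mann_whitney_u(cip2, cip1)
```

A negative t statistic here corresponds to CIP2 means that are lower than CIP1 means on the log scale, i.e. a net decrease.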
In Table 3 positive changes (CIP2 data higher than CIP1 data) are shown as unshaded. Negative changes (net decreases) are shaded. The probability "p" value reported is the calculated probability associated with the Null Hypothesis that there is in fact no difference between the CIP1 mean and that observed in CIP2, and that the observed difference might have occurred by chance. Hence low values of "p" tend to indicate that the Null Hypothesis should be rejected. Instances where a real change is implied (at the 0.05 probability level) are shown in bold. Values of "p" are rounded to two decimal places.
The required limit of detection (LOD) is defined as 4.65 × sw (ISO/TS 13530:2009, 2012), where sw is the within-batch standard deviation of measurements made in an appropriate matrix containing essentially no determinand. Values reported are median effluent concentration values expressed in μg/l, apart from italicized determinands, which are in mg/l. Each value relates to a single WwTW effluent. Shaded values correspond to determinands for which more than 40% of reported results were less than the LOD. For these, lower percentile values are considered relatively less reliable than for the remaining data (see text).
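The LOD definition above can be illustrated with a short calculation. The sketch assumes a single batch of replicate blank measurements, so that the sample standard deviation of that batch stands in for sw; pooling across several batches is not shown. The function name and the blank values are hypothetical.

```python
from statistics import stdev


def lod_from_blanks(blank_results, factor=4.65):
    """LOD = factor x within-batch standard deviation of blank results.

    `blank_results` are replicate measurements from ONE batch, made in a
    matrix containing essentially no determinand (single-batch assumption).
    """
    return factor * stdev(blank_results)


# Hypothetical replicate blank measurements in ug/l.
blanks = [0.010, 0.012, 0.008, 0.011, 0.009]
lod = lod_from_blanks(blanks)
```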
Two graphical examples of the comparison of CIP1 and CIP2 data for "duplicated" sites are shown in Figs. 1 and 2 for dissolved nickel and for diethylhexylphthalate (DEHP). Further illustrations of this form are included in the attached electronic supplementary information (ESI, S2).
The visual impression provided by the figures (the extent to which red CIP2 markers are lower than blue CIP1 markers) suggests that dissolved nickel is on a continuing downward trend over time, whereas a similar continuing trend is less obvious for DEHP. Given that there are no specific restrictions on the use of nickel, ongoing iron dosing at WwTW for phosphate removal is likely to have been the reason for the observed reductions in nickel (Comber et al., 2021). In the case of DEHP, the substantial reduction in measured concentrations (between the blue CIP1 data and the red CIP2 data, in all tranches) seems to have occurred primarily in the period up to 2016, reflecting increasing restrictions on its use over the past decade.

Discussion
The estimates of change in the concentrations of substances listed in Table 3 fall into three principal categories. Firstly, and most importantly, there are the determinands for which the changes in concentration are both statistically significant and of a magnitude that can be considered to be of practical importance. The reality of these changes is also made more credible because in many cases control measures have been put in place in order to limit the use of these substances and/or their release into wastewater. Furthermore, between 2010 and 2015 and between 2015 and 2020 the water industry invested £4.6 billion and £2.3 billion, respectively, on infrastructure improvements (for tap water and wastewater assets; Ofwat, 2021; Stevens, 2011), including reducing phosphorus loads to receiving waters, where iron dosing has been shown to reduce concentrations of other priority chemicals within effluents (Comber et al., 2021). This list includes nickel, the potential xenoestrogens DEHP and nonylphenol, tributyltin, the brominated diphenyl ethers and triclosan, all of which are observed to have been reduced in concentration by 30-50%. Fig. 3 shows estimation plots (Gardner and Altman, 1986) illustrating the individual mean concentrations for these determinands and a corresponding overall mean difference with an associated 90% confidence interval.
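To make the size of such changes concrete, the sketch below computes per-site differences and the overall percentage change between paired CIP1 and CIP2 site means. The function names and the concentrations are hypothetical and do not reproduce any value from Table 3; the confidence interval shown in the estimation plots is not computed here.

```python
from statistics import mean


def paired_differences(cip1_means, cip2_means):
    """Per-site difference (CIP2 minus CIP1); negative means a decrease."""
    return [b - a for a, b in zip(cip1_means, cip2_means)]


def percent_change(cip1_means, cip2_means):
    """Percentage change in the overall mean from CIP1 to CIP2."""
    m1 = mean(cip1_means)
    m2 = mean(cip2_means)
    return 100.0 * (m2 - m1) / m1


# Hypothetical site means (ug/l) at two duplicated sites.
cip1 = [10.0, 20.0]
cip2 = [5.0, 14.0]
diffs = paired_differences(cip1, cip2)
change = percent_change(cip1, cip2)
```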
The concentrations of two of the three pharmaceuticals for which markedly more limited sets of comparison data are available (8-9 instances compared with generally more than 70 for the other substances) are also subject to statistically significant change: an increase of over 100% for fluoxetine and a decrease of 50% for the antibiotic erythromycin (Mann-Whitney p values 0.001 and 0.002, respectively), which may reflect changes in prescribing patterns over time. Fig. 4 illustrates changes in concentration in the form of slope graphs (Tufte, 1983), in which each line links a mean concentration from CIP1 to its corresponding value in CIP2. These provide a visualization of the "before and after" concentrations, showing the variation between sites, the general trend identified, the differences observed and any sites that go against the trend.
Note: 1 The maximum number of direct comparisons between CIP1 and CIP2 monitoring data is 75. For various reasons not all determinands were monitored at all sites at all times.
The second category is that of determinands where any observed changes are demonstrated, in the graphs presented in S2 of the ESI, as arising from changes (improvements) in the reporting limits achieved between CIP1 and CIP2. These substances include cadmium, mercury and the polycyclic aromatic hydrocarbons benzo(a)pyrene and fluoranthene.
Finally, there are the substances for which any observed changes are too small to be important or are otherwise largely irrelevant to the wider issues of pollution control: these include copper, zinc and various sanitary determinands. With respect to this last group, it is noteworthy that the majority of changes are negative, suggesting sewage dilution as a possible cause. Examination of quarterly rainfall data for the UK (Fig. 5; Statista.com, 2021) does not provide convincing evidence for major differences in potential dilution of wastewater, though rainfall was markedly more variable during CIP1 than during CIP2.
The European Environment Agency (EEA) has recently undertaken an exercise to attempt to quantify loads of priority chemicals from EU WwTW. This has entailed collating available WwTW effluent data for all WFD listed chemicals (Deltares, 2021). The substances, dates and number of WwTW sampled are provided in S3 of the ESI. The EEA dataset highlights the lack of data for WwTW effluents, with the majority of the data being generated in northern and western countries, dominated by the UK, France, Germany, Belgium, Finland and Denmark. More limited data are available for eastern and southern European countries. The table also highlights the importance of this UK dataset in that, based on the number of WwTW studied, the CIP data comprise between 19 and 99% of the available data reported over the last 10 years.
Compared with some of the larger EU studies, the concentrations reported by the CIP are comparable in most cases, though with some variance in others (Table 4). For total nickel, there is reasonable agreement across the various monitoring programmes, reflecting the ubiquitous but relatively low concentrations observed in wastewater. Total lead concentrations reported elsewhere largely fall between the means observed in the CIP1 and CIP2 datasets. Total cadmium and mercury concentrations vary more significantly, by up to two orders of magnitude in some cases, potentially reflecting a combination of control measures and, in particular, differences in analytical performance relating to limits of detection and the proportion of less-than values reported. This issue is also observed for a number of the organic compounds (e.g. cypermethrin, the alkylphenols, HBCDD) but to a lesser extent for the PAHs, PFOS and tributyltin, which may be a result of blanket controls and ubiquitous, low level background occurrence. The data for cypermethrin highlight other issues associated with skewness of data distributions, the method for dealing with non-detects and the prevalence of anomalous values (factors that might all affect other data, but perhaps not to such a marked extent).
Table 4 also illustrates the patchiness in the European datasets: not only are there large gaps in the geographical data across Europe, but even within different monitoring programmes the determinands reported vary considerably. This reflects the different objectives of individual monitoring programmes; for example, the CIPs did not include solvents or POPs that were banned some time ago as well as a limited number of pesticides potentially used in a domestic setting. However, they covered metals of interest to the UK as well as chemicals of interest such as pharmaceuticals, EDTA, bisphenol and triazoles (not all of which are reported here because they are currently of minor regulatory importance). Other studies throughout Europe have in many cases used a suite of analysis which includes most if not all of the WFD chemicals (at least from the first list) but have been selective in choosing other substances, possibly owing to variations in current concerns and hence in available analytical methodologies.

Conclusions
Ultimately, the question of a wastewater treatment works' effluent quality centres on the issue of its impact on the compliance of receiving waters with environmental quality standards. The substantial dataset generated by the CIP has provided important insights into the extent and principal priorities of such impacts. This paper provides indications of the progress in improving the quality of wastewater that has been achieved over the past decade. Knowledge of the quality improvements that are evident in the cases of nickel, DEHP, nonylphenol, tributyltin, the brominated diphenyl ethers and triclosan is likely to be of benefit in directing future pollution control policy. Conversely, the lack of appreciable changes for other substances might also be worthy of further strategic consideration.

CRediT authorship contribution statement
Michael Gardner: Conceptualization, Methodology, Data Curation, Formal analysis, Writing - Original Draft.
Brian Ellor: Management, data quality, communication, review and editing.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Science of the Total Environment 832 (2022) 155041. http://dx.doi.org/10.1016/j.scitotenv.2022.155041
Received 27 September 2021; Received in revised form 31 March 2022; Accepted 31 March 2022; Available online 04 April 2022.
⁎ Corresponding author. E-mail address: sean.comber@plymouth.ac.uk (S.D.W. Comber).

Fig. 1. Comparison of data for duplicated WwTW sites for CIP1 and CIP2 for dissolved nickel.

Fig. 2. Comparison of data for duplicated WwTW sites for CIP1 and CIP2 for DEHP.

Fig. 3. Determinands showing credible change. Notes: Dotplots on the left show the mean individual site concentrations for CIP1 and CIP2. "y" axis baseline values arise where both CIP1 and CIP2 data are reported as < LOD (triclosan and BDE 47). The confidence interval on the right (based on a paired t-test for the individual duplicated sites) shows the overall mean difference and its 90% confidence interval (non-overlap with zero on the right-hand scale indicates statistical significance at p = 0.05).

Table 3
Summary of variation between paired CIP1 and CIP2 datasets (μg/l unless stated).

Table 4
Mean concentrations of selected priority chemicals in European WwTW effluents.