Understanding and managing uncertainty and variability for wastewater monitoring beyond the pandemic: Lessons learned from the United Kingdom national COVID-19 surveillance programmes

The COVID-19 pandemic has put unprecedented pressure on public health resources around the world. From adversity, opportunities have arisen to measure the state and dynamics of human disease at a scale not seen before. In the United Kingdom, the evidence that wastewater could be used to monitor the SARS-CoV-2 virus prompted the development of National wastewater surveillance programmes. The scale and pace of this work has proven to be unique in monitoring of virus dynamics at a national level, demonstrating the importance of wastewater-based epidemiology (WBE) for public health protection. Beyond COVID-19, it can provide additional value for monitoring and informing on a range of biological and chemical markers of human health. A discussion of measurement uncertainty associated with surveillance of wastewater, focusing on lessons-learned from the UK programmes monitoring COVID-19 is presented, showing that sources of uncertainty impacting measurement quality and interpretation of data for public health decision-making, are varied and complex. While some factors remain poorly understood, we present approaches taken by the UK programmes to manage and mitigate the more tractable sources of uncertainty. This work provides a platform to integrate uncertainty management into WBE activities as part of global One Health initiatives beyond the pandemic.


Environmental surveillance for public health
The acquisition of data and extraction of information from environmental samples to manage and improve public health has been a cornerstone of societal development for nearly 200 years (Choi, 2012). However, in comparison with medical and pharmaceutical innovation, much of the work in the field of environmental public health is largely unrecognised outside of the professional communities invested in its use. This is evident with wastewater, a conduit for an array of bio- and chemical markers that can be analysed to provide information on human activities, behaviours, and health status in populations (Castiglioni et al., 2013; Daughton, 2018; O'Brien, 2017; Subedi, 2019), but which has remained a relatively untapped resource given its known potential (González-Mariño et al., 2020; Kasprzyk-Hordern et al., 2021; Lorenzo and Picó, 2019; Pruden, 2014; Singer et al., 2013). However, the negative perception of wastewater as solely a polluting substance, to be removed (from the human and natural environments) and cleaned (often by energy-intensive processes), has undergone re-evaluation in recent years. The focus on sewage as a resource rather than a waste product is driving innovation in the water industry. Accordingly, the diversity of biotic and abiotic features within the sewage matrix presents an opportunity to acquire actionable insights through routine monitoring and analysis of its components. An increasing technological and computational capacity for deriving knowledge from measurements and data has manifested in efforts to 'smarten' the water industry (Wade et al., 2020a), fusing data science with fundamental science and engineering principles. This provides opportunities for greater utilisation of sewage for the common good, be it in the production of resources such as energy and high-value chemicals (Kehrein et al., 2020), or as a proxy of human health and behaviours, which will have transformative impacts for society.

Wastewater-based epidemiology in a time of crisis
The nature and extent of the COVID-19 pandemic has driven an unprecedented response from a diverse array of stakeholders, internationally. The efforts to tackle both the spread of the disease and its impact on populations have highlighted the need for disparate communities of scientists, government agencies, decision makers and the public to work together and collectively address the multiplicity of public health, economic and social challenges that have emerged over the course of the pandemic (Kinsella et al., 2020; Lundy et al., 2021). This is also the case with the development of wastewater-based epidemiology (WBE) as an important tool to facilitate the detection and spatiotemporal monitoring of SARS-CoV-2 virus dynamics in the environment, which is being undertaken in many countries (Bivins et al., 2020a; Naughton et al., 2021; Wade et al., 2020b).
Several studies have shown that the risk of infection by active SARS-CoV-2 virus in pre- or post-treated wastewater is low, particularly in modern sanitation systems (Giacobbo et al., 2021; Kumar et al., 2021; Saawarn and Hait, 2021; Tran et al., 2021). Nevertheless, inactive fragments of the virus RNA have been shown to persist longer in water than infectious virus (Bivins et al., 2020b) and are shed by an individual over the entire disease cycle (asymptomatic and symptomatic) (Vaselli et al., 2021; Zhang et al., 2021). Consequently, most reports on SARS-CoV-2 detection and quantification in wastewater have focused on monitoring of the inactive virus, more specifically, the targeting of small regions of the virus genome using an array of analytical methods, such as reverse transcription-quantitative polymerase chain reaction (RT-qPCR), genomic sequencing (Crits-Christoph et al., 2021; Pérez-Cataluña et al., 2021), and, more recently, mass spectrometry (Lara-Jacobo et al., 2021), to detect and identify the emergence and spread of novel variants in the population.

Beyond COVID-19
Although the COVID-19 pandemic has highlighted the benefits of ex situ monitoring of human-associated disease in the environment, WBE has also been successfully applied in other public health contexts, such as tracking pharmaceutical use, including self-prescribed drug usage in cities (Baz-Lomba et al., 2016; Zuccato et al., 2008), antimicrobial resistance (Hendriksen et al., 2019), and assessment of human exposure to environmental pollution (Gracia-Lor et al., 2018; Kasprzyk-Hordern et al., 2021; Singer et al., 2013). The severity of the current pandemic is a strong motivation for increased and integrated public health and environmental surveillance at national and supra-national scales (Carroll et al., 2021; The European Commission, 2021). Whether it is future-proofing for potential new pandemics (Daszak et al., 2020) or water fingerprinting to determine factors impacting both physical and mental health in communities (Sims and Kasprzyk-Hordern, 2020), wastewater surveillance will become a vital tool at the disposal of governments and public health authorities at the nexus of public and environmental health beyond COVID-19.

Wastewater and public health, an uncertain relationship
This manuscript focuses on the understanding and management of uncertainty in WBE, framed by, but not limited to, lessons learned from wastewater surveillance during the COVID-19 pandemic. For broader discussion of WBE and its implementation as a tool for informing decision-making and policy, there is a plethora of excellent review articles that may be referred to (Bivins et al., 2020a; Farkas et al., 2020; Polo et al., 2020). The data-rich, technologically diverse and computationally powerful resources available for WBE present an opportunity to deliver next-generation public health solutions in combination with targeted or passive environmental monitoring (The Lancet Public Health, 2019). Deriving an understanding of sources of uncertainty and implementing methods to estimate and account for measurement error is therefore critical for the design and implementation of wastewater surveillance to support public health decision-making. We posit that the insights presented here have wider consequences for WBE efforts beyond the pandemic.
Here, insights are shared from the United Kingdom (UK) WBE surveillance programmes (see Section A for details), together with the collective knowledge that has helped support public health initiatives during the COVID-19 pandemic and beyond. The following sections discuss uncertainty and variability derived from source (population, shedding) and in-network (i.e. the pipe network acting as a wastewater collection system) characteristics, and from sampling and sample analysis. We conclude with four case studies related to distinct aspects of applied WBE, providing examples of how uncertainty and measurement variability are addressed and managed.
The advantages of wastewater samples for epidemiology include a response to the determinant of focus (e.g. disease) that captures close to all of the population contributing to the signal, and the provision of near real-time insights obtained from changes in the magnitude or direction of this response (Mao et al., 2020). With respect to COVID-19, clinical sampling of individuals in the community, via nasopharyngeal swabs, saliva or serological tests, is subject to biases associated with factors common to sub-sampling of heterogeneous populations (Hilborne et al., 2020), and mass-testing to obtain a representative sample size is costly. Wastewater, on the other hand, provides an aggregated picture of community disease state through the measurement of virus RNA excreted by, theoretically, all viable shedders with the disease in the sewer catchment (Hoffmann and Alsing, 2021), and can be implemented at a relatively low cost in comparison with clinical sampling (Hart and Halden, 2020). In reality, however, the accuracy and representativeness of any measurement acquired from wastewater is subject to a number of influencing factors, which can be classed as observable (e.g. sample dilution by exogenous hydrological flows), or partially observable (e.g. in-network analyte decay/degradation).
Two recent reviews of wastewater-based SARS-CoV-2 detection have focused on factors contributing to uncertainty in disease prevalence estimation and, more specifically, errors associated with laboratory quantification using RT-qPCR.
It is well understood that environmental measurements are subject to extraneous factors that account for differing degrees of (measurable) variability and uncertainty (unexplained or unmeasurable variability) in the signal (see, for example, Anon, 2009), and surveillance for WBE is particularly impacted by the complexity of the media being sampled (Kantor et al., 2021a; Li et al., 2021; Sims and Kasprzyk-Hordern, 2020). Fig. 1 presents an overview of the known and potential sources of uncertainty in WBE for COVID-19, grouped into spatiotemporal classes (i.e., where and when the uncertainty is likely to impact the measurement). For COVID-19, variability manifests as a significant problem when different measured virus RNA concentrations are observed for, theoretically, the same proportion of infected individuals in the population. More precisely, it is the uncertainty arising between the target analyte (RNA) and its representation of the measure of concern, e.g. disease prevalence (number of individuals with the disease at any given time) or incidence (number of newly identified individuals with the disease for a particular time period). Unwanted variability can occur over time at a given sample site due, for example, to rainfall or snow melt entering a combined sewer network during or after wet weather events and diluting the analyte concentrations relative to a dry weather baseline. With target analytes such as virus particles, which can attach to solids in the network, the impact of increased flow in the sewer is likely to be non-linear due to the effects of turbulence and scouring on the resuspension of settled solids, although, to our knowledge, no evidence of this currently exists for SARS-CoV-2.
Variation between sites is also a problem when using WBE measurements for comparison across geographies, or when aggregating to provide supra-catchment perspectives of target analyte dynamics. For example, a large catchment having a long hydraulic residence time may systematically produce lower concentration measurements than a smaller site, even though the disease prevalence could be the same in both catchments. As shown in Fig. 1, factors causing uncertainty or unwanted variability can range from large-scale processes, such as highly transient populations, to those at a smaller scale, including laboratory-specific methods. In each case, strategies are needed to account for the variability in a way that is appropriate for the intended use of the data.
Uncertainty imposes a lower level of confidence on a measurement than accountable and manageable (signal) variation (Lehmann and Rillig, 2014). Evaluation of uncertainty is necessary for WBE, as quantifying the error bounds (and understanding the limits) of sample measurements is critical for capturing the inherent risk associated with public health decision-making processes. These risks are likely to be similar for all applications of WBE, i.e. incorrect estimation of target analyte(s); inability to compare measurements from different environments or under different conditions; loss of confidence in the ability to detect or quantify the target analyte(s). The risks associated with uncertainty in wastewater surveillance of COVID-19 are wide-ranging and depend on its use-case. For example, recent attention has focused on how using measurements from wastewater in epidemiological models (Fuschi et al., 2021; McMahan et al., 2020) could increase parametric uncertainty and error bounds on model estimates (Edeling et al., 2021). This, in turn, will affect the suitability of the model for tracking and predicting the dynamics of the disease (Saththasivam et al., 2021). Using raw wastewater measurements without accounting for factors that can affect interpretation, such as wastewater dilution or signal decay, may have a significant impact on decision-making when used to complement other sources of disease prevalence.

Population factors
Knowledge of the contributing population size upstream of the sampling location is important for calculation of per capita concentrations and to facilitate comparison between sample sites. It is important to have an accurate estimate of population size (a) to ensure that inter-site comparisons are made on an equivalent basis, and (b) to account for the effects of intra-site population change on the loads of measured target(s) in the wastewater. Population size is, however, uncertain and variable. The mean population size may be estimated based on census data and additional demographic statistics, but such estimates cannot be easily updated to account for changes resulting from births, deaths and migration, and can quickly become outdated (Daughton, 2012). Fluctuations in population during the sampling period can contribute further uncertainty. These include, for example, weekly variations due to the flux of commuters and seasonal variations due to tourism or student populations. Dynamic population estimates may be obtained using water quality parameters such as ammonia and orthophosphate; however, this is subject to bias due to the contribution of additional sources such as industrial discharges (Béen et al., 2014). Mobile device data, chemical biomarkers present in urine (e.g. caffeine, pharmaceuticals) (Rico et al., 2017) and human-specific microbial/molecular markers (e.g. crAssphage, human adenovirus (HAdV), JC polyomavirus (JCPyV)) (Rusiñol et al., 2021; Sala-Comorera et al., 2021) are alternative metrics that have been shown to reduce measurement uncertainty when used to estimate population size for normalisation of target analyte concentrations.
To illustrate the potential effects of population variability, Fig. 2 shows the impact of reporting per capita SARS-CoV-2 loads instead of SARS-CoV-2 concentrations on trends identified at a sewage treatment works (STW) site in England. In this case, a site-specific mean daily ammonia discharge per capita ($\langle x \rangle$) is estimated using Eq. (1), where $\langle \cdot \rangle$ indicates the expected value. The estimation is based on daily measured ammonia concentrations ($X_d$) and wastewater flow rates ($Q_d$) for the entire sampling period and the Office for National Statistics population estimate ($P$) for the catchment. SARS-CoV-2 gene copies per capita per day ($L_d$) are then calculated on a daily basis using Eq. (2), based on this value and the measured SARS-CoV-2 ($S_d$) and ammonia concentrations for the current day. Ammonia concentrations and per capita loads are selected as the basis for population normalisation because (i) flow-rate data are typically not available, or are available at a lower cadence than our sampling frequency; and (ii) flow rate is not proportional to population due to variation in dilution.
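Eq. (1) and Eq. (2) are not reproduced in this text. A form consistent with the symbol definitions above and with the error-bar expression given for Fig. 2 would be the following; it is a reconstruction from the surrounding description, not the typeset original.

```latex
\begin{align}
\langle x \rangle &= \frac{\langle X_d \, Q_d \rangle}{P} \tag{1} \\
L_d &= \frac{S_d \, \langle x \rangle}{X_d} \tag{2}
\end{align}
```

With $X_d$ in mg/L, $Q_d$ in L/day and $S_d$ in gene copies/L (units assumed here for illustration), $\langle x \rangle$ is in mg per person per day and $L_d$ is in gene copies per person per day.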
Error bars are included in Fig. 2 to indicate the standard deviation ($\sigma$) resulting from variability in the site-specific ammonia nitrogen discharge per capita (i.e. $S_d \, \sigma(x) / X_d$, where $\sigma(x) = \sigma(X_d Q_d) / P$); these do not capture any other sources of uncertainty.
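The normalisation and its error bars can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the programme's implementation: the function name `per_capita_load`, the argument names, the units and the example numbers are all hypothetical, and the per capita ammonia discharge is assumed to be calibrated on days where both ammonia and flow were measured.

```python
import numpy as np

def per_capita_load(s_d, x_d, x_cal, q_cal, population):
    """Ammonia-normalised SARS-CoV-2 load and its standard deviation.

    s_d        : SARS-CoV-2 concentrations on the days of interest (gc/L, assumed unit)
    x_d        : ammonia concentrations on the same days (mg/L, assumed unit)
    x_cal      : ammonia concentrations on calibration days (mg/L)
    q_cal      : flow rates on the same calibration days (L/day)
    population : catchment population estimate P
    """
    # Daily ammonia discharge X_d * Q_d over the calibration period (mg/day)
    discharge = np.asarray(x_cal, dtype=float) * np.asarray(q_cal, dtype=float)
    x_mean = discharge.mean() / population        # <x>, mg per person per day
    x_sd = discharge.std(ddof=1) / population     # sigma(x) = sigma(X_d Q_d) / P
    ratio = np.asarray(s_d, dtype=float) / np.asarray(x_d, dtype=float)
    # Load L_d = S_d <x> / X_d and error bar S_d sigma(x) / X_d
    return ratio * x_mean, ratio * x_sd

# Hypothetical example: three calibration days, one sampling day
load, sd = per_capita_load(
    s_d=[1000.0], x_d=[40.0],
    x_cal=[30.0, 40.0, 50.0], q_cal=[2.0e7, 1.5e7, 1.2e7],
    population=100_000,
)
print(load, sd)
```

The sample standard deviation (`ddof=1`) is a choice made for this sketch; the text does not state which estimator was used.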
An important and poorly understood source of uncertainty related to the contributing proportion of the population is the quantity and rate of analyte released into the network through faecal or urinary shedding. Faecal shedding of SARS-CoV-2 RNA varies both between individuals and over the infection course of any given individual (Hoffmann and Alsing, 2021). Indeed, a recent study has indicated, from near-source data, that faecal shedding peaks on average 6 days post-infection (95% Uncertainty Interval 4-8 days) (Cavany et al., 2021). The impact of shedding variability between individuals is attenuated for large catchments and during high-prevalence periods because the sewerage system naturally averages the signal from many people (Jones et al., 2014). Due to the greater variability in the wastewater measurements compared with clinical data sources, the power of WBE surveillance, at least for COVID-19, is expected to be greatest when transmission (and prevalence) or clinical testing is low; i.e. capturing (re)emergence of disease in a community. However, quantitative estimates of the number of individuals infected are likely to remain elusive when infection prevalence is low or the sampled population is small, such as for near-source sampling, where samples are taken upstream in the sewer network close to the discharge source (e.g. outside a building). In the latter case, the probability of capturing a representative sample is low because contributing events (e.g. toilet flushes) are more discrete and non-aggregated, such that grab sampling risks missing the event, while composite samples may be heavily diluted by analyte-absent wastewater. Temporal variability of viral RNA shedding over the infection course implies that the concentration of SARS-CoV-2 gene copies in wastewater is a convolution of disease incidence with the shedding profile (Wu et al., 2020).
Consequently, techniques for relating epidemiological indicators to wastewater-based signals need to consider multiple time lags, for example by employing distributed lag models. Studies to investigate viral shedding prior to symptom onset are urgently required because existing data have been collected from hospitalised patients (Kantor et al., 2021b; Miura et al., 2021). Similarly, the impact of vaccination on faecal shedding of viral RNA is unknown, although data from nasal swabs suggest that viral loads are likely to be reduced. Given this, quantifying virus at near-source with any precision remains elusive, and further work to understand the faecal shedding distribution is critical for adoption of wastewater measurements in epidemiological models for estimating transmission rates (i.e. effective reproduction number, $R_{\mathrm{eff}}$) (Huisman et al., 2021). This information can be applied broadly to other analytes routinely shed in urine and faeces that correlate with public health indicators, although shedding profiles could be markedly different from those for viruses.
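The convolution relationship between incidence and the wastewater signal can be illustrated with a short sketch. The function name `expected_wastewater_signal` and the shedding profile values are hypothetical and chosen only to show the lag structure; they are not taken from any of the cited shedding studies.

```python
import numpy as np

def expected_wastewater_signal(incidence, shedding_profile):
    """Convolve daily incidence with a per-infection shedding profile.

    incidence        : new infections per day
    shedding_profile : RNA shed per infected person on day 0, 1, 2, ...
                       after infection (arbitrary units, hypothetical)
    Returns the expected daily RNA input, truncated to the incidence window.
    """
    incidence = np.asarray(incidence, dtype=float)
    shedding_profile = np.asarray(shedding_profile, dtype=float)
    # Full discrete convolution, then keep only the observed days
    return np.convolve(incidence, shedding_profile)[: len(incidence)]

# A single pulse of 10 infections on day 0 reproduces the scaled profile
signal = expected_wastewater_signal([10, 0, 0, 0, 0], [0, 1, 3, 2, 1])
print(signal)  # [ 0. 10. 30. 20. 10.]
```

Because each day's signal mixes infections from several earlier days, a single fixed lag between case counts and wastewater concentration is generally insufficient, which is the motivation for distributed lag approaches.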

In-network characteristics
Characteristics of the sewage network (proportion of gravity or pressurised pipes; size of the network; retention capacity; location and triggering of combined sewer overflows (CSOs), and use of sustainable urban drainage infrastructure to separate stormwater flow in the catchment) may impact both the quantity of analytes of interest within the water and their distribution within the sewage volumes.
The daily flow patterns in most wastewater systems are oscillatory, driven by multiple factors such as sewer network design (e.g. combined or separate systems), industrial discharge events and prevailing weather conditions. However, the flow signal, under dry weather flow conditions, is governed by household water usage, which often presents as morning and evening 'peak flow' pulses, especially in small catchments. These daily oscillations are damped in catchments with a wide network or large storage capacity where peak flow can be retained and processed later, leading to a homogenisation of the signal (Ort et al., 2010). Pumps across the network or at the inlet of STWs can also homogenise analyte concentrations within the flow, with sumps or wet wells acting as small retention tanks. Ingress of non-human derived flow, e.g. from rainfall or snow melt, in combined sewers, or groundwater infiltration in all sewers, can bias measurements by signal dilution.
Sewer network size, sewer gradient, pipe friction, and the presence of retention tanks can impact the time-of-travel of wastewater 'packets' (typically from <1 h to 24 h in the UK, dependent on catchment size), and may reduce the concentrations of targets that are prone to degradation (Ahmed et al., 2020a) (i.e. those with a short T90, the time for one order of magnitude reduction in concentration). Moreover, the type of sewage system (gravity or pressurised pipes) can directly impact the decay rate of analytes of interest due to differences in biofilm composition within these two environments (Banks et al., 2018) (fully anaerobic for pressurised pipes and mixed anaerobic/aerobic in gravity sewers). Further, the shear stress created by cycling between pressurised and unpressurised pipes might hasten the decay of labile analytes. Finally, retention tanks may also increase the binding of hydrophobic targets, such as SARS-CoV-2 virions, with suspended solids to form complex matrices (Balboa et al., 2021), which may obfuscate their subsequent detection by laboratory analysis, or result in settling-resuspension phenomena in the sewer pipes (Solvi, 2007), decoupling the temporal dynamics of the virus RNA from the discharge event. Significant sewer pipe leakages may also influence the fate of the virus, and consequently its downstream detection and quantification, especially in older networks.
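The interaction between time-of-travel and T90 can be made concrete with a small calculation. The sketch below assumes simple first-order (log-linear) decay, which is a modelling assumption rather than something established here for SARS-CoV-2 RNA in sewers; the function name and the example T90 value are hypothetical.

```python
def remaining_fraction(travel_time_h, t90_h):
    """Fraction of analyte remaining after in-sewer travel.

    Assumes log-linear decay, so that one T90 corresponds to a
    one order of magnitude (90%) reduction: C(t) / C0 = 10 ** (-t / T90).
    """
    return 10.0 ** (-travel_time_h / t90_h)

# Hypothetical analyte with a T90 of 48 h, for short and long residence times
for hours in (1, 6, 24):
    print(hours, round(remaining_fraction(hours, 48.0), 3))
```

Under this assumption, two catchments with identical inputs but different residence times would report different concentrations, which is one reason inter-site comparisons need care.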
Adjusting for the impact of network characteristics across a national programme is challenging due to the need for quantitative, comparable information for individual site networks. In England, this data is typically owned by private water utilities and, in many cases, the precise configuration of the network is not known, unless access is provided by the companies. However, the impact of some site characteristics can be mitigated by taking into account co-dependent, measurable parameters. For example, ammonia concentration (Béen et al., 2014), Pepper Mild Mottle Virus (PMMoV) (Wu et al., 2020), or crAssphage (Wilder et al., 2021) can be used as a proxy for the dilution effects in combined sewers, and catchment area is a rough approximation for network size. While proxy variables are useful in the absence of true measurements, their use in management of target measurement uncertainty may be limited by how representative they are of the analyte of interest. The use of multi-biomarkers to better represent human wastewater contribution (See Section 5.3), or GIS-based modelling and public health information to better characterise catchment population are currently employed methods to mitigate this limitation.

Sampling strategy
In the context of WBE and public health surveillance, acquiring a representative sample that captures the analyte of interest is fundamental to support actions that have the potential to impact the wellbeing of individuals and communities (Ort et al., 2010). The source of the analyte(s) targeted, through urine (e.g. metabolites of pharmaceuticals) or faeces (e.g. viruses), can impose additional variability in measurements, and beyond the COVID-19 pandemic, wastewater surveillance programmes will need to build in sampling flexibility and rigour to account for this uncertainty (Rose et al., 2015).
Broadly, there are two ways to take a sample: (i) a 'grab' or 'spot' sample, where a single sample of wastewater is taken using a small container, and (ii) a 'composite' sample, where samples are taken regularly throughout the day using an automated device (autosampler) and mixed together in a single container. Several autosampling modes may be used to create a composite sample: time-proportional, where a constant sample volume is taken at regular time intervals; flow-proportional, where the time interval is kept constant but the sample volume is proportional to the instantaneous flow rate in the sewer; and volume-proportional, where a constant sample volume is taken each time a fixed volume passes through the sewer (Ort et al., 2010). Measurement uncertainties are heavily impacted by the type, mode and timing of sampling, depending on the variability of flow and analyte concentration over time (Ort et al., 2010). The difference in the probability of detection between sampling methods becomes greater as prevalence (of the target analyte) decreases. Specifically, when concentration is low, the likelihood of detection via grab sampling would be much lower than with composites, while as concentration increases, the probability of detection using grabs becomes comparable. This would suggest that composite sampling is preferable during periods of low target analyte concentration. However, if the daily signal is concentrated in time, well-timed grab samples could capture higher concentrations than is possible with composite samples (as shown in Fig. 3). The nature of a composite sample means that it dilutes a 'sharp' signal, which can be a disadvantage at times of low prevalence. Areas with a more temporally constant signal would be less sensitive to the choice of sampling method. This risk can be mitigated somewhat by the appropriate design and use of autosamplers. However, the autosampling method can also impact measurement confidence.
Time-proportional sampling can lead to under- or over-weighting of the sample during periods of high or low flow, respectively, resulting in loss of representativeness. A volume-proportional sampler extracts a fixed volume of sewage when a predetermined volume of flow has accumulated. The resulting daily sample will be weighted by flow and could be argued to be more representative of the conditions of that day, assuming that the substance of interest is distributed uniformly through the day. However, on low-flow days the sample volume may be too low for effective analysis, while on wet days the full volume may have been taken long before the end of the sampling period.
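The effect of sampling mode on the reported concentration can be shown with a toy example. This is a sketch under simplifying assumptions: sub-samples are taken at the same fixed times in both modes, mixing is ideal, and the concentration and flow values are invented for illustration. The function name `composite_concentration` is hypothetical.

```python
import numpy as np

def composite_concentration(conc, flow, mode):
    """Concentration of a composite built from sub-samples at fixed times.

    conc : analyte concentration at each sampling time
    flow : instantaneous flow rate at each sampling time
    mode : "time" (equal sub-sample volumes) or
           "flow" (sub-sample volume proportional to flow)
    """
    conc = np.asarray(conc, dtype=float)
    flow = np.asarray(flow, dtype=float)
    if mode == "time":
        weights = np.ones_like(flow)
    elif mode == "flow":
        weights = flow
    else:
        raise ValueError("mode must be 'time' or 'flow'")
    # Volume-weighted mean of the sub-sample concentrations
    return float(np.sum(weights * conc) / np.sum(weights))

# A sharp signal that coincides with the flow peak (hypothetical values)
conc = [0.0, 100.0, 0.0, 0.0]
flow = [1.0, 3.0, 1.0, 1.0]
print(composite_concentration(conc, flow, "time"))  # 25.0
print(composite_concentration(conc, flow, "flow"))  # 50.0
print(conc[1], conc[0])  # well-timed vs. badly timed grab sample
```

In this example the time-proportional composite under-weights the high-flow period, both composites dilute the sharp signal relative to a well-timed grab, and a badly timed grab misses it entirely.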
As samples are not always collected daily, sampling cadence must be considered when determining WBE sampling strategies. Aliasing effects may, for example, result in incorrect interpretation of signal dynamics, or produce artefacts in models used for back-calculation of target stressors (i.e. biological, chemical and physical agents used in environmental science and exposure-effect analyses as determinands impacting humans and ecosystems, such as SARS-CoV-2 in WBE) (Chappell et al., 2017). A sampling frequency as close as possible to a daily cadence will reduce uncertainty arising from temporal variability. This has been quantified through a data ablation experiment for 186 network sites monitoring SARS-CoV-2 in England, for which a number of samples were artificially removed to compute the relative bias introduced by reducing the sampling cadence, as shown in Fig. B.4. The percent error in the mean concentration is computed for each site and then averaged to give the per-site mean percent bias shown. Consideration of sampling frequency in relation to sample location in the network is necessary. In small catchments, or near-source applications (e.g. monitoring of critical infrastructure such as prisons, care homes and schools), high-rate composite sampling may not be enough when all discharge events should be captured. Technologies that can provide continuous active/powered sampling, which are pumped on a timed or triggered basis, or passive samplers, which collect water/solids without power and are typically lower cost, are more suitable in this context (Coes et al., 2014).
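A data ablation of this kind can be sketched for a single site as follows. The exact removal scheme used for Fig. B.4 is not described here, so this sketch assumes regular thinning (keeping every `step`-th daily sample) and averages the absolute percent error over all possible start days; the function name and the example series are hypothetical.

```python
import numpy as np

def subsampling_percent_bias(daily_conc, step):
    """Mean absolute percent error in the site mean when only every
    `step`-th daily sample is kept, averaged over all start offsets.

    Assumes len(daily_conc) >= step and a non-zero full-series mean.
    """
    daily = np.asarray(daily_conc, dtype=float)
    full_mean = daily.mean()
    errors = []
    for offset in range(step):
        # Thinned series starting on day `offset`
        sub_mean = daily[offset::step].mean()
        errors.append(100.0 * abs(sub_mean - full_mean) / full_mean)
    return float(np.mean(errors))

# A signal alternating day to day is badly aliased by alternate-day sampling
print(subsampling_percent_bias([1.0, 3.0, 1.0, 3.0], step=2))  # 50.0
print(subsampling_percent_bias([1.0, 3.0, 1.0, 3.0], step=1))  # 0.0
```

Repeating the calculation per site and averaging the results would give a per-site mean percent bias for each candidate cadence.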
Variability due to differences in sampling deployment within and between sites has the potential to be a significant influence on the measurement, particularly when establishing a national surveillance system with a large number of sites and different site personnel involved. Detailed and clear sampling protocols and ongoing training of staff are essential to minimise some of the sources of this variation.

Sample analysis
Wastewater is a highly complex and variable medium, containing compounds that can decrease detection sensitivity, resulting in false-negative results, whilst also compromising the ability to accurately quantify the analyte of interest, such as genetic fragments. Significant experience in sample analysis has been gained while monitoring the COVID-19 pandemic, and uncertainties arising from SARS-CoV-2 quantification have been addressed, leading to a consolidated application of wastewater laboratory analysis for WBE.
Due to the low concentrations of SARS-CoV-2 in wastewater, methods are required to pre-concentrate the virus prior to analysis. The most commonly used methods include precipitation with salt or polyethylene glycol (PEG; Farkas et al., 2021), electrostatically charged membrane filtration (Ahmed et al., 2020b), ultrafiltration (Izquierdo-Lara et al., 2021), or adsorption-precipitation with aluminium chloride or silica (Randazzo et al., 2020). Due to the expense, poor availability and potential for blockages with ultrafiltration devices, the English wastewater surveillance programme initially adopted the PEG precipitation method. This choice was based on previous success at recovering viruses from wastewater (Farkas et al., 2018) and on the fact that the method does not require an extra step for pH measurement and correction. However, the overnight precipitation step in the method increased the time from sample collection to reporting. A decision was then made to switch from PEG to salt (ammonium sulphate, AS) precipitation, as the latter only requires a 1-h incubation step. Parallel studies with duplicate wastewater samples showed no significant differences in recovery between the two methods for SARS-CoV-2 RNA (data not shown). This AS workflow now allows viral RNA to be concentrated, extracted and quantified within a 24-h window.
Another key step in SARS-CoV-2 determination from wastewater is to produce RNA extracts that ensure consistency in the quantity, quality, and purity of extracted nucleic acids for their applicability in downstream processes (e.g. detection, quantification, sequencing). SARS-CoV-2 determination is generally carried out with a nucleic acid-based PCR assay. However, given the wide range of PCR inhibitors in wastewater and the options available for handling them, no single method serves all applications; a multifaceted approach is the best solution to avoid amplification failure. Therefore, efficient extraction methods are required to purify inhibitor-free RNA, together with the use of inhibitor-tolerant quantitative reverse transcription PCR (RT-qPCR) mixes containing enhancers/additives to help reduce inhibition (e.g. gp32 and BSA). On the other hand, the low levels of SARS-CoV-2 in wastewater mean that sample dilution to alleviate inhibition is not recommended or should be limited. Alternatively, the samples can be analysed by one-step digital PCR (dPCR) rather than RT-qPCR. To estimate the efficiency of viral RNA recovery, all samples in the English programme are spiked with phi6 phage as a process control, which is added to the samples at the beginning of the sample concentration process or after the initial centrifugation step, which aims to eliminate solid matter. The concentration step, in addition to RT-qPCR, has been identified as subject to the greatest variability through the analysis workflow. Typically, the recovery of phi6 ranges from 1% to 50%, indicating that the viral recovery methods still need to be optimised for some wastewater types. This is supported by studies from England where wastewater has been spiked with heat-treated SARS-CoV-2 and where recovery is often incomplete (ca. 30-50% recovery; Kevill et al., 2021, unpublished 1). Alongside SARS-CoV-2, a range of other faecal-marker viruses (e.g. crAssphage, pepper mild mottle virus) have been measured in wastewater. In the English programme, crAssphage was initially used to help normalise the SARS-CoV-2 results to account for dilution by industrial wastewater and rainfall. However, this created extra workload and delayed the workflow, and it was subsequently dropped in favour of other indicators of faecal load (e.g. ammonia). Section 5.2 (Case Study 2) presents some specific results from the management of laboratory analysis uncertainty and variability across the UK wastewater surveillance programmes.
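As an illustration of how a process-control recovery such as that of phi6 might be expressed, the following minimal sketch computes recovery as a percentage of the spiked amount. The function names, the example gene copy numbers and the 1% flagging threshold are assumptions made for illustration only and are not taken from the programme's protocols.

```python
def percent_recovery(measured_gc: float, spiked_gc: float) -> float:
    """Recovery of a spiked process control, as a percentage of the amount added."""
    if spiked_gc <= 0:
        raise ValueError("spiked_gc must be positive")
    return 100.0 * measured_gc / spiked_gc


def flag_low_recovery(recovery_pct: float, threshold_pct: float = 1.0) -> bool:
    """Flag a sample for review when recovery is below a (hypothetical) threshold."""
    return recovery_pct < threshold_pct


# Hypothetical example: 1e5 gene copies spiked, 2.5e4 gene copies measured.
recovery = percent_recovery(measured_gc=2.5e4, spiked_gc=1.0e5)
print(recovery, flag_low_recovery(recovery))
```

A per-sample recovery value of this kind can be tracked alongside the SARS-CoV-2 result to identify wastewater types for which the concentration method performs poorly.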

Management of variability and mitigation of uncertainty
As WBE is likely to become a key tool for public health agencies, providing data on a range of human health indicators towards One Health initiatives and global health security more generally (Sims and Kasprzyk-Hordern, 2020;Johnson, 2021), the acuity and timeliness of the generated information become critical to its success. The integration of both chemical- and biological-based WBE will be necessary to ensure that its function is both versatile and resilient in the face of growing demand and extraneous factors such as climate change, population movement, and infrastructure ageing. These factors will result in temporal and spatial variation in analyte profiles (Sims and Kasprzyk-Hordern, 2020;Mills et al., 2020), affecting not only measured levels but also the ability to detect and measure.

Population normalisation and measurement correction
Measurement correction is key to addressing variation resulting from sampling, sample transport and storage, as well as possible errors linked with sample processing (including sample preparation: biomarker extraction from wastewater, concentration, and analysis). Normalisation of data is important to reduce uncertainties related to changing wastewater flows (resulting from diurnal changes and seasonal variability in rainfall patterns), movement of population, biomarker sources (e.g. intake vs. environmental occurrence) as well as biomarker stability and its transformation (e.g. human metabolism or metabolic degradation by sewer microorganisms). WBE in chemical exposure studies (e.g. illicit drugs, pesticides, industrial chemicals, pharmaceuticals) has been subject to comprehensive evaluation of uncertainties because its application requires a reliable quantitative measurement (e.g. per capita drug consumption). In chemistry-based WBE, 24-h composite sampling is strongly advised, as is the use of labelled internal standards (analogues of biomarkers that do not exist in nature, e.g. benzoylecgonine D8, which is used as an internal standard for benzoylecgonine) to compensate for errors occurring throughout sample storage, processing and analysis. Flow measurements of wastewater are required, as well as an understanding of the stability of biomarkers in wastewater and their extraction efficiency/matrix effects (e.g. interfering chemicals during analysis). However, with biology- or pathogen-based WBE, grab sampling is still the norm in the UK, and there is an inherent lack of flow measurements in large national campaigns, as well as in near-source applications, which skews the results and makes the studies more qualitative in nature.
Biomarker selection in chemistry-based WBE requires pre-use validation, which includes the following requirements: (1) originating in humans (with no other sources), (2) accounting for human metabolism, (3) stable in sewers, and (4) with excellent analytical performance in biomarker quantification (the latter is usually followed by inter-lab studies, or 'ring trials'). These factors are yet to be fully evaluated in biology-based WBE (and indeed in new chemistry WBE applications), where biomarkers are stressors themselves, and there is limited (albeit rapidly increasing) understanding of analytical method performance and stability of biomarkers. Most importantly, it is currently impossible to differentiate between different sources of stressor release to the sewerage systems.
Chemical analysis of certain biomarker groups, especially metabolites of high-usage, prescription-only pharmaceuticals (e.g. antidepressants, antidiabetics, and antiepileptics) with well-defined consumption patterns, can provide important insights into diurnal changes in population size contributing to wastewater. Antidepressants are shown in Fig. B.8 as an example. Measurements were undertaken over seven consecutive days in five English towns/cities as discussed in Section 5.3 (Case Study 3). A significant positive relationship between the daily loads of antidepressants, their metabolites and the population size served by respective wastewater treatment plants was observed (Pearson coefficient, r ≥ 0.997, p < 0.0002). As expected, metabolites showed the lowest spatiotemporal variability in the studied intercity catchment (< 16% for desmethylvenlafaxine and < 12% for desmethylcitalopram) when compared to their respective parent antidepressants (venlafaxine and citalopram), which can be disposed of directly down the drain. This indicates their suitability as population markers. Fig. B.8 also indicates the benefit of normalisation in trying to understand consumption patterns. In the figure, double normalisation was applied to account for variable flows and population. As a result, per capita change in consumption patterns can be observed and conclusions drawn regarding the variable consumption patterns in cities with different socioeconomic status. Further discussion on how certain variables affect back-calculations of chemical intake can be found in Case Study 3.
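The double normalisation described above (first by flow, then by population) can be sketched as follows. This is a minimal illustration under assumed units (concentration in ng/L, flow in m3/day); the function names and example values are hypothetical and do not come from the study.

```python
def daily_load_mg(conc_ng_per_l: float, flow_m3_per_day: float) -> float:
    """Flow normalisation: concentration x daily flow gives a daily mass load.

    ng/L x (m3/day x 1000 L/m3) = ng/day; dividing by 1e6 converts ng to mg.
    """
    return conc_ng_per_l * flow_m3_per_day * 1000.0 / 1e6


def load_per_1000_people(load_mg_per_day: float, population: float) -> float:
    """Population normalisation: daily load per 1000 inhabitants served."""
    if population <= 0:
        raise ValueError("population must be positive")
    return load_mg_per_day / population * 1000.0


# Hypothetical example: 500 ng/L at 20,000 m3/day for a catchment of 100,000 people.
load = daily_load_mg(conc_ng_per_l=500.0, flow_m3_per_day=20_000.0)
print(load, load_per_1000_people(load, population=100_000))
```

Expressing results per 1000 inhabitants removes the effect of catchment size, so that differences between cities reflect per capita patterns rather than the size of the population served.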

Design and implementation of sampling
The sampling strategies employed for SARS-CoV-2 surveillance across the UK have aimed primarily to address two key factors: percentage of the population covered and geographic representation, which includes both urban and rural area coverage. In addition, the sampling strategies have needed to allow for an agile sampling response to assist with surveillance of COVID-19 incidence clusters, as highlighted by governmental public health testing strategies. For sampling at STWs, these factors need to be facilitated by the regionally diverse privately and publicly owned sewage networks. Uncertainties arise in the actual population represented by the sampling strategies due to a mismatch between census administration geographies and population equivalents calculated for STWs. These may include estimates of the number of actual residents within a STW catchment, transient populations (i.e. those at workplaces, educational facilities, or communal gatherings such as sports or entertainment events), and the load placed upon each STW by industrial activity. Spatial data analysis approaches can be used to characterise the contribution of STW catchments to administrative geographies, which enable greater integration with public health case data (McKinley et al., 2021).
For sampling at STWs, the use of composite samples can help mitigate uncertainty associated with diurnal flow variations (see Fig. 3) but such samples may underestimate the magnitude of the peak concentration and, therefore, are more suited to understanding the average daily load within the sewer network catchment. Consideration of the ideal sampling site at each STW needs to account for the specific configuration of the inlet channels, equalisation storage, and mixing characteristics. In many cases, it is not possible to get a well-mixed sample with equal representation of all parts of the sewer catchment because of the design of the STW inlet piping. Several studies have sampled primary sewage sludge for SARS-CoV-2, with generally higher detection than from liquid influent samples (D'Aoust et al., 2021;Graham et al., 2021), although data on STW flows and process operation dynamics is required to fully characterise the period of time that each sample would represent. Sampling of solids has not been extensively performed in the UK. Mixed or combined sewerage systems (e.g. those receiving stormwater or industrial effluent) can also have a significant impact on sampling performance as dilution from additional flow and a more complex, or inhibitory, mix of wastewater constituents may obfuscate the ability to detect the signal (See Section 5.1 (Case Study 1) and Fig. B.5).
For in-network and near-source sampling, the large size and spatiotemporal complexity of urban water networks means that it is not economically or logistically feasible to collect a sufficient number of samples to ensure statistical significance of sampling results for estimating system wide average concentrations (Speight et al., 2004). Consideration of the diurnal variation of flows from both domestic and industrial sources and impact of rainfall can help to select sampling locations that are less vulnerable to influence by these factors. Well-calibrated hydraulic models of the sewer networks can be a useful tool to understand dry weather and wet weather dynamics. For example, Fig. B.6 illustrates the modelled dry weather contribution to wastewater flow for one of the core cities sampled by the surveillance programme in England, showing that some locations are dominated by infiltration flows with less than 40% of total daily flow derived from domestic wastewater. Many of the network sites initially sampled in the UK consistently showed non-detectable levels of SARS-CoV-2 and ammonia, consistent with the model results, and these locations were subsequently removed from the sampling programme.
Grab samples in sewer networks require precise timing to capture flows because many manholes are dry for large portions of the day, including near-source locations and upstream ends of the network. Many individual grab samples from network locations had nondetectable levels of SARS-CoV-2 in the core cities (see Fig. B.7). Sampling these locations daily, ideally with a slight variation in the time of sample collection, does not mitigate the underlying uncertainty associated with grab sampling but can assist with visualisation of trends and patterns despite the variability in individual sample results.

Fig. 5.
Comparison of carbamazepine daily loads, intake (calculated using both carbamazepine and carbamazepine-10,11-epoxide) and prescribed carbamazepine in five cities over a 7-day sampling week. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.).

Case study 1: flow normalisation
Several approaches have been developed within the UK wastewater surveillance programmes to account for rainfall dilution of SARS-CoV-2 RNA measurements in wastewater. Given that flow data are only partially available across the monitored sites, indirect normalisation techniques using other biochemical markers as a proxy for the real flow can be used, with the assumption that the majority of markers originate from a source with a constant load. Note that these corrections account for temporal variability, but are not sufficient to estimate the average flow level, which can vary significantly between sites (as shown in the population normalisation section). Different techniques, based on a similar premise, are presented below for flow variability correction in (a) Scotland and (b) England. Dilution effects appear to have only a minor impact on SARS-CoV-2 concentrations in wastewater, with significant changes occurring only during heavy rainfall events or discharges from other sources. That is, variation due to dilution effects appears to be minimal relative to sample-by-sample variability from other sources of noise (e.g. faecal shedding). However, from an epidemiological perspective, highly diluted measurements caused by storm events, for example, may need to be corrected by factors of as much as 0.6 (results not shown), which would significantly skew interpretation of disease prevalence if ignored. A separate discussion on detection and management of measurement outliers and data anomalies is provided in Section B.5 of the Appendix.

Flow normalisation, as applied by the Scottish COVID-19 surveillance programme
In Scotland, in addition to detecting and quantifying SARS-CoV-2, chemical analytes, in particular ammonia, have been collected and processed from the wastewater. These are available up to 2 weeks prior to flow measurements, and at some sites flow is not measured at all. As a result, a cross-site model is used to relate ammonia concentrations to flow measurements, taking into account population size as a proxy for faecal shedding in the catchments. A linear mixed model (LMM), with flow related to ammonia and population (on log10 scales), was developed, and random intercepts and slopes were included for each site. This model was shown to fit the Scottish data better than a simpler linear regression model with the slopes for log10(ammonia concentration) and for log10(population) fixed at −1 and +1, respectively. Model performances were compared using the Akaike Information Criterion (AIC) and a Kenward-Roger approximation of the Wald test for LMMs. A review of the Scottish data at each site, using a generalised additive model (GAM) with the Tweedie distribution, showed that unnormalised data were equally or more noisy than normalised (but scaled) data once trends were taken into account. A graph of example sites with fitted ammonia/flow curves is shown in Fig. B.11. Current practice in Scotland is to normalise by flow rate if available, then by ammonia concentration. If neither is available, then an estimate of flow based on a spline function using recent ammonia trends is used (fitted on overall national trends over time plus site-specific effects). If 'capping' is an issue, where CSOs prevent sewer overloading by discharging to natural water bodies, then normalising against ammonia would be preferable as a more representative measure of true flow. Anecdotally, it is not thought that capping is a major issue in Scottish wastewater networks, based on communication with water sector professionals.
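The simpler comparison model mentioned above, with slopes for log10(ammonia) and log10(population) fixed at −1 and +1, can be sketched in a few lines. This is only an illustration of that fixed-slope form, not the linear mixed model that was found to fit the Scottish data better; the function names and numerical values are assumptions. With fixed slopes, the only free parameter is the intercept, which corresponds to the log10 of a constant per capita ammonia load.

```python
import math


def fit_intercept(flows, ammonia, population):
    """Least-squares intercept of log10(flow) = c - log10(ammonia) + log10(population).

    With both slopes fixed, the estimate is the mean of the remaining residual.
    """
    residuals = [
        math.log10(f) + math.log10(a) - math.log10(population)
        for f, a in zip(flows, ammonia)
    ]
    return sum(residuals) / len(residuals)


def predict_flow(ammonia, population, intercept):
    """Estimate flow from an ammonia concentration when flow is unavailable."""
    return 10 ** (intercept - math.log10(ammonia) + math.log10(population))


# Hypothetical site: 50,000 people; flow in m3/day, ammonia in g/m3 (= mg/L).
flows = [10_000.0, 20_000.0]
ammonia = [40.0, 20.0]
c = fit_intercept(flows, ammonia, population=50_000)
print(predict_flow(ammonia=10.0, population=50_000, intercept=c))
```

A mixed model relaxes this by letting the intercept and slopes vary by site, which is one reason it can describe sites where the ammonia-flow relationship departs from simple dilution.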

Flow normalisation, as applied by the English COVID-19 surveillance programme
The approach assumes that the flow $F_t$ at time $t$ is not directly observable. Instead, information about the flow is obtained by observing the correlation of the concentrations $\rho_{ti}$ of different markers $i$. A dilution estimate based on a single marker is not robust enough, as it is not possible to distinguish between a decrease in flow and an increase in marker load, e.g. due to a one-off industrial or agricultural discharge. The model assumes

$$\log F_t \sim \mathrm{Normal}\left(0, \lambda^2\right), \qquad \log \rho_{ti} \sim \mathrm{Normal}\left(\mu_i - \log F_t, \sigma_i^2\right),$$

where $\lambda^2$ is the flow variance, and $\mu_i$ and $\sigma_i^2$ are the mean and variance of the load of marker $i$ (all in log space). $\langle \log F_t \rangle$ is fixed at 0 to identify the model. Accounting for variable dilution using multiple markers relies on the same basic premise as the approach presented for Scotland's case study, although with three key differences. First, using multiple markers (such as ammonia nitrogen and orthophosphate) jointly to estimate flow variability improves the accuracy of estimates. It also allows outliers (such as one-off discharges) to be identified, and flow variability to be estimated as long as at least one marker is quantified (although with larger error bars). If no marker is quantified, the model predicts average flows with substantial error bars. Secondly, rather than assuming total marker loads are constant, they are assumed to be constant in expectation. In other words, natural variability of biomarker loads is accounted for. This allows variable importance to be assigned to different markers in a data-driven fashion. For example, crAssphage gene copy concentrations exhibit more natural variability than ammonia-nitrogen concentrations, and more importance should be assigned to the latter, although both can inform the dilution estimates. Finally, a generative modelling approach is used to test hypotheses in silico, and any inferences in the form of posterior distributions over parameter values include principled estimates of uncertainty. The model also handles missing data gracefully and can incorporate limits of detection where appropriate (not further considered here). Unfortunately, the model needs to be refitted whenever new data become available, and it is more computationally expensive than other methods. Any combination of two or more markers can be used to estimate flows using the multi-marker method provided that their total loads are constant in expectation. An example of the correction for an English STW is presented in Fig. 4.
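To illustrate how several markers can jointly inform a dilution estimate, the sketch below computes the posterior of the log flow for a single day. It assumes, for illustration only, that the log flow is Gaussian with mean 0 and variance λ², that each observed log concentration is Gaussian around (μ_i − log flow) with variance σ_i², and that μ_i, σ_i and λ are already known. The full model infers these parameters from the data, so this is a simplified stand-in rather than the programme's implementation; all names and values are hypothetical.

```python
import math


def posterior_log_flow(log_conc, mu, sigma, lam):
    """Posterior mean and SD of log flow given marker log concentrations.

    log_conc: observed log concentrations, with None for markers not quantified.
    mu, sigma: assumed mean and SD of each marker's log load.
    lam: assumed prior SD of the log flow (prior mean fixed at 0).
    """
    precision = 1.0 / lam ** 2          # prior precision
    weighted_sum = 0.0
    for y, m, s in zip(log_conc, mu, sigma):
        if y is None:                    # missing marker: contributes nothing
            continue
        precision += 1.0 / s ** 2        # less variable markers get more weight
        weighted_sum += (m - y) / s ** 2
    return weighted_sum / precision, math.sqrt(1.0 / precision)


# Hypothetical markers: a low-variability one (sigma 0.2) and a noisier one (sigma 0.8).
mu, sigma, lam = [2.0, 5.0], [0.2, 0.8], 0.5
print(posterior_log_flow([1.6, 4.9], mu, sigma, lam))    # both markers quantified
print(posterior_log_flow([None, None], mu, sigma, lam))  # falls back to the prior
```

In this simplified form the estimate is a precision-weighted average, so a marker with more natural variability contributes less, and with no quantified markers the result reverts to the average flow with the full prior uncertainty.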

Case study 2: uncertainty arising from laboratory analysis of SARS-CoV-2, and its mitigation within the UK wastewater surveillance programme
The analytical variability, in terms of both replicability and reproducibility, for the estimation of SARS-CoV-2 in wastewater has been a major focus of the UK wastewater surveillance programmes. In England, the use of two main laboratories (required due to the need for high throughput analysis capacity) provided significant challenges, but also opportunities to assess the reproducibility of sample analysis. Both laboratories employed the AS precipitation and, despite some differences in the use of RT-qPCR reagents and quantification standards, duplicate samples were analysed and found to be comparable (data unpublished). In addition, an inter-laboratory ring trial was carried out involving five laboratories across the four nations, three using AS precipitation and two using filtration (Walker et al., 2021, unpublished 2). Significant differences were found in the absolute SARS-CoV-2 concentrations measured by all laboratories. However, these differences (less than one log between labs) were much lower than reported in other ring trials (Pecson et al., 2021). Further, the variability between the laboratories was similar to previous inter-laboratory trials for quantifying viruses (e.g. Norovirus, Hepatitis A) in shellfish (Lowther et al., 2019).

Fig. B.2.
Maps of the four regions of the United Kingdom showing the wastewater sampling locations for the respective national COVID-19 surveillance programmes (as of July 2021). Markers represent centroids of the catchments serving the sample point and shading is the 7-day average SARS-CoV-2 RNA concentration (gene copy per litre) measurements at each site over the last week of June 2021. This is only an example of the spatial distribution of sampling in the UK and comparisons of concentrations between sites should not be made from these figures due to differences in sampling frequency and network characteristics across locations.

The differences in SARS-CoV-2 recovery between laboratories are likely due in part to the differences in the initial virus concentration method (e.g. ultrafiltration versus AS precipitation) and the use of different RT-qPCR standards. The UK is now contributing to discussions on the development of an ISO standard for quantifying SARS-CoV-2 in wastewater. The development of an ISO standard will enable a greater degree of international collaboration and provide the basis for external proficiency testing schemes. The latter will give laboratories and accreditation services a means to assess laboratory performance and flag potential quality issues that require investigation.
The efficiency of downstream applications depends strongly on the purity of the RNA sample used. In this regard, the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines stipulate that a measurement of a nucleic acid quantity is essential, while an assessment of purity is desirable (Bustin et al., 2009). This is particularly important to avoid false negatives when SARS-CoV-2 concentrations are too low to be quantified after dilution, requiring the use of internal or external controls, such as RNA/DNA spikes, to detect inhibitors and verify several other parameters of the workflow (See Fig. B.14). Furthermore, the effect of wastewater properties has been assessed in a range of mesocosm-based wastewater studies (Kevill et al., 2021, unpublished). These found that the presence of suspended solids (turbidity range 10-400 NTU) or surfactants (0-100 mg/l) had minimal impact on RNA recovery using PEG or AS precipitation methods unless present at very high concentrations atypical of UK wastewater.
RT-qPCR can introduce additional variability at different steps during the quantification of SARS-CoV-2. Firstly, the reverse transcription can vary with the same samples by two- to threefold depending on the amount and quality of RNA (Bustin et al., 2015). Secondly, sample variability increases when the target complementary DNA (cDNA) is diluted, mainly when the quantification cycle (Cq) values are greater than 30. This is due to stochastic amplification, measurement uncertainty, and subsampling error (Taylor et al., 2019). The RT-qPCR variability can easily range between a coefficient of variation (CV) of 10% and 200%, and can only be minimised by interrogating a larger proportion of the sample using more technical replicates and applying the average Cq (Taylor et al., 2019). Fig. B.15 shows the variability of SARS-CoV-2 measurements in wastewater at different Cq values from an England pilot study.

Fig. B.4.
Per-site mean absolute percentage difference, compared to the 7-day baseline, in SARS-CoV-2 and other marker measurements when reducing the sampling frequency artificially for 186 network sites across England. Whilst the difference for ammonia, orthophosphate and pH is limited to ~ 10%, a difference of up to ~ 55% can be introduced in the mean estimate of SARS-CoV-2 when decreasing the sampling frequency.
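The replicate variability discussed above can be quantified with a short calculation. The sketch below converts Cq values to copy numbers using a log-linear standard curve and computes the coefficient of variation of the resulting quantities. The slope and intercept defaults and the replicate Cq values are hypothetical, chosen only to show why a modest spread in Cq corresponds to a large CV on the linear scale.

```python
import statistics


def cq_to_copies(cq, slope=-3.32, intercept=40.0):
    """Copies per reaction from a log-linear standard curve (hypothetical defaults).

    The curve is Cq = intercept + slope * log10(copies).
    """
    return 10 ** ((cq - intercept) / slope)


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical technical replicates with Cq values above 30.
replicate_cq = [33.1, 33.6, 34.2]
copies = [cq_to_copies(cq) for cq in replicate_cq]
print(cv_percent(replicate_cq), cv_percent(copies))
```

Because quantity depends exponentially on Cq, a spread of about one cycle between replicates translates into a much larger relative spread in estimated copy numbers, which is why additional technical replicates matter most for samples with high Cq.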

Case study 3: population normalisation and measurement correction: lessons learned from WBE application in exposure studies beyond COVID
A study of multi-group chemical profiling in five contrasting urban populations was undertaken to understand measurement variability at an inter-city granularity. Each population is served by a major STW contributing to one river catchment in South-West England, covering an area of approximately 2000 km² and a population of approximately 1.5 million (this constitutes > 75% of the overall population in the catchment). See Fig. B.9 for a map of the five study locations, and Table B.1 for data on their network characteristics. A detailed discussion of multi-chemical fluxes in urban catchments has been provided by Proctor et al. (2021), and the methodology used to measure chemicals and back-calculate mass loads and intake is found in recent literature (Proctor et al., 2019). Key contributing factors to WBE uncertainties are carefully considered and included in the study to enable a fully quantitative measurement of city-wide intake for selected chemicals:
• Robust sampling and sample collection involving 24-h flow proportional sampling in ice packed or refrigerated autosamplers maintaining biomarker stability;
• 7-day consecutive sampling to allow for temporal (weekday versus weekend) changes in biomarkers to be observed;
• Robust wastewater flow measurement and population size estimates;
• Fully validated analytical methods and the highest level of quality assurance (e.g. limits of detection and quantification, intra- and inter-day accuracy and precision, recovery from matrix);
• Characteristic biomarker selection for back-calculation of chemical exposure (e.g. metabolite versus parent compound to account for direct disposal of unused chemicals);
• Full biomarker mass balance in wastewater that accounts for biomarker presence in both solid and liquid phases with a full understanding of percentage biomarker recovered from the matrix.
The aim of the study is to understand and characterise key uncertainties to enable accurate back-calculation of city-wide exposure to chemicals. To validate the developed back-calculation protocol, high-resolution spatiotemporal National Health Service (NHS) pharmaceutical prescription databases are used for system calibration, in terms of biomarker selection and its correction factor, as well as for overall spatiotemporal system performance evaluation. A detailed discussion on multi-chemical exposure can be found in Kasprzyk-Hordern et al. (2021). Here, focus is only given to carbamazepine and citalopram, two model chemicals, and two key variabilities for back-calculation of their usage at an inter-city level (that are not currently considered for UK SARS-CoV-2 monitoring): characteristic endogenous biomarker selection and establishment of correction factors accounting for human metabolism.
Carbamazepine intake (Fig. 5: red line) is back-calculated using both parent compound (source carbamazepine) and its metabolite (carbamazepine-10,11-epoxide, CBZ10-11). While both biomarkers correlate well with NHS prescription data (Fig. 5: blue line), using carbamazepine as a biomarker might lead to an overestimation of intake if direct disposal of unused carbamazepine takes place (see city A, Sunday, Fig. 5). Interestingly, this is not the case if CBZ10-11 is used (no spike in city A during Sunday), which indicates its superiority over carbamazepine itself.
An understanding of the extent of metabolism of biomarkers or metabolic formation of biomarkers is key in quantitative back-calculation of chemical intake. Fig. B.10 shows an example of a significant overestimation of citalopram intake observed when using commonly applied weighted average correction factors based on the existing literature. These often include only phase I metabolism of the chemical excreted in urine (desmethylcitalopram in this case), as opposed to the focused approach, where metabolism correction factors (mCFs) are calculated using only comprehensive datasets from studies combining phase I and II metabolites (glucuronides) excreted in both urine and faeces. Understanding biomarker excretion in faeces is of critical importance for compounds with a more hydrophobic nature, such as citalopram, which is, to a large extent, excreted in faeces. Additionally, citalopram and its metabolites undergo extensive glucuronide conjugation. Overlooking excretion in faeces and phase II metabolism will lead to incorrect CFs, as seen in Fig. B.10. Having prescription data per 10-100 households/postcodes allows for the validation of the correction factors used. Prescription databases (if associated with well-defined regional units such as streets) can therefore serve as internal calibration systems.
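The role of the correction factor in back-calculation can be illustrated with a commonly used general form, in which the factor is the parent-to-biomarker molar mass ratio divided by the fraction of the dose excreted as the biomarker. This is a minimal sketch of that general form and not the specific mCF protocol used in the study; the function names and all numerical values are hypothetical.

```python
def correction_factor(mw_parent: float, mw_biomarker: float, excreted_fraction: float) -> float:
    """Correction factor = (molar mass ratio) / (fraction of dose excreted as biomarker)."""
    if not 0.0 < excreted_fraction <= 1.0:
        raise ValueError("excreted_fraction must be in (0, 1]")
    return (mw_parent / mw_biomarker) / excreted_fraction


def intake_mg_per_day_per_1000(load_mg_per_day: float, cf: float, population: float) -> float:
    """Back-calculated intake of the parent compound per 1000 inhabitants."""
    return load_mg_per_day * cf / population * 1000.0


# Hypothetical biomarker load of 1000 mg/day in a catchment of 100,000 people.
# A smaller assumed excreted fraction (e.g. one pathway only) inflates the estimate.
cf_partial = correction_factor(300.0, 150.0, excreted_fraction=0.25)
cf_full = correction_factor(300.0, 150.0, excreted_fraction=0.5)
print(intake_mg_per_day_per_1000(1000.0, cf_partial, 100_000))
print(intake_mg_per_day_per_1000(1000.0, cf_full, 100_000))
```

Because the estimate scales inversely with the assumed excreted fraction, excretion data that omit a pathway can inflate the back-calculated intake, which is why comparison against prescription records is a useful calibration check.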
This case study shows the importance of careful biomarker selection to enable highly accurate 'quantitative' calculation of per capita stressor intake. This is not currently performed with SARS-CoV-2 surveillance, where the stressor itself is used as a biomarker. As a result, various sources of the genetic material present in the wastewater sample can be captured and, hence, calculation of the per capita intake (or viral load) may be difficult. Further work is required to establish a suite of biomarkers and new analytical approaches to enable quantitative measurement of community infection and public health indicators of concern. In the interim, it is likely that WBE can only be used as an early warning system for public health monitoring and verification of disease prevalence trends at the community level, and not as a quantitative measure of community infection rates.

The boxplots suggest a greater degree of within-sample and between-method variability for Site 2 than Site 1, suggesting that combined sewerage systems (i.e. those receiving stormwater or industrial inflow in addition to domestic flow) may impart greater signal variability. Additionally, the lower SARS-CoV-2 measurements for grabs at Site 2 imply that autosampling is more likely to capture the target analyte signal in complex or dilute media.

Conclusions
The scale of the COVID-19 pandemic has resulted in an unparalleled response from a diverse community of stakeholders, working collaboratively to control and reduce the transmission and impact of the disease. The early demonstration that wastewater was a viable medium for tracking the virus led to academic and government initiatives to operationalise wastewater-based epidemiology for monitoring its dynamics at local, regional, and national scales. In the UK, COVID-19 surveillance programmes across the four nations (England, Wales, Scotland, and Northern Ireland) have demonstrated, perhaps uniquely, the opportunity for WBE to be used routinely and at unprecedented scale to combat a public health emergency. From their inception, the national wastewater surveillance programmes have delivered insights to support public health decision-making and to guide Government and key stakeholders in interpreting the measurements of SARS-CoV-2 in wastewater to provide a broader understanding of the disease in the populations. This work has allowed for a broader appreciation of WBE as a tool for monitoring public health in populations at scale, with initiatives likely to focus on a 'beyond COVID' uplift of WBE as part of establishing One Health programmes across the world. However, their effectiveness requires that the data generated to support the function of WBE are meaningful and representative of the target(s) being monitored. Wastewater is a more complex environment than typical media used for monitoring of human health, with multiple factors potentially accounting for greater uncertainty or variability in the measured signal than in, for example, a clinical setting. Managing this uncertainty is one of the key challenges to ensure successful employment of WBE for public health protection.

Fig. B.7.
An anonymised heatmap view of 'core city' SARS-CoV-2 RNA concentrations measured in wastewater over a 1-month period from June to July 2021. Each row is an in-network sample location in the city and each column represents a sample day. Missing values represent a missing sample or no sample taken. Values are the log10 virus RNA concentrations (gene copies per litre). Cells with blue borders are flagged as likely being influenced by high dilution events, and < LOD are measurements below the laboratory limit of detection. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.).

Fig. B.10.
Citalopram intake calculated using citalopram and desmethylcitalopram, with and without inclusion of phase II metabolites (Note: (*) indicates no inclusion of phase II metabolism, which leads to overestimation of intake).

Here, perspectives are provided on the confidence in wastewater-derived measurements by those working across the national programmes, given work performed to understand, quantify and manage measurement uncertainty and variability. The work emphasises that while some sources of uncertainty may not be impactful, or can be adequately accounted for (e.g. extraneous flow dilution, sampling method), other sources are inconsistent or difficult to quantify directly (e.g. shedding distributions, in-network behaviour). While these intractable factors will, with consolidated research efforts, become less opaque, there is unlikely to be a general approach to manage measurement uncertainty for all applications of WBE beyond COVID. Making use of the greatly increased capacity for WBE in the UK, and more widely, will require new methods for extracting actionable information from wastewater data, but also methods for determining the limits of its application.

The lines show the fitted regression estimates: blue is for the full random coefficient model and red is for the model with the slope for log ammonia fixed at −1. The strength of the relationship varies between sites, as shown by the correlations given. At some sites (e.g. Lockerbie), the fitted lines are quite close, and in other cases (e.g. Shieldhall), the difference is more marked. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.).

Funding
The United Kingdom Government (Department of Health and Social Care) funds the sampling, testing, and data analysis of wastewater in England. The Scottish Government (Rural & Environmental Science and Analytical Services division, Covid Testing in the Health and Social Care division) funds the sampling, testing, and data analysis of wastewater in Scotland. The Welsh Government (Technical Advisory Cell) funds the sampling, testing, and data analysis of wastewater in Wales. The initial phase of the "SARS-CoV-2 wastewater surveillance and reporting SARS-CoV-2 in Northern Ireland project" was funded under the joint Science Foundation Ireland - Department of Agriculture, Environment and Rural Affairs (DAERA) COVID-19 Rapid Response Funding Call (20/COV/8460-1). The second stage of the project is funded by DAERA in collaboration with the Public Health Agency NI (PHA-NI). The project partners include DAERA, PHA-NI, the NI Environment Agency, Department of Health, Department for Infrastructure, Belfast City Council and NI Water Ltd. We would also like to acknowledge NI Water
Fig. B.14. Inhibition level of clean samples spiked with synthetic single-stranded RNA (ssRNA). The inhibition level was calculated by spiking ssRNA into wastewater extracts and comparing the measured Cq to RNA spiked into molecular negatives (no template controls). The modified PEG method keeps the inhibition level below 1.0 Cq and the RNA quality between 2.0 and 2.2.
Fig. B.15. SARS-CoV-2 N1 gene variability between biological duplicates. The Cq variability increased with lower target concentrations (higher Cq). The CV was 1.0 ± 0.9 and 2.6 ± 2.3 for samples with Cq values below and above the LOQ, respectively.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Disclosure statement
Environmental Monitoring for Health Protection [MJW, ALJ, MRB, JTB, JG, TH, CJL, MM, CSo, CSw]: The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the UK Health Security Agency.

Appendix A. COVID-19 wastewater surveillance in the United Kingdom
Wastewater monitoring of COVID-19 across the four nations of the United Kingdom is the most striking example of an application of WBE for public health surveillance to date, given the scale and extent of the sampling programmes, and end-to-end transformation of measurements to actionable insights. Here, each national programme is described to give context for understanding the challenges around measurement uncertainty and variability.

Appendix A.1. England
Sampling of wastewater in England is being carried out by the Environment Agency, UK water utilities, and the Environmental Monitoring for Health Protection team, part of the Joint Biosecurity Centre (UK Health Security Agency, as of October 2021) created to support government response to the COVID-19 pandemic. Sample collection started in June 2020 at the inlets to 44 sewage treatment works (STWs). The sites were selected to provide good population coverage and geographical representation across the country. In total, the original sites covered 17.7 million people (over 31% of the population of England). The sampling capacity was increased considerably at the start of 2021 and, as of July 2021, comprised 556 sites, including 263 STWs, 238 network sites (manholes or pumping stations in the sewer catchment), and 55 near-source sites (single or groups of buildings). The sites are distributed across the networks of the nine water utilities in England. By July 2021, wastewater sampling covered 39.4 million people (70% of the population of England). Recent sites were selected according to multiple criteria including demographic disease risk, population coverage and, for in-network sites, access points (e.g. manholes, pumping stations) that ensure safe access and well-mixed samples. STWs are sampled four times per week, using either autosamplers (composite) or by grab, or spot, sampling, post influent screening. The method of sampling is typically dependent on infrastructure at the STWs. An example of the potential for large variability in wastewater concentration between samples taken, either as grab or composite, is shown in Fig. B.3. In-network samples, collected upstream of the STWs, are used to constrain areas of concern in nine 'core' cities representing the largest conurbations in England, and three smaller strategic cities (based on historical COVID-19 trends). 
Samples from network sites are collected daily, mostly as grab samples, while near-source samples are largely taken using autosamplers at a fixed sub-sampling frequency (see Fig. B.1 for an overview, as of July 2021).

Appendix A.2. Scotland
Sampling and testing of wastewater across Scotland has been performed by Scottish Water and the Scottish Environment Protection Agency (SEPA). In total there are over 1800 STWs in the Scottish network, serving from fewer than 100 people to populations over 600,000. Initial testing of the premise that SARS-CoV-2 virus fragments were detectable in wastewater began in April 2020, with the development of a national monitoring programme operationalised by late May 2020. A network of 28 sites was initially prioritised, designed to maximise the coverage of population across Scotland's 14 National Health Service Health Regions, while remaining within the laboratory capacity available at that time. The 28 sites covered a total of 2.6 million people (just over half of the five million sewered population in Scotland), with the goal of achieving a coverage of 40% in each of the 14 regions. As the need for wastewater monitoring has increased, so too has the monitoring network, which has expanded to 108 sites covering 4.2 million people. Autosamplers are used to obtain composite samples from the influent at each sewage works over a 24-h period, which are then sent to SEPA for analysis. Results are then published via data visualisation dashboards. One dashboard, designed for the general public, holds the raw virus concentrations for each site (https://informatics.sepa.org.uk/RNAmonitoring), while a second dashboard, designed for public health officials, has additional metrics and comparisons to reported case numbers.
Additionally, Scottish Water, SEPA and a variety of NHS and Public Health professionals from across Scotland have been working together to collect and analyse samples from within the sewer network itself (via manholes). These were sited at the request of health professionals in order to better understand the virus prevalence in areas of concern within larger sewer networks. These samples are taken by means of a grab sample. In total, 28 such sites have been monitored at one time with 14 still being active as of July 2021.
In total, over 5000 samples have been tested and recorded as of July 2021. The sampling frequency varies between sites depending on several factors and has changed as the needs of stakeholders have changed over time. Sampling at treatment work inlets has been variable, with the majority sampled at a frequency of once or twice a week, but some as much as four times per week, a trade-off between lab capacity and data density. All in-network sites are sampled five times per week in their initial week to establish a baseline before being sampled twice a week thereafter.

Appendix A.3. Wales
The Wales wastewater monitoring programme started as a pilot in March 2020 as the first wave of COVID-19 spread across the UK. This early work highlighted the potential for tracking SARS-CoV-2 and also led to the development of robust methodologies for extracting and quantifying the virus in wastewater (Farkas et al., 2021). This pilot phase was then expanded in September 2020 to 20 sites across the country. These sites were initially sampled three times per week, increasing to five weekdays by June 2021, to try to reduce the variability in the wastewater SARS-CoV-2 RNA signal and, thus, improve its usability. One of the major challenges in Wales has been the lack of on-site infrastructure needed to take composite wastewater samples. Therefore, all samples are currently taken as grab samples, targeted at the early morning wastewater peak (between 08:00 and 11:00 h). However, it is now known from deploying the enveloped Pseudomonas virus, phi6, into the sewer network that this approach may miss the effluent peak, leading to an underestimation of viral abundance. Another challenge has been the poor geographical coverage in Wales. The country has two urban corridors centred around the northern and southern coasts in which 80% of the population resides. Consequently, wastewater surveillance has focused on these areas, leaving ca. 20% of the country, mainly in central and western Wales, unmonitored, so that localised outbreaks in small urban centres go uncaptured. Another major issue is that the capital city, Cardiff, is served by a very large centralised STW (930,000 people). Although this captures 30% of the Welsh population in one sample, the lack of granularity prevents wastewater surveillance from being used to target regions of the city to control localised COVID-19 outbreaks (e.g. implementation of surge testing and walk-in vaccination centres).
The lack of sampling on weekends also prevents capture of the large migration of tourists from North West England into North Wales. The wastewater samples taken in Wales were also used to pilot their potential to track other viruses of public health interest (e.g. influenza A and B, norovirus, respiratory syncytial virus, enterovirus D68). Analysis showed that wastewater contained all these viruses with the exception of enterovirus D68. Looking forward, the Wales wastewater surveillance programme is now being expanded to many more sites with the aim of capturing 90% of the population connected to sewers, with analysis of a greater number of public health indicators.

Appendix A.4. Northern Ireland
Wastewater surveillance in Northern Ireland (NI) has several unique challenges compared to other parts of the UK, which are related to the urban and rural distribution of population. NI has an extensive wastewater treatment network operated by Northern Ireland Water (NI Water). In total, there are 1114 STWs in the NI Water network, serving just under 80% of the NI population. Each STW serves a wastewater drainage catchment area of variable size. Up to 68% of the NI population is served by the 40 largest STWs. However, these larger STWs serve predominantly urban, as opposed to rural, communities and tend to be disproportionately located in eastern parts of NI. The integrated wastewater testing and geographic surveillance programme for SARS-CoV-2 in NI is led by Queen's University Belfast, funded by the Department of Agriculture, Environment and Rural Affairs (DAERA) in collaboration with the Public Health Agency NI (PHA-NI).
Currently, wastewater samples for SARS-CoV-2 are taken at 14 STW sampling sites covering 35.3% of the NI population. The current sampling strategy was based on several key factors, including population coverage, geographic distribution of wastewater surveillance and a close alignment and agile response to PHA-NI test and trace results. Consideration is being given to significantly expanding the sampling sites, allowing for the wastewater surveillance of a significant portion of the NI population.
An important aspect of the approach in NI has been the use of Geographical Information Systems (GIS) to develop a spatial, GIS-based wastewater monitoring and reporting system that integrates public health data to model population geographies and align them with wastewater drainage catchment areas. Modelling population across NI using GIS provides an approach to estimate populations covered by the wastewater network, the population within individual wastewater drainage catchment areas, and how to balance capturing the maximum percentage of the population from a relatively limited number of sample sites against ensuring an adequate geographic spread across NI. This has been achieved through the development of an interactive wastewater SARS-CoV-2 Surveillance Dashboard. The Dashboard displays the analysis results of sampling at various locations in NI and enables users to see and understand population distribution modelling across the NI wastewater network. This offers the most efficient and informative sampling strategy for the programme, and an approach to contextualise wastewater test results in terms of socio-economic deprivation.
An indication of the extent of wastewater surveillance across all four regions, indicating the geospatial locations of sampling sites (as of July 2021), is shown in Fig. B

. Identifying and reconciling anomalously low measurements in England
As discussed, the devolved administration programmes for COVID-19 wastewater surveillance have generated a large number of SARS-CoV-2 virus RNA measurements. As with all environmental measurements, the signal recovered will be subject to anomalies, or outliers, that diverge from the expected data trends, with some defined statistical significance. Measurements can vary by several orders of magnitude, with extrema possibly representing unaccountable occurrences such as 'super-spreader' events, single release of highly concentrated sewage (e.g. transported from non-networked sites, in-network holding tanks or wet wells), or sample capture of a highly aggregated, unmixed load. Alternatively, anomalies may represent measurement error or uncertainty due, for example, to inappropriate sampling frequency, miscalculation or unknown peak flow (for grab samples), or sample/laboratory contamination. Such data anomalies can cause many problems for further analysis or visualisation and, depending on context, different interventions are typically needed when they are detected.
In the English programme, post-laboratory analyses were conducted to attempt to identify measurements that may be anomalously low, by defining the likelihood that a measurement falls within some expectation criteria. In particular, a machine learning approach, using a gradient boosting regression model, was trained with a quantile loss function to predict 90% prediction intervals for SARS-CoV-2 concentration at the sampled sites. These predictions were used to explore unexpectedly low data points (below the 5th percentile bound of the prediction interval) where similar sites in terms of geography and collection method exhibited relatively high measurements.
The analysis identified 762 samples as anomalous out of 25,957 that did not report a quantified value. In particular, the model highlighted low measurements during January and February 2021, even though infection rates across the country were high. The analysis could be extended to explore any recorded values that do not fall within the predicted range, whether low or high. Fig. B.12 illustrates the frequency of anomalous data points when compared to ammonia concentrations, suggesting that lower concentrations of ammonia are associated with a higher proportion of unquantifiable samples. This suggests that flow dilution can reduce SARS-CoV-2 concentrations to below the lower (5th percentile) bound of the prediction interval.
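The quantile loss underlying this kind of interval prediction can be illustrated with a minimal sketch. It is not the programme's model: the gradient boosting regressor and its site-level features are not reproduced, and the function names, the toy log10 concentrations and the use of a single constant prediction per quantile are all assumptions made for illustration. The sketch shows only that minimising the quantile (pinball) loss at levels 0.05 and 0.95 yields lower and upper bounds, and that a new measurement below the lower bound would be flagged as unexpectedly low.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def best_constant(y, q):
    """Constant prediction that minimises the pinball loss over y.

    A minimiser always exists among the observed values, so searching
    over them is sufficient for this illustration.
    """
    return min(sorted(y), key=lambda c: pinball_loss(y, [c] * len(y), q))


def flag_low(values, lower_bounds):
    """Indices of measurements that fall below their lower bound."""
    return [i for i, (v, lo) in enumerate(zip(values, lower_bounds)) if v < lo]


# Hypothetical log10 concentrations from a group of similar sites.
history = [3.1, 3.4, 3.6, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5,
           4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.5, 5.7]

lower = best_constant(history, 0.05)  # 5th percentile bound
upper = best_constant(history, 0.95)  # 95th percentile bound

# Hypothetical new measurements checked against the lower bound.
new_samples = [4.4, 2.7, 5.0]
flagged = flag_low(new_samples, [lower] * len(new_samples))
```

In a full model the constant prediction would be replaced by a regressor fitted per quantile level, so that the bounds vary with site characteristics and time.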

Appendix B.5.2. Tracking measurement outliers in England
On a weekly basis, sites with a rapid and sudden increase in SARS-CoV-2 are identified using parametric confidence bands around a linear regression model fitted to the data. The model is used to predict the SARS-CoV-2 concentrations 7 days in advance, and an 80% confidence band is calculated for this extended linear regression. This accounts for the uncertainty of the mean virus RNA concentrations over time. As new data is acquired, a latest measurement that falls outside the upper limit indicates that the sample has exceeded the predicted concentration and needs further investigation. Outliers identified with this method are visualised on a map of England and assessed alongside appropriate meta-data, such as the inorganics (e.g., ammonia, orthophosphate), see Fig. B.13. Weekly maximums that lie above pre-defined threshold values are also flagged as outliers. After following this process, sites of concern are reported to the National Laboratory Service (NLS), which conducts further quality assurance.
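A minimal sketch of this kind of check, under stated assumptions, is given here. It fits an ordinary least-squares line to hypothetical log10 concentrations, extends the confidence band for the mean response 7 days past the last observation, and flags a new measurement that exceeds the upper limit. The function names and data are illustrative, the 80% band is assumed to be two-sided, and a normal quantile is used in place of the Student-t quantile to avoid external dependencies; none of these details is confirmed by the programme's documentation.

```python
from math import sqrt
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares; returns intercept, slope, residual SD, mean of x, Sxx."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = sqrt(rss / (n - 2))
    return intercept, slope, resid_sd, x_bar, sxx


def upper_confidence_limit(x, y, x_new, level=0.80):
    """Upper limit of the two-sided confidence band for the mean at x_new.

    Assumption: a normal quantile approximates the Student-t quantile.
    """
    n = len(x)
    intercept, slope, resid_sd, x_bar, sxx = fit_line(x, y)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Standard error of the fitted mean grows with distance from x_bar.
    se_mean = resid_sd * sqrt(1.0 / n + (x_new - x_bar) ** 2 / sxx)
    return intercept + slope * x_new + z * se_mean


def exceeds_band(x, y, x_new, y_new, level=0.80):
    """True if the new measurement lies above the upper confidence limit."""
    return y_new > upper_confidence_limit(x, y, x_new, level)


# Hypothetical sampling days and log10 concentrations for one site.
days = [0, 2, 4, 7, 9, 11, 14]
log_conc = [4.00, 4.05, 3.95, 4.10, 4.00, 4.08, 4.02]

horizon = days[-1] + 7  # extend the regression 7 days ahead
limit = upper_confidence_limit(days, log_conc, horizon)
spike = exceeds_band(days, log_conc, horizon, 4.9)     # large increase
typical = exceeds_band(days, log_conc, horizon, 4.05)  # in line with trend
```

Because the band describes uncertainty in the mean rather than in individual samples, it is narrower than a prediction interval and will flag more measurements for follow-up.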

Appendix B.5.3. Identifying and reconciling anomalously high measurements in Scotland
Under the Scottish programme, Biomathematics and Statistics Scotland (BioSS) conducted a similar procedure, though with the focus on anomalously high values (e.g. spikes) and the aim of flagging and potentially removing anomalies as soon as they are recorded. A Generalised Additive Model (GAM) was used to identify when high levels of wastewater COVID-19 (relative to case rates, or relative to the previous variability of the site) indicate that the wastewater measurement does not correspond to future cases. With a suitable threshold, this was used to remove these measurements from aggregates, and/or trigger further investigation (Fang, 2021).

Appendix A. Supporting information
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.jhazmat.2021.127456.