Systematic review of wastewater surveillance of antimicrobial resistance in human populations

Objectives: We systematically reviewed studies using wastewater for AMR surveillance in human populations, to determine: (i) evidence of concordance between wastewater-human AMR prevalence estimates, and (ii) methodological approaches which optimised identifying such an association, and which could be recommended as standard. We used Lin's concordance correlation coefficient (CCC) to quantify concordance between AMR prevalence estimates in wastewater and human compartments (where CCC = 1 reflects perfect concordance), and logistic regression to identify study features (e


Introduction
Antimicrobial resistance (AMR) is a significant threat to global health (O'Neill, 2016) and a multi-faceted problem compounded by diverse drivers facilitating its emergence and spread. AMR surveillance is critical to understanding trends, monitoring interventions and developing empiric treatment guidelines, as prioritised in the World Health Organisation's global AMR action plan (WHO, 2019). Large networks sharing AMR data have been established to meet this need, including the European Antimicrobial Resistance Surveillance Network (EARS-Net) and the Global Antimicrobial Resistance Surveillance System (GLASS). However, current surveillance can be limited by the reliance on individual-level sampling, which is often affected by selection bias towards healthcare-associated settings (WHO, 2018). For example, both EARS-Net and GLASS target AMR in clinical specimens from hospitalised patients; this however does not reliably capture AMR prevalence in commensal organisms, thought to silently constitute most of the true AMR burden (Fahrenfeld and Bisceglia, 2016; Hay et al., 2018; Hendriksen et al., 2019b). Additionally, data collection is often limited to a subset of culturable species, and to susceptibility phenotypes rather than AMR genotypes. This lack of genotyping hampers the surveillance of high-risk AMR-associated clones and specific AMR-associated genetic determinants (Tacconelli et al., 2018). In many settings, particularly low- and middle-income countries (LMICs) where the burden of AMR is largest, the laboratory infrastructure to support individual, patient-level surveillance is lacking.

Table 1
Methodological features potentially contributing to variability in outcomes.

Columns: Methodological features; Examples of methodological feature; Aspects potentially introducing variability in outcomes; References.

Wastewater sampling point type
Examples: Wastewater treatment works (WWTW) sampling point, e.g. influent versus effluent; hospital effluent
Aspects potentially introducing variability in outcomes:
• Treatment processes can transform microbial and AMR composition, resulting in differences between e.g. influent and effluent samples
• Long conveyance times from population to sampling point may impact composition due to transformation in unique sewer environment (anaerobic, temperature, biofilms)
• Presence of pre-treatment infrastructure (e.g. pumping stations, balancing tanks) may also play a role in transforming wastewater
References: Tong et al., 2019; Zhang et al., 2020; Fahrenfeld and Bisceglia, 2016

Treatment methods
Aspects potentially introducing variability in outcomes:
• When sampling treated wastewater, differing levels of treatment can selectively transform AMR and microbial composition
References: Tong et al., 2019; Zhang et al., 2020

Geography and weather (seasons, rainfall, temperature, latitude)
Aspects potentially introducing variability in outcomes:
• Heavy rainfall dilutes wastewater in combined sewer systems via rainwater runoff and by infiltration of groundwater (dislodged biofilms, freshwater taxa)
• Local ambient environment and climate can influence both human-associated microorganisms entering the system and resident sewer microbiota
References: Shanks et al., 2013

Flow rate
Aspects potentially introducing variability in outcomes:
• Combined sewer overflows impact composition of post-treatment samples collected during these events
Wastewater-based epidemiology (WBE) is an epidemiological approach based on the analyses of wastewater (e.g. sewage) to generate information on human populations on a community scale (Choi et al., 2018). WBE has the potential to overcome some of the aforementioned challenges by simultaneously sampling both healthcare- and community-associated populations at scale (Newton et al., 2015). The approach has already been successful in illicit drug monitoring (González-Mariño et al., 2020) and pathogen surveillance (Asghar et al., 2014; Fernández et al., 2012), including SARS-CoV-2 (Ahmed et al., 2020a,b), and its application to AMR surveillance is gaining traction (Fahrenfeld and Bisceglia, 2016). Difficulties in standardising AMR detection methods and targets across surveillance networks (Tacconelli et al., 2018) could potentially be circumvented by using metagenomics to agnostically probe wastewater resistomes (Aarestrup and Woolhouse, 2020; Wright, 2007). Recent wastewater-based studies have investigated seasonal/geographic AMR distributions (Su et al., 2017), quantified global AMR gene abundance (Hendriksen et al., 2019b) and identified associations between AMR in wastewater and clinical samples (Karkman et al., 2020; Pärnänen et al., 2019). However, heterogeneous study designs and methods likely contribute to differences in outcomes/interpretations. The impact of methodological approaches such as grab sampling (i.e. taking single samples at a single timepoint) (Reinthaler et al., 2013), snapshot versus longitudinal study design, sampling in the presence of unrepresentative and "contaminating" AMR-associated point sources, and/or characterising AMR based on phenotypic testing of isolates versus genotypic profiling remains poorly understood (Table 1).
Despite the increasing use of WBE for AMR surveillance/evaluation purposes, there has been no attempt to review the available data, synthesise the evidence, and assess remaining knowledge gaps. We therefore systematically reviewed studies using wastewater (i.e. the "wastewater compartment") for AMR surveillance in human populations (i.e. the "human compartment"), seeking firstly to characterise the strength of the AMR prevalence associations observed between wastewater and human compartments to identify whether this appears to be a promising surveillance approach. Secondly, we sought to identify methodological factors that might optimise these associations in support of a standardised approach for wastewater-based AMR surveillance going forwards. We specifically focussed on study design and methodological approaches, including AMR detection methods, highlighting limitations and recommendations for future work.

Materials and methods
For this systematic review, we sought firstly to evaluate concordance between wastewater and human AMR prevalence estimates for each study, stratified by the AMR detection method used (i.e. phenotypic versus genotypic). Secondly, we adapted the PECOTS (Population, Exposure/Intervention, Comparator, Outcome, Target Condition, Study Design) systematic review framework and formulated the following statement to assess association between study methods and outcomes: Among studies jointly evaluating AMR prevalence in wastewater and humans, what is the effect of methodological approaches (e.g. wastewater sampling methods, AMR detection methods) on the concordance between these metrics?
A PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist is included in Supplementary dataset 1, and the complete PROSPERO (International prospective register of systematic reviews) protocol is available at: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42019134946.

Literature search
The search string was developed through iterative preliminary searches in consultation with a librarian experienced with systematic reviews. Full search strings adapted for each database are presented in Supplementary dataset 2. Searches were conducted on 01/02/2019 in: MEDLINE (National Library of Medicine), EMBASE (Excerpta Medica dataBASE), Global Health, CAB Abstracts, Scopus and Web of Science Core Collection. Searches were updated on 09/01/2021 using identical search strings. Results were limited to the English language and deduplicated.

Eligibility criteria
Records were assessed through a two-stage screen detailed in Fig. S1 (Supplementary dataset 3) to capture both studies piloting wastewater-based AMR surveillance and studies conducting relevant wastewater-human AMR comparisons. Briefly, stage one screened titles/abstracts to determine if the study: (i) was primary research conducted by the author/s, (ii) investigated wastewater which at least in part was constituted of human waste, (iii) reported AMR prevalence as a result of the work/analyses undertaken, and (iv) performed comparative analyses of the investigated wastewater to a separate non-wastewater compartment which potentially represents AMR prevalence in a human population. If it was unclear whether a study met criteria based on title and abstract alone, the study was passed onto the next stage. Stage two reviewed full-text methods, and studies were included if: (i) the wastewater analysed originated from at least one conventional WWTW where multiple waste streams converge, and (ii) the compartment being used as a comparator to wastewater AMR prevalence directly represented a human population, such as in clinical isolates or resistance network data. A universal inclusion question was used to include studies explicitly performing wastewater-based surveillance of human AMR irrespective of meeting stage one or two criteria. For full descriptions of screening criteria and examples of excluded records see Fig. S1 (Supplementary dataset 3).

Study selection and data extraction
Records were independently screened in duplicate (by the authors KKC and LB) and data was extracted from included records using a pretested data extraction form piloted on five random included records (Supplementary dataset 4). This form consisted of both predetermined fields using data validation and non-restricted write-in fields to record data relating to (non-exhaustive): study design (wastewater sampling strategy, WWTW metadata, human sample type, sample sizes), methods (wastewater sampling, AMR detection, statistical methods) and outcomes (reported wastewater-human comparison results). If available, raw resistance prevalence data (total and resistant isolate counts regarding individual antibiotics or resistance genes) were extracted for antibiotics on the WHO critically important antimicrobials (CIAs) list (AGISAR, 2018). These counts were used to calculate point estimates (±95% confidence intervals) representing the proportion of AMR isolates amongst all isolates tested for either wastewater or human compartments; this metric is referred to as either the wastewater or human AMR prevalence from here on.
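The prevalence metric described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes an exact (Clopper-Pearson) binomial interval, which is the interval type named in the Data synthesis and analysis section, and the function names (`binom_cdf`, `prevalence_with_ci`) are hypothetical. The bounds are found by bisection on the binomial distribution, so only the standard library is needed.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def _solve_increasing(f, target, tol=1e-12):
    """Bisection on [0, 1] for the p where f(p) == target; f must increase with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def prevalence_with_ci(resistant, total, alpha=0.05):
    """Return (prevalence, lower, upper) using an exact Clopper-Pearson interval."""
    if total <= 0 or not 0 <= resistant <= total:
        raise ValueError("need total > 0 and 0 <= resistant <= total")
    prevalence = resistant / total
    # Lower bound: the p at which P(X >= resistant) equals alpha / 2.
    if resistant == 0:
        lower = 0.0
    else:
        lower = _solve_increasing(
            lambda p: 1.0 - binom_cdf(resistant - 1, total, p), alpha / 2.0
        )
    # Upper bound: the p at which P(X <= resistant) equals alpha / 2.
    if resistant == total:
        upper = 1.0
    else:
        upper = _solve_increasing(
            lambda p: 1.0 - binom_cdf(resistant, total, p), 1.0 - alpha / 2.0
        )
    return prevalence, lower, upper
```

Calling `prevalence_with_ci(resistant, total)` once for the wastewater counts and once for the human counts gives the pair of estimates that are compared for each antibiotic or gene.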

Risk of bias and certainty assessment
Risk of bias was assessed independently by two reviewers (the authors KKC and LB) using a qualitative approach based on the Cochrane risk of bias tool (Higgins et al., 2011); our modified tool focused on systematic differences at the study level as outcomes reported were highly diverse. Modified interpretations of five bias domains are detailed in Table S2; a brief summary of our interpretation of each bias domain is provided here. Attrition bias referred to differences introduced by missing data (e.g. missing sampling timepoints in longitudinal studies). Performance bias referred to differences introduced by use of different methods between or within sampling compartments (i.e. AMR detection methods). Reporting bias referred to selective reporting where outcomes were measured but not reported or disproportionately reported in the text. Selection bias referred to differences introduced across the wastewater compartment by use of different wastewater sampling or processing methods (e.g. selecting more colonies for susceptibility testing from one WWTW compared to another). Other bias referred to a lack of acknowledgement of AMR-influencing wastewater inputs (i.e. unreported sewer inputs from healthcare, abattoir or agricultural sources), which may have modulated AMR profiles in sampled wastewater. If there was insufficient information present to assess the risk of bias this was denoted as "unclear". The rationale for risk of bias assessments was recorded, including examples where applicable. Discrepancies in risk of bias assessments were resolved by discussion. An overall qualitative measure (high, low and unclear) was assigned to each study as per the Cochrane risk of bias tool approach to summary assessment (Higgins et al., 2011).
Certainty assessment (assessment of overall confidence in the evidence included) was conducted using an adaptation (Woodruff and Sutton, 2011) of the GRADE (Grading of Recommendations, Assessment, Development and Evaluations) system designed for clinical studies (Morgan et al., 2016), where evidence is given an initial rating ("high", "moderate", "low", "very low") then upgraded or downgraded based on study characteristics. The outcome being evaluated was whether or not high concordance between AMR prevalence estimates in wastewater and human compartments was observed. Conventionally, GRADE assigns "high" and "low" initial ratings to randomised trials and observational studies respectively. A randomised controlled trial design is of limited applicability to the studies being evaluated as part of this review, and therefore using the adapted version of GRADE (Woodruff and Sutton, 2011), we assigned initial "moderate" ratings to our body of evidence. Upgrading or downgrading was based on a subset of adapted GRADE domains (Table S3). Briefly, these included our risk of bias summary assessment, inconsistency, indirectness, imprecision, publication bias and concordance (all defined in Table S3). In our adaptations we omitted two of the original GRADE criteria ("large effects", "residual confounding"), as these could not be readily adapted to our context.

Data synthesis and analysis
For extracted resistance prevalence data, we used Lin's concordance correlation coefficient (CCC; R package DescTools) with 95% confidence intervals (CIs) to quantify the concordance between the proportion of resistant wastewater isolates (i.e. wastewater AMR prevalence) and the proportion of resistant human isolates (i.e. human AMR prevalence), with the latter representing the reference standard. As perfect concordance is unrealistic, we arbitrarily defined "high concordance" to represent a ±10% difference in AMR prevalence between wastewater and human compartments. Concordance was considered separately for comparisons derived from phenotypic versus genotypic approaches. In addition to evaluating concordance across all studies, we also evaluated concordance stratified by bacterial species, resistance to specific antibiotic classes and for AMR gene families. Since Lin's CCC does not reflect error in AMR prevalence estimates, we also compared Clopper-Pearson 95% confidence intervals (CIs) stratified by study, antibiotic class or AMR gene.
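As an illustration of the concordance measures described above, the following Python sketch computes Lin's CCC from paired prevalence estimates and flags individual comparisons as highly concordant. It is not the DescTools implementation used in the review. It assumes the textbook definition of the coefficient with population (1/n) moments, CCC = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), so small-sample values may differ slightly from the R output, and it does not compute confidence intervals. It also assumes that "±10%" means an absolute difference of at most 0.10 in prevalence, boundary included. The function names are hypothetical.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    return 2.0 * cov_xy / denominator


def is_highly_concordant(wastewater_prev, human_prev, threshold=0.10):
    """True if the two prevalences differ by at most `threshold` in absolute terms.

    Assumption: the threshold is inclusive; a tiny tolerance absorbs
    floating-point rounding at the boundary.
    """
    return abs(wastewater_prev - human_prev) <= threshold + 1e-12
```

Unlike Pearson's correlation, the coefficient penalises a systematic offset: paired estimates that rise together but sit a constant distance apart score below 1 because of the squared mean difference in the denominator.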
As we aimed to identify approaches that could optimise wastewater-based AMR surveillance, we classified studies based on wastewater-human AMR concordance where a "high agreement" study was defined as a study with >70% of its wastewater and human AMR prevalence estimate comparisons being within ±10% of each other (i.e. highly concordant comparisons, see above). We then used logistic regression to identify if any study features were associated with this high agreement classification in STATA/IC v.16.1 (StataCorp, College Station, USA). As study feature reporting was highly inconsistent, we only tested features that were reported by at least 75% of studies.
In addition, given the heterogeneity of study features, their inconsistent reporting across studies, and the small number of studies limiting power to detect associations, we descriptively synthesised features potentially associated with "high agreement" studies (i.e. where >70% of wastewater and human AMR prevalence estimate comparisons were within ±10% of each other). For this, in addition to the high agreement category, we further classified studies as moderate agreement (30-70% of wastewater and human AMR prevalence estimate comparisons being within ±10% of each other) and low agreement studies (<30% of wastewater and human AMR prevalence estimate comparisons being within ±10% of each other). Agreement classifications were considered separately for comparisons derived from phenotypic versus genotypic approaches.
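The three agreement categories defined above amount to a simple threshold rule on the share of highly concordant comparisons per study. The sketch below is one way to express it, with a hypothetical function name. It assumes that shares of exactly 30% and exactly 70% fall in the moderate category, following the ">70%", "30-70%" and "<30%" wording, and it uses integer arithmetic to avoid floating-point edge cases at those boundaries.

```python
def classify_agreement(concordant, total):
    """Classify a study as 'high', 'moderate' or 'low' agreement.

    concordant: number of comparisons within the high-concordance threshold
    total: number of wastewater-human comparisons in the study
    """
    if total <= 0 or not 0 <= concordant <= total:
        raise ValueError("need total > 0 and 0 <= concordant <= total")
    if 10 * concordant > 7 * total:    # share strictly above 70%
        return "high"
    if 10 * concordant >= 3 * total:   # share between 30% and 70% inclusive
        return "moderate"
    return "low"                       # share strictly below 30%
```

For example, a study with 13 of 16 highly concordant comparisons (81%) is classed as high agreement, whereas one with 14 of 22 (64%) is classed as moderate.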

Literature screen
Of 8,867 de-duplicated studies identified using our search strategy, full-text methods for 441 relevant studies were reviewed, and based on pre-specified inclusion criteria (see Methods), 33 studies were included in the review (Fig. 1 and Fig. S1).

Risk of bias and certainty assessment
Based on our modified bias domains (Table S2), 19/33 studies were judged to have an overall high risk of bias, 7/33 an unclear risk and 7/33 a low risk (Supplementary dataset 5). To avoid splitting our analyses and losing statistical power, we present a synthesis of all studies and provide a summary of the risk of bias across studies below (as recommended by Higgins et al., 2011 when the study pool is small and stratified analyses according to risk categories are not feasible).
Based on our certainty assessment, we rated the overall quality of included bodies of evidence as "low to moderate" regarding the outcome of identifying concordance between wastewater and human AMR prevalence estimates (Table S4). Initial "moderate" ratings were downgraded due to high overall risk of bias and the indirectness of study aims in the context of our review questions. Upgrades were warranted by statistically significant concordance (adapted GRADE "dose-response" assessment) across most studies, resulting in the final "low to moderate" rating of the evidence base on which the following data synthesis and recommendations are made.

Summary of general study characteristics
Studies explicitly using wastewater for human population-level AMR surveillance made up 12/33 included studies. The remaining 21 studies included relevant comparisons between wastewater and human AMR, but were not directly set up as wastewater-based AMR surveillance studies (e.g. studies focussed on One Health). Amongst the 33 included studies, 73 unique countries were sampled, although most (48/73) were represented as part of a single global study (Hendriksen et al., 2019b) (Fig. 2). World Bank regions covered by the studies were as follows: East Asia and Pacific (n = 3 studies), Europe and Central Asia (n = 19), Latin America and the Caribbean (n = 4), Middle East and North Africa (n = 9), North America (n = 5), South Asia (n = 2), Sub-Saharan Africa (n = 4). World Bank income classifications showed a sampling skew towards high-income countries (high income [n = 24 studies]; middle income [n = 11] and low income [n = 3]) (Fig. S5). Three studies covered multiple regions and income classifications. Publication dates ranged from 2007 to 2020, with most published in the last three years (n = 24) (for full study descriptions, see Supplementary datasets 6 and 7).

Summary of AMR detection methods in included studies
Evaluations of AMR were undertaken using genotypic-only methods (n = 7), phenotypic-only methods (n = 8), or a mixed approach combining both (n = 18). Genotypic-only studies employed metagenomics (n = 4), qPCR (n = 2) and single isolate whole genome sequencing (WGS) (n = 1). Phenotypic-only studies employed disk-diffusion (n = 4), microbroth dilution (n = 3), or both (n = 1). Mixed approach studies combined disk-diffusion/microbroth dilution with qPCR (n = 1), PCR (n = 9) or single isolate WGS (n = 8). For data synthesis and analysis across studies, relevant phenotypic data was extracted from 22 studies and genotypic data from 12 studies (24 studies in total as both data types could be extracted from 10 studies). These 24 studies conducted phenotypic-only (n = 7), genotypic-only (n = 1) or combined (n = 16) AMR detection. All extracted genotypic data consisted of isolate-level qPCR, PCR or WGS; no metagenomic data at the sample-level was synthesised. For nine studies there were no relevant data that could be extracted for inclusion in a combined summary; these were either where isolate counts were not reported, or only raw sequencing data was available which was beyond the scope of our analysis.
Hutinel et al., 2019 and Ojer-Usoz et al., 2017 contributed the most highly concordant comparisons (13/16 [81%] and 14/22 [64%] within-study comparisons respectively); however, only the Hutinel study met the definition of a "high agreement" study. Most phenotypic comparisons utilised isolates cultured from human samples originating from healthcare settings (135/139 comparisons), with higher AMR prevalence than corresponding wastewater estimates (in 86/135 (64%) comparisons). The remaining four comparisons (all from Haghi et al., 2019) were unique in comparing fecal isolates from healthy volunteers with no recent history of antibiotic use to wastewater; all four showed reduced AMR prevalence in the human isolates. Sensitivity analysis using data from low bias studies only (n = 5; 31 comparisons) showed a slight decrease in overall concordance (CCC = 0.81 [95% CI 0.65-0.9]; 95% CI overlaps with that of the full phenotypic dataset described above) (Fig. S6).

Fig. 2. Geographic distribution of wastewater sampling and test approach of included studies. Centroids of countries sampled by included studies are plotted with colours and shapes according to citation and antimicrobial susceptibility test (AST) approach respectively. Centroids are plotted with jitter to avoid overplotting and do not represent exact sampling locations within countries.

Genotypic wastewater-human AMR concordance
Extracted genotypic data (single isolate WGS and PCR) from 12 studies comprised 245 comparisons of AMR gene prevalence estimates between wastewater and human compartments. Correlation between compartments was slightly higher than for phenotypic data (CCC = 0.88 [95% CI 0.84-0.9]) and overall spread away from perfect concordance was reduced (Fig. 3B), with high concordance (wastewater AMR prevalence within ±10% of human AMR prevalence) in 179/245 (73%) comparisons. The median number of comparisons (i.e. AMR prevalence for a specific species-AMR gene combination across both wastewater and human compartments) per study was 11 (IQR: 5-23). For any comparison, the median number of isolates analysed in humans was 94 (IQR: 25-437), and in wastewater 56.
For a subset of eight genes conferring resistance to WHO CIAs and investigated by multiple studies, 95% CIs around prevalence estimates in both compartments overlapped as follows: (i) aac (13/16 (81%) comparisons; Fig. S16 [

Logistic regression of study features and wastewater-human AMR agreement
If more than 70% of wastewater-human AMR prevalence estimate comparisons conducted by a study were highly concordant (i.e. prevalence estimates within ±10% of one another), studies were classed as high agreement overall; this was the case for 6/22 studies (27%) with phenotypic data and 5/12 (42%) studies with genotypic data. Of the ten studies with both phenotypic and genotypic data available, only 1/10 showed high agreement for both approaches (Adator et al., 2020b); the remaining 9/10 either showed high agreement for only one approach (3/10) or neither approach (6/10). No statistically significant associations between study features and higher wastewater-human AMR concordance were identified (Table S24). The limited number of eligible studies, the substantial heterogeneity of combinations of approaches deployed across studies, and missingness of data meant power to detect independent associations was low (Table S25).

Study features descriptive synthesis
We therefore synthesised study features descriptively, additionally assigning studies to moderate (30-70% of wastewater and human AMR prevalence estimate comparisons being within ±10% of each other) and low agreement categories (<30% of wastewater and human AMR prevalence estimate comparisons being within ±10% of each other), and treating phenotypic and genotypic approaches as separate study subsets (Table S26). Sampling of influent, either alone or in conjunction with effluent, appeared most consistently associated with moderate-/high-agreement (15/28; 54% where reported) in estimates of AMR prevalence between wastewater and human compartments. Longitudinal sampling was the most common study design (29/32; 91% where reported) with moderate-/high-agreement in most (22/29; 76%). The four studies undertaking snapshot (i.e. single-timepoint) sampling all had moderate-/high-agreement whereas the single study conducting a mixed sampling design was associated with low agreement. For longitudinal studies, the timeframe of sampling was potentially relevant: of the eight low-agreement longitudinal studies, 7/8 sampled for ≤12 months and only 1/8 for >12 months. Conversely, 14/20 moderate-/high-agreement longitudinal studies sampled for >12 months, and only 2/20 for <6 months. Studies deployed several wastewater sampling methods, with grab and flow-proportional sampling studies equally distributed across agreement categories (8/12 [67%] and 4/6 [67%] moderate-/high-agreement respectively) but interestingly, composite sampling was more associated with moderate-/high-agreement (8/9; 89%). Of note, sampling point or method was not reported by six and five studies respectively. Most studies performed comparisons on wastewater at least in part derived from the human population sampled (i.e. direct comparisons, 18/34; 53%), while eight conducted indirect comparisons, one conducted both and seven were unclear/unreported. Most moderate-/high-agreement studies conducted direct comparisons (14/21; 67% where reported). Lastly, studies investigating 1-2 WWTWs made up the majority (22/34) but were similarly associated with moderate-/high-agreement (17/22; 77%) as those investigating ≥3 sites (7/10; 70%).

Studies without extractable data
The nine studies without extractable data that could be synthesised are summarised below in terms of their overall ability to detect wastewater-human AMR associations based on reported conclusions. Full descriptive summaries including study details and specific findings are in Supplementary dataset 8.
Two studies performed direct AMR gene detection using qPCR of either 229 (Pärnänen et al., 2019) or eight AMR genes (Colomer-Lluch et al., 2014); both reported a relationship between wastewater AMR and national AMR data.
Four studies employed metagenomics to identify potential wastewater-human AMR associations. Two of these studies appeared to demonstrate an association while the other two were inconclusive.
Two studies used mixed approaches combining phenotypic AST with qPCR (2-targets) (Meir-Gruber et al., 2016) and single isolate WGS (Gouliouris et al., 2019); one study used a phenotypic approach only (YoungKeun et al., 2015). All three studies appeared to show a wastewater-human AMR association.

Discussion
From our review and synthesis of the available data, we found characterisation of AMR in wastewater shows promise in reflecting AMR in human populations, irrespective of diverse target species, target resistances and study locations, although associations may be stronger for some species and AMR mechanisms than others, and may vary by setting and over time. The strength of this relationship varied across studies and was likely influenced by study features (e.g. design, setting, spatiotemporal sampling strategies) and AMR detection method (i.e. genotypic/phenotypic); the heterogeneity of methodological approaches and lack of clear reporting of key study features made any quantitative synthesis very difficult.

Effect of AMR detection method on wastewater-human AMR concordance
Our estimates of concordance (Fig. 3A, 3B) supporting a wastewater-human AMR correlation are in line with estimates in individual studies (Huijbers et al., 2020; Hutinel et al., 2019; Karkman et al., 2020). In particular, Huijbers et al., 2020 reported coefficients of determination of 0.62-0.72 for individual antibiotics and 0.85 when data was combined for four antibiotic classes, similar to our findings of 0.85 and 0.88 for class-unrestricted phenotypic and genotypic data respectively. Although data was too limited to robustly estimate Lin's CCC for individual species and AMR, variability in the level of discordance of wastewater-human comparisons (Fig. 3A, 3B right panels) and overlap in 95% CI around point estimates (Fig. S7-23) is likely attributable in part to specific species and AMR mechanisms being better suited than others to wastewater-based AMR surveillance. This phenomenon was also reported in several studies without extractable prevalence data where specific AMR classes/genes exhibited notably higher/lower wastewater-human concordance.
Genotypic AMR detection methods showed a slight performance increase (non-significant) for both Lin's CCC and in the reduced proportion of strongly discordant comparisons (Fig. 3B right panels), which may reflect their relatively species- and mechanism-agnostic nature (mostly WGS in extracted data) over phenotypic methods, which may be more susceptible to variations from differing growth media/conditions and interpretation of resistance breakpoints. The difficulty of accurately characterising AMR prevalence when only small numbers of isolates are analysed (median of 94-130 for human and 91-98 for wastewater compartments across pheno-/genotypic comparisons here) is also a concern highlighted by previous researchers, particularly when few resistant isolates are available (Huijbers et al., 2020).
Although not a focus in our review, genotypic profiling potentially affords some additional advantages over phenotypic analyses, and is relevant to confirming that genetic mechanisms underpinning phenotypes are also similar. Genomic approaches, such as sequencing of isolates or whole sample metagenomics, enable a more agnostic approach to be adopted than for qPCR, in that analyses do not need to be restricted to a subset of predefined genes/gene variants. Genomic data and profiles can also be more readily shared with the wider community to allow for cross-study comparisons and data synthesis, as demonstrated by Karkman et al. Genomic approaches also allow for the evaluation of genetic relatedness and quantitation of either isolates or microbial populations across compartments (e.g. through phylogenetics, taxonomic/strain-level profiling and strain-based comparisons using metagenomes). Genomic approaches may be higher-resolution and more flexible, but at a higher resource cost; sensitivity for the detection of AMR genes is also dependent on sequencing depth, and accurately associating specific AMR gene markers with strains or species in short-read based metagenomes remains difficult (Gweon et al., 2019).

Effect of study features on wastewater-human AMR concordance
Our review highlights the need for clear guidance on performing and reporting these studies in a more standardised way, with a view to consolidating best-practice approaches in a workflow whilst enabling some flexibility to account for differences in any given setting.Although logistic regression found no significant associations, descriptive synthesis identified study features potentially associated with higher wastewater-human AMR concordance and these are discussed in the context of existing literature here.
WWTW influent is likely the most population-representative wastewater sample for AMR surveillance using either phenotypic or genotypic approaches.This is not unexpected, as previous studies have described transformation of microbial and AMR gene composition during treatment (Tong et al., 2019;Zhang et al., 2020).Transformed samples may remain useful for wastewater-based AMR surveillance, potentially dependent on treatment process; however differing levels of treatment have been shown to select for different species/AMR determinants (Tong et al., 2019).Additionally, a temporospatial overlap of wastewater sampled and the target surveillance population is likely helpful; a feature of the majority of moderate-/high-agreement studies as well as sampling fewer WWTWs which may also relate to the closeness of the populations compared.
Composite sampling also seems sensible, as wastewater composition changes significantly over short timescales (Guo et al., 2019) and individual grab samples may be "flooded" by homogeneous solid material (Reinthaler et al., 2013). However, grab sampling is convenient and avoids significant autosampler-associated workload and capital costs, and was the most common sampling method used in the studies analysed. Further research is needed to characterise how effectively single-timepoint grab samples versus composite/proportional samples reflect temporal AMR changes.
Longitudinal sampling with timeframes over 12 months was most common in both phenotypic and genotypic high-agreement studies, consistent with data from two studies which could not be directly synthesised (Hendriksen et al., 2019a; Pignato et al., 2010). Both studies used two-weekly sampling intervals, over 12- and 3-month timeframes respectively. The former observed an association between ampicillin-resistant wastewater and contemporaneous clinical isolates, whereas the latter found no relationship between contemporaneous public health surveillance and wastewater metagenomic read abundances.

Recommendations regarding risk of bias
Only seven included studies were judged as low risk based on our risk of bias assessment (Table S2); however, a sensitivity analysis of wastewater-human AMR prevalence concordance in these data (Fig. S6) showed similar results to the main analysis of all studies. Although high risk of bias does not necessarily mean that the data cannot be considered in a data synthesis, and our concordance sensitivity analysis appears to show this for included studies, minimising the risk of bias is key to producing robust data for future evaluations of wastewater-based AMR surveillance.
Several potential biases may be challenging to manage in the context of wastewater sampling, for example those that arise from logistical reasons concerning inaccessibility of sampling sites or sampling equipment. While unexpected wastewater site closures cannot be anticipated (Pärnänen et al., 2019) and achieving exactly the same sampling methods across highly variable sampling sites is challenging in large-scale collaborations (Hendriksen et al., 2019b), future studies should attempt to minimise any differences across wastewater sampling sites where feasible. For example, Huijbers et al. consistently sampled wastewater sites during the mid-week to avoid potential weekend effects known to affect chemical wastewater-based epidemiology, and Urase et al. purposefully avoided sampling combined sewer wastewater sites during rain events to avoid potential dilution and flow variation. Additional metadata such as sample storage conditions or freeze/thaw cycles are also pertinent to interpretation, as investigated by Poulsen et al., who similarly suggested detailed reporting of these features.
Future studies should clearly report sewer inputs, including any unique AMR-associated inputs (e.g. hospitals, agricultural sources) that may confound AMR prevalence estimates (Fahrenfeld and Bisceglia, 2016). The importance of specific AMR-associated inputs is likely linked to whether the AMR mechanisms under evaluation are uniquely associated with that specific source, or already widely disseminated in the community. For example, one study (Raven et al., 2019) sampled WWTWs with and without hospital input, and found the most clinically prevalent E. coli ESBL gene was ubiquitous in all WWTWs, indicating prior widespread dissemination in the community. Another study (Jakobsen et al., 2008) focussing on E. coli gentamicin resistance in hospital effluent, receiving WWTW influent and domestic-only wastewater, found significantly lower prevalence in domestic-only wastewater compared to hospital effluent and WWTW influent, which shared similar prevalence, indicating that the presence of any hospital-associated wastewater in influent was not representative of community-based estimates.

Limitations
Our study has several limitations. Although we conducted a comprehensive risk of bias assessment, the study pool was too small to feasibly conduct risk-stratified synthesis and meta-regression without substantial loss of power (Higgins and Thompson, 2004). To mitigate this, a modified version of the GRADE system was used to incorporate summary assessments of the quality of the evidence into the interpretation of results (Higgins et al., 2011). Certainty of the evidence base was rated as "low to moderate", which potentially indicates reduced confidence in our conclusions/recommendations; however, our certainty assessment is a conservative estimate omitting two upgrade domains, and true certainty may be higher if data were available to assess these domains. Since most human datasets available in the included studies were clinical in origin, human compartment AMR prevalence estimates were potentially susceptible to biases outlined in our introduction (i.e. overestimation of the "true" population-level AMR burden). However, as seen in existing literature and in the results of this review, clinical and wastewater AMR prevalence estimates do appear to mirror each other. We excluded non-English publications, potentially missing some relevant studies. Studies were highly diverse in reported features, design and outcomes, making a comprehensive synthesis difficult. In particular, many features were poorly characterised and could not be explored in our analyses. For our study feature analysis we focused a priori on features that optimised the identification of an association between wastewater and human AMR prevalence; however, it may be that in some circumstances there is genuinely no such association. In many settings globally, established wastewater infrastructures are not available, and an analysis of, for example, WWTW influent may not be feasible; open sewerage systems may represent an alternative sampling point in these contexts.

Conclusion
In conclusion, our review suggests that overall, wastewater-based surveillance has significant potential for monitoring population-level AMR, particularly for some species, despite high diversity in study design, methods and metadata. We found that no specific study feature or AMR detection method conferred a clear increase in the ability of a study to detect an association between wastewater and human AMR prevalence. However, based on limited available data, we would recommend that, where feasible, genotypic AMR detection, composite sampling of influent with a longitudinal timeframe of >12 months, and contemporaneous sampling of wastewater and human compartments that are directly associated (i.e. the human population sampled contributes to the wastewater sampled) are used to generate more robust data to better evaluate the strengths and limitations of this approach for surveillance purposes. Clear reporting of study methods and features is essential, and this will facilitate the iterative development of optimal practice guidelines for this emerging surveillance tool.

Contributors
A complete list of author contributions as per CRediT (Contributor Roles Taxonomy) is given in the appendix (p23).
Fig. 3. AMR in wastewater isolates and human isolates for phenotypic (A) and genotypic (B) comparisons. Left: Concordance plot of AMR prevalence in wastewater and human isolates stratified by AMR detection approach (i.e. phenotypic (A) versus genotypic (B) approaches). Each point represents a single wastewater-human comparison conducted, where colour corresponds to bacterial species tested and shape corresponds to human sample type used. Lin's concordance correlation coefficient (CCC) is labelled with 95% confidence intervals. Unbroken line of y = x is plotted as perfect concordance between wastewater and human resistance. Dashed lines of y = x + 0.1 and y = x - 0.1 represent high concordance, i.e. ±10% from perfect concordance respectively. Right: Individual wastewater-human comparisons tallied by level of discordance (<5% and 5-10% coloured in green, 15-20% and >20% coloured in purple) between compared wastewater and human AMR prevalence estimates, and plotted to show number of comparisons at each level of discordance, stratified by the target species and antibiotic class (3A-right) or AMR gene family (3B-right).
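As an illustration of the concordance measure used in Fig. 3, the following minimal Python sketch computes Lin's CCC for paired wastewater and human prevalence estimates, using the standard estimator CCC = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²), and checks whether a single comparison falls within the ±10% band around y = x. It is not the analysis code used in this review: the function names `lins_ccc` and `within_high_concordance` and the example values are hypothetical, prevalence is assumed to be expressed as a proportion between 0 and 1, and the 95% confidence intervals reported in the figure are not computed.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired estimates.

    x, y: equal-length sequences of prevalence proportions (0 to 1),
    e.g. wastewater and human AMR prevalence for the same comparison.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance, as in Lin's estimator.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when all values are identical")
    return 2 * cov_xy / denominator


def within_high_concordance(wastewater, human, tolerance=0.10):
    """True if one comparison lies within +/- tolerance of the line y = x."""
    return abs(wastewater - human) <= tolerance


# Hypothetical example values (not data from the included studies).
wastewater_prev = [0.12, 0.35, 0.50, 0.08]
human_prev = [0.10, 0.30, 0.62, 0.05]
print(round(lins_ccc(wastewater_prev, human_prev), 3))
print([within_high_concordance(w, h) for w, h in zip(wastewater_prev, human_prev)])
```

Unlike Pearson's correlation, the (x̄ − ȳ)² term penalises a systematic offset between compartments, so a set of comparisons that is perfectly correlated but consistently higher in one compartment yields a CCC below 1.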