Coupled virus-bacteria interactions and ecosystem function in an engineered microbial system

Viruses are thought to control bacterial abundance, affect community composition and influence ecosystem function in natural environments. Yet their dynamics have seldom been studied in engineered systems, or indeed in any system, for long periods of time. We measured virus abundance in a full-scale activated sludge plant every week for two years. Total bacteria and ammonia oxidising bacteria (AOB) abundances, bacterial community profiles, and a suite of environmental and operational parameters were also monitored. Mixed liquor virus abundance fluctuated over an order of magnitude (3.18 108 e3.41 109 virus's mL 1) and that variation was statistically significantly associated with total bacterial and AOB abundance, community composition, and effluent concentrations of COD and NH4þN and thus system function. This suggests viruses play a far more important role in the dynamics of activated sludge systems than previously realised and could be one of the key factors controlling bacterial abundance, community structure and functional stability and may cause reactors to fail. These findings are based on statistical associations, not mechanistic models. Nevertheless, viral associations with abiotic factors, such as pH, make physical sense, giving credence to these findings and highlighting the role that physical factors play in virus ecology. Further work is needed to identify and quantify specific bacteriophage and their hosts to enable us to develop mechanistic models of the ecology of viruses in wastewater treatment systems. However, since we have shown that viruses can be related to effluent quality and virus quantification is simple and cheap, practitioners would probably benefit from quantifying viruses now. © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Viruses are agents of mortality (Wommack and Colwell, 2000), nutrient regeneration (Middelboe and Jorgensen, 2006;Haaber and Middelboe, 2009;Shelford et al., 2012) and horizontal gene transfer (Lindell et al., 2004;Sullivan et al., 2006) and therefore key drivers of bacterial abundance, activity and community composition in natural environments, as well ecosystem function (Rodriguez-Valera et al., 2009;Winter et al., 2010;Breitbart, 2012;Liu et al., 2015a). Yet their dynamics have seldom been monitored in activated sludge or indeed any other engineered microbial ecosystem.
Recent advances in molecular methods and the adoption of ecological approaches to engineered systems have shed some light on the complex mechanisms driving the microbial communities in, and the performance and functional stability of, activated sludge systems. Deterministic (such as reactor design, process configuration, operational and environmental conditions) and stochastic processes (microbial birth, death, and immigration), are thought to be at work (van der Gast et al., 2008;Wells et al., 2009Wells et al., , 2011Ayarza et al., 2010;Ofiteru et al., 2010;Ayarza and Erijman, 2011;Valentín-Vargas et al., 2012). Yet, neither completely explain the variation seen in the microbial composition and performance of such systems (Ofiteru et al., 2010).
Given the prominent role of viruses in the microbial ecology of natural environments viral infection could be another important mechanism, especially considering its connection with host abundance fluctuations and functional instability Barr et al., 2010;Motlagh et al., 2015), evidence of predator-prey type oscillations Otawa et al., 2007) and the sheer abundance of viruses Wu and Liu, 2009;Brown et al., 2015) within activated sludge systems. To date however, viral dynamics have been largely overlooked.
The absence of methods to link viruses to their hosts and couple their abundances (Dang and Sullivan, 2014;Brum and Sullivan, 2015) is undoubtedly a contributing factor. However, even total abundance methods, which sparked a transformation in virus ecology in the 1980s (Bergh et al., 1989) and 1990s (Hara et al., 1991;Marie et al., 1999), have only found limited application in activated sludge systems: being used to compare different activated sludge plants (Wu and Liu, 2009;Brown et al., 2015), or across very modest time scales . This is in contrast to the numerous studies that have yielded great insights into, and underpin our understanding of, virus ecology in marine (e.g. Jiang and Paul (1994), Weinbauer et al. (1995), Bratbak et al. (1996) and Li and Dickie (2001)) and freshwater environments (e.g. Hennes and Simon (1995), Hofer and Sommaruga (2001), Bettarel et al. (2004) and Jacquet et al. (2005)). Although even here the spatial and temporal scale of such studies are typically modest; a consequence of limited "ship time" and other logistical constraints. Thus multiyear studies at functionally relevant temporal scales, and which incorporate time varying exogenous factors, have recently been called for in viral ecology (Breitbart, 2012;Brum and Sullivan, 2015;Wigington et al., 2016).
To this end we used a recently adapted flow cytometry (FCM) method (Brown et al., 2015) to measure virus abundance weekly for two years in a full scale nitrifying activated sludge plant. The sampling frequency and duration was considered logistically feasible and biologically and functionally relevant considering nitrifying activated sludge systems maintain solid retention times (SRT) > 7 days (the average period of time biomass remains within a system, Tchobanoglous et al. (2003)). The relationship between virus abundance and the dynamics of total and ammonia oxidising bacteria (AOB), as well as community structure, was evaluated, and the influence of exogenous factors (environmental and operational parameters) was also examined. The role of viruses in the microbial ecology, and performance (removal of COD and NH 4 þ -N) of activated sludge systems was established.

Sample collection
Mixed liquor (ML) grab samples were collected from the aeration basin (3600 m 3 ) of a conventional nitrifying domestic wastewater treatment plant (WWTP, 6751 m 3 day À1 ), situated in the North East of England, United Kingdom, on a weekly basis for a period of two years from June 2011 to May 2013 (104 weeks). Samples were collected in 50 mL polypropylene containers and transported to the lab on ice for immediate processing. Concurrent primary settled sewage (influent) and effluent samples were also collected, in addition to a number of operational variables.

Analytical methods
For all samples (influent, ML and effluent) suspended/volatile suspended solids (SS/VSS), soluble chemical oxygen demand (COD s ) and soluble ammonium (NH 4 þ eN) were determined according to Standard Methods (APHA, 1998) and using Merck COD and NH 4 þ eN test kits (VWR, UK) respectively. Anion concentrations, including nitrate, nitrite, sulphate and phosphate, were determined using high performance Ion Chromatography (Dionex ICS-1000 with AS40 auto sampler); samples were filtered through a 0.2 mm polyethersulfone membrane prior to analysis. Influent trace metals, including cadmium, zinc, lead and copper, were measured by inductively coupled plasma optical emission spectroscopy (ICP-OES) (Vista MPX axial ICP-OES, Varian, UK), as described by Martin et al. (1994). Samples were acidified on collection to pH < 2, digested and then filtered through a 0.45 mm polyethersulfone membrane prior to analysis. Finally temperature, dissolved oxygen (DO) and pH within the aeration basin were measured in real time using in situ probes, influent flow rate was determined using a Parshall flume, and sludge age was obtained from plant operators.

Flow cytometry
For virus enumeration 1 mL sub-samples of influent, ML and effluent were taken, transferred into 2 mL cryovials and fixed at a final concentration of 0.5% glutaraldehyde for 15e30 min at 4 C in the dark. Samples were then flash frozen in liquid nitrogen and stored at À80 C. After defrosting, samples were pre-treated and analysed in triplicate as described by Brown et al. (2015) using a FACScan flow cytometer (Becton Dickinson, USA) equipped with a 15-mW 488-nm air-cooled argon-ion laser and a standard filter setup.
Using this FCM method not all events detected are necessarily viruses, membrane-derived vesicles and gene transfer agents (nonvirus particles) could simultaneously be detected and quantified (Forterre et al., 2013;Biller et al., 2017). Hence the term virus like particle (VLP) being coined and used within marine viral ecology. However, in an earlier study (Brown et al., 2015) regression analysis of FCM and transmission electron microscopy virus counts yielded a highly significant, positive correlation coefficient (P < 0.05), a high coefficient of determination (R 2 ¼ 0.77) and an intercept and slope coefficient statistically indistinguishable from 0 and 1 (not explicitly stated in Brown et al. (2015)) respectively, thus robust statistical evidence that non-virus particles affect FCM virus counts in activated sludge is lacking. Indeed, Biller et al. (2017) concluded that non-virus particles are unlikely to systematically inflate measurements to a notable degree. It is however conceivable, given the R 2 of 0.77 in Brown et al. (2015), that non-virus like particles contribute to random measurement error. Thus in the absence of a systematic measurement error we have elected to use the simpler term virus over VLP.

DNA extraction
DNA was extracted from 250 mL of ML and from 15 mL of influent, the latter being centrifuged at 3392Âg for 15 min and the supernatant removed down to a working volume of 250 mL. Cell wall disruption was then carried out using the FastDNA SPIN Kit for soil (MP Biomedicals, USA), thus 244.5 mL of sodium phosphate buffer and 30.5 mL of MT buffer was added to samples and the mixture transferred to Lysing Matrix E tubes. Samples were then lysed at 6.5 ms À1 for 30 s in a FastPrep instrument (MP Biomedicals, USA) and centrifuged at 14000Âg for 15 min. DNA from 250 mL of the supernatant was then purified using a MagNA Pure LC 2.0 (Roche, UK) and the MagNA Pure LC DNA Isolation Kit S.

Illumina sequencing
Sample preparation for Illumina sequencing generally followed the protocol of Caporaso et al. (2012). Thus the V4 region of the bacterial 16S rRNA gene was amplified using primers 515F (aat gatacggcgaccaccgagatctacactatggtattgtGTGCCAGCMGCCGCGGTAA; adapter, primer pad and primer linker in small letters; and specific primer sequence in capital letters) and 806R (caagcagaagacg gcatacgagatbarcodeagtcagtcagccGGACTACHVGGGTWTCTAAT), the latter was barcoded with a 12-base error-correcting Golay code to facilitate sample multiplexing. Each sample was amplified in duplicate, pooled and then cleaned using a MinElute 96 UF Purification Kit as per the manufacturer's instructions (Qiagen Ltd., West Sussex, UK). PCR reactions consisted of: 2 ml DNA extract, 20 ml 5 Prime Hot Master Mix (VWR, Lutterworth, UK), 1 ml each of forward and reverse primer (10 mM final concentration), and 26 ml molecular-grade water. Reactions were denatured at 94 C for 3 min, with amplification proceeding for 35 cycles at 94 C for 45 s (further denaturing), 50 C for 60 s (annealing) and 72 C for 90 s (extension); a final extension at 72 C for 10 min was added to ensure complete amplification. The concentration and purity of pooled amplicons was assessed using a Quant-iT PicoGreen dsDNA assay kit (Life Technologies, Paisley, UK). A composite sample for sequencing was created by combining all samples in equimolar amounts. The composite sample was cleaned twice using Agencourt AMPure XP beads (Beckman Coulter, High Wycombe, UK) according to the manufacturer's instructions, with fragment selection undertaken using E-Gel 2% agarose gels (Life Technologies, Paisley, UK). The size-selected fragment was then purified using the QIAquick PCR purification kit (Qiagen Ltd., West Sussex, UK) and quantified using the Quant-iT PicoGreen dsDNA assay kit, to ensure the sample contained enough DNA for sequencing analysis. Sequencing was carried out on the Illumina MiSeq personal sequencer at the Centre for Genomic Research, University of Liverpool. A total of 9.5 million reads were obtained.
Raw reads were processed using the DADA2 (v. 1.4, Callahan et al. (2016)) pipeline, specifically following the workflow for Big Data (https://benjjneb.github.io/dada2/bigdata.html), and using R version 3.4.0, R Core Team (2017)). Forward and reverse read pairs were trimmed and filtered (minimum length 200 nucleotides, EEmax < 2 expected errors). Amplicon sequence variants (ASV's) were independently inferred in each sample from forward and reverse reads using the run-specific error rates, and then joined using the~mergePairs~function. Chimeric ASVs were inferred and identified using~removeBimeraDenovo~and removed. This resulted in 2756 final ASVs (per sample average ¼ 20750, min ¼ 8708 and max ¼ 63995) which were taxonomically classified against the SILVA database (v.128, Quast et al. (2013)) using the DADA2 implementation of the RDP's naive Bayesian classifier (Wang et al., 2007). Note AVS's are denoted as operational taxonomic units (OTU's).
2.3.3.1. qPCR. Quantification of total bacteria and AOB was carried out using qPCR and amplification of the 16S rRNA gene and the ammonia monooxygenase (amoA) gene respectively. Samples were amplified in triplicate on a CFX96 Real-Time PCR Detection System (Bio-Rad, UK) using the primer sets 338F (Muyzer et al., 1993) and 1046R (Huber et al., 2007) for total bacteria and amoA-1F* (Stephen et al., 1999) and amoA-2R (Rotthauwe et al., 1997) for AOB. qPCR reactions contained 3 mL of template DNA (sample DNA, standard DNA or molecular grade water (negative control)), 0.5 mL of forward and reverse primer (10 rmoles per mL), 5 ml of SsoFast EvaGreen supermix (Bio-Rad, UK) and 1 ml of molecular-grade water. Reaction conditions were: 1 cycle at 98 C for 3 min, followed by 40 cycles consisting of 98 C for 5 s and 60 C (338F/1046R) or 56 C (amoA-1F/amoA-2R) for 5 s. Purified circular plasmids containing the target gene were used as standards and run in triplicate for each qPCR reaction. Efficiencies for all qPCR reactions ranged between 90 and 110% and had a R 2 ! 0.99. Gene copy numbers per unit volume were converted to cell numbers per unit volume using accompanying sequence data for the 16S rRNA gene (described in SI) and assuming each AOB cell contained 2 copies of the amoA gene (McTavish et al., 1993;Norton et al., 2002).

Statistical analysis
All statistical analysis, unless otherwise stated, was undertaken in RStudio (v. 1.0.143, R Core Team (2017)) using R version 3.4.0 (R Core Team, 2017).

Virus e biotic/abiotic interactions
Virus e biotic/abiotic interactions were assessed by multivariate generalised least squares (GLS) regression (gls, "nlme" v. 3.1e131, Pinherio et al. (2017)), with models describing mixed liquor virus, total bacteria and AOB abundance produced. All measured biotic and abiotic parameters, unless otherwise stated (see footnotes of Table S2 e S4), were initially used as covariates (K i ¼~59, n ¼ 102), with multivariate ordinary least squares (OLS) regression (lm, "stats" v. 3.4.0, R Core Team (2017)) followed by bidirectional elimination (stepAIC, "MASS" v. 7.3e47, Venables et al. (2002)) based on Bayesian information criterion (BIC) used for model simplification and selection. OLS models were then rerun as GLS to allow incorporation of a correlation structure (corCAR1(form ¼W eek), accounting for slight variations in sampling frequency (approximately weekly). To guarantee parsimony and adherence to the assumptions of linear regression manual backward elimination and forward selection based on BIC was subsequently undertaken on all GLS models, which were fit by maximum likelihood.
Models were checked visually for linearity ( Fig (2015)) additionally used to confirm the latter. Collinearity amongst explanatory variables was assessed by variance inflation factor's (VIF's, vif, "car" v. 2.1e4, Fox and Weisberg (2011)), with variables contributing to VIF's > 3 removed based on statistical significance (P-value) until all fell below this threshold (Zuur et al., 2010). To aid in adherence to these assumptions variables were either transformed (log 10 for virus, total bacteria and AOB abundances (biotic) and natural log for COD, NH 4 þ -N, nitrate, nitrite and phosphate terms (abiotic)) or standardised to mean 0 variance 1 (all other environmental and operational parameters (abiotic)). Finally ANOVA (anova, "stats" v. v. 3.4.0, R Core Team (2017)) was used to test the statistical significance of each model's correlation structure, whilst a pseudo-R 2 was calculated to ascertain a representation of the variance in mixed liquor abundances explained by each of the models respectively (r.squaredLR, "MuMin" v. 1.15.6, Barton (2016)).

Virus e community structure interactions
Canonical correspondence analysis (CCA) was performed to assess the response of the bacterial communities to abiotic/biotic conditions (cca, "vegan" v. 2.4e3, Oksanen et al. (2018)), with all measured biotic and abiotic parameters, transformed as in 2.4.1, initially used as explanatory variables (K i ¼~59, n ¼ 102) unless otherwise stated (see caption of Fig. 3). Automated bidirectional selection, based on Akaike information criterion (AIC) and Monte Carlo permutation tests (999 permutations), was then used for model simplification and thus identification of the most significant explanatory variables (step, test ¼ "perm", "stats" v. v. 3.4.0, R Core Team (2017)). Collinearity was assessed by VIF's (vif.cca, "vegan" v. 2.4e3, Oksanen et al. (2018)), with variables contributing to VIF's > 3 being removed as in 2.4.1 (Zuur et al., 2010). Finally the statistical significance of the CCA model (constrained components), its axes and the marginal effects of each explanatory variable were assessed using permutation tests (999 permutations, anova.cca, "vegan" v. 2.4e3, Oksanen et al. (2018)). Note CCA was chosen over redundancy analysis as unimodal approaches are better suited to relative abundances and the presence of zeros (Ramette, 2007).
Local similarity analysis (LSA, Ruan et al. (2006)) was utilised to observe correlations between the 50 most abundant OTUs (relative abundances) and mixed liquor virus abundance. Analysis was undertaken using eLSA v 1.0.2 (Xia et al., 2011(Xia et al., , 2013 in Python v 2.7. A maximum time delay of three was utilised (delay-Limit ¼ 3), P-values were calculated by permutation tests (1000, P-valueMethod ¼ perm), the required precision of P-values was set at 1/1000 (precision ¼ 1000) and data was rank-normalised and z-transformed (normMethod ¼ robustZ, Ruan et al., 2006;Xia et al., 2011Xia et al., , 2013. Multiple hypothesis correction was undertaken using Q-values (Storey and Tibshirani, 2003). The LSA output was then visualised as an association network in Cytoscape v 3.6.0 (Shannon et al., 2003), only correlations with a Pvalue 0.05, a LSA score (LS) ! 0.3 and a Q-value 0.01 were examined. Calculation of the peak (max relative abundance) to average (mean relative abundance) ratio (PAR) for examined OTUs allowed assessment of their persistence within the mixed liquor, three arbitrarily defined ecological categories facilitated this process; persistent (PAR 5), intermittent (PAR > 5 < 10) and transient (PAR ! 10).
The influence of virus abundance on alpha diversity was assessed by calculation of Spearman's rank correlation coefficients (cor.test, "stats" v. 3.4.0, R Core Team, 2017) between mixed liquor and effluent virus abundance and D 1 (exponential of Shannon diversity) and D 2 (inverse of Simpson diversity) Hills diversities (diversity, "vegan" v. 2.4e3, Oksanen et al., 2018), which better represent rare and common taxa respectively (Vuono et al., 2015). Its use, over Pearson correlation, was justified since all variable combinations, transformed as in 2.4.1, were not bivariate normal (roystonTest, "MVN" v. 4.0.2, Korkmaz et al. (2014)). Bonferroni corrections were applied to all calculated correlations.

Virus e community function interactions
Virus-community function interactions were assessed by calculation of Spearman's rank correlation coefficients between mixed liquor and effluent virus abundance and mixed liquor and effluent COD and NH 4 þ eN concentrations respectively, performed, justified and corrected as in 2.4.1. Structured equation modelling (SEM) was also conducted to infer hypothesised causal links between virus abundance and effluent COD and NH 4 þ eN concentrations (sem, "lavaan" v. 0.5e23.1097, Rosseel (2012)). A priori models were constructed based on literature and theory, improved upon using statistical associations identified by regression analysis (2.4.1) and, to guarantee parsimony, simplified by removing non-significant indicators and pathways (P < 0.1).

Bioreactor performance and abiotic conditions
Influent flow rates were highly variable over the two-year study period (Table S1 and Fig. S1), leading to fluctuating hydraulic and solids retention times (10.6 ± 3.1 h and 11.1 ± 3.1 days respectively, Fig. S1) and reactor biomass concentrations (2.7 ± 0.5 g L À1 MLSS and 2.01 ± 0.4 g L À1 MLVSS, Fig. S2). Reactor temperature varied on a seasonal basis, with a summer maximum and winter minimum of 17.9 C (August 2012) and 6.3 C (January 2013) recorded respectively (Fig. S1). DO concentrations were moderately variable (2.5 ± 0.8 mg L À1 ) and pH was maintained within a narrow range (6.6 ± 0.2) through flow dependent dosing of sodium hydroxide (Fig. S1).
The discharge consent, a legislated maximum final effluent concentration, of COD s (125 mg L À1 ) was achieved 96.1% of the time, thus mean removal efficiency (82.1 ± 18.1%) and effluent concentrations (32 ± 28.5 mg L À1 ) were relatively stable (Fig. S2 B). Nitrification was less efficient and more unstable (mean NH 4 þ eN removal of 75.8 ± 27.6%); accordingly effluent NH 4 þ eN and nitrite concentrations were highly variable (0e76.4 and 0e5.3 mg L À1 ) and the discharge consent (5 mg L À1 ammonia) was adhered to only 57.8% of the time (Fig. S2). Certainly nitrite accumulation, thus incomplete nitrification, was periodically evident, particularly from day 500 onwards (Fig. S2 E). Of the trace metals monitored in the influent calcium, potassium and magnesium were present at the highest concentrations (mean values of 47.4 ± 8.7, 13.1 ± 3.1 and 11 ± 2.8 mg L À1 respectively), whilst cadmium and arsenic were present at the lowest concentrations (mean values of 3.9 ± 11.3 and 9.3 ± 7.9 mg L À1 respectively, Fig. S3). All monitored trace metals, however, were routinely present in the influent (Table S1 and Fig. S3).

Virus interactions with biotic and abiotic conditions
Fitted values from the mixed liquor virus GLS regression model agreed well with observed abundances (Fig. 2 A and Fig. S4 A), with explanatory variables explaining 83% of abundance variations. The model identified strong positive associations with influent virus abundance (P < 0.05), influent NH 4 þ -N (P < 0.01), phosphate (P < 0.01) and sulphur (P < 0.001) concentrations, mixed liquor AOB abundance (P < 0.05) and mixed liquor pH (P < 0.05). In contrast influent magnesium concentrations (P < 0.001) and mixed liquor nitrate (P < 0.001), nitrite (P < 0.01) and sulphate (P < 0.001)  and AOB (C) abundance from the full scale WWTP (observed) and respective multivariate GLS regression models (fitted). Shaded area represents 95% confidence intervals for the regression models. n ¼ 102 (A and B) and 95 (C) respectively, the latter due to missingness in influent AOB abundance.
concentrations were highly negatively associated (Table S2) with total virus numbers. However, the fits of the mixed liquor total bacteria and AOB GLS regression models, when compared to that produced for mixed liquor viruses, were poorer (Fig. 2, Fig S5 A and S6 A). The models explained only 53% and 47% of the variation in the bacterial and AOB abundances respectively. However mixed liquor virus abundances had strong positive associations with both bacteria and AOB concentrations (P < 0.01 and P < 0.001 respectively Tables S3 and S4). The AOB were even positively correlated with viruses in the effluent (P < 0.01). Other explanatory variables significantly contributing to each model are summarised in the supplementary information.

Virus interactions with bacteria community structure
The CCA ordination accounted for the majority of observed variance in OTU-environment associations (77.6%, Table S5 and Fig. 3). Mixed liquor virus abundance (P < 0.001) was the third most important variable after water temperature, P < 0.001, and influent potassium, P < 0.001 ( Fig. 3 and Table S5).
Mixed liquor virus abundance was found to have a significant (LS > 0.3, P 0.05 and Q 0.01) and complex association with 6 of the 10 most abundant OTUs and a further 11 of the 50 most abundant taxa (Fig. 4). A group of 5 taxa had a significant association with the virus concentration at the time of sampling (both negative, OTUs 8 and 24, and positive OTUs 5, 9 and 26). This group of OTUs were highly interconnected, OTUs 5, 9 and 26 being positively associated and OTU pairs 8 and 5, 8 and 9, 8 and 26 and 24 and 9 being negatively associated (Fig. 4), all associations were nonlagged. All 5 were persistent in the mixed liquor (PAR 5, Fig. 4) and included the "workhorse" genus Zoogloea (OTU 5).
However, other taxa had associations with a time delay, varying from 1 week (OTU 11 positive and OTUs 33, 37 and 44 negative), 2 weeks (positive, OTUs 2 and 50 and negative, OTUs 5, 34 and 46) Fig. 3. Canonical correspondence analysis ordination of temporal mixed liquor bacterial community dynamics from the full scale WWTP and associated virus and abiotic variables, axes 1 and 2 (A) and 3 and 4 (B) respectively. Points and associated numbers represent individual samples, coloured and sized based on the month and year of collection and the samples virus concentration mL À1 respectively. Proximity of samples indicates similarities in community composition. Arrows indicate the direction of increase in each explanatory variable, their length indicates the strength of correlation with each axis and the angle between arrows indicates the approximate degree to which explanatory variables are correlated. Inf ¼ influent, ML ¼ mixed liquor and Eff ¼ effluent. Fig. 4. Network visualisation of local similarity analysis associations between all OTUs from the top 50 most abundant that were associated with mixed liquor virus abundance (LS ! 0.3, P 0.05 and Q 0.01) at the full scale WWTP. Circular nodes indicate bacterial taxa, coloured and sized based on the OTU's PAR ( 5 ¼ white, > 5 < 10 ¼ light blue and !10 ¼ dark blue) and mean relative abundance respectively. For reference OTU 2 and OTU 50 had mean relative abundances of 4.68%, the highest, and 0.36%, the lowest. The diamond shaped node indicates mixed liquor virus abundance. Solid and dashed lines indicate positive and negative associations respectively, whilst a lines colour indicates the correlations delay (teal ¼ 0, purple ¼ 1, yellow ¼ 2 and green ¼ 3). Arrows point to the delayed, or trailing, node in delayed associations. ML ¼ mixed liquor. and even 3 weeks (positive, OTU 19 and negative, OTUs 4 and 22). These OTUs were found intermittently (PAR > 5 < 10) or transiently (PAR ! 10) and included well known filamentous organisms, Microthrix (OTUs 2, 11 and 37), Thiothrix (OTU 50) and the phosphorous removing genus Tetrasphaera (OTU 4). Mixed liquor virus abundance was not statistically associated with either Hill's diversity indices (Fig. 5 A).

Virus interactions with community function
Mixed liquor and effluent virus abundance were highly positively correlated with both mixed liquor COD (P < 0.001 and P < 0.01) and NH 4 þ -N (P < 0.001 and P < 0.05) concentrations, and weakly positively correlated with effluent NH 4 þ -N concentrations (P < 0.1 and P < 0.1, Fig. 5 B).
The SEM also identified significant positive associations, although non-directional, between mixed liquor virus abundance and both mixed liquor NH 4 þ -N (P < 0.05, Fig. 6 A) and COD (P < 0.01, Fig. 6 B) concentrations, which were directly influencing, and being influenced by, corresponding effluent (P < 0.001 and P < 0.01, Fig. 6) and influent concentrations (P < 0.001 and P < 0.05, Fig. 6) respectively. Effluent virus abundance was also none directionally positively associated with effluent NH 4 þ -N (P < 0.05, Fig. 6 A) and COD (P < 0.01, Fig. 6 B) concentrations, whilst being directly positively influenced by mixed liquor virus abundance (P < 0.001 and P < 0.001, Fig. 6). Mixed liquor virus abundance (P < 0.001, Fig. 6 A), as well as mixed liquor AOB abundance (P < 0.1, Fig. 6 A), was also directly positively influenced by influent concentrations of NH 4 þ -N. No significant associations were found between mixed liquor AOB/ bacteria abundance and mixed liquor and effluent NH 4 þ -N/COD concentrations and virus abundance (Fig. 6), the only exception being the negative association found between mixed liquor AOB and effluent virus abundance (P < 0.01, Fig. 6 A). Overall the SEM analysis was able to explain 28% and 12% of effluent NH 4 þ -N and COD concentrations respectively.

Discussion
This work provides evidence that viruses play an important role in the ecology and function of activated sludge plants: the statistical associations imply they interact with the abundance and dynamics of key bacterial groups to such an extent that community function, and therefore effluent quality is affected. It seems plausible that viruses could cause outright failure either directly, affecting key functional groups, or indirectly by influencing filamentous bacteria and thus foaming and bulking.
Such a prominent role for bacteriophage is consistent with the high virus to bacteria ratio we observed, evidence from other engineered systems (Hantula et al., 1991;Lee et al., 2007;Barr et al., 2010;Shapiro et al., 2010) and knowledge of their role in the ecology of marine and freshwater environments (Suttle, 2007;Rohwer and Thurber, 2009;Breitbart, 2012), where viruses are thought to kill~50% of bacteria every day (Noble and Fuhrman, 2000;Ram et al., 2013) and therefore influence nutrient cycling and community function.
Inferring the role of viruses in the environment is however problematic; we are only able to quantify total virus numbers and the abundance or proportional abundance of relatively broad bacterial groups. Moreover, the virus community will comprise numerous species/strains (Breitbart et al., 2007), each with varying host ranges (Khan et al., 2002a(Khan et al., , 2002b) and most will likely be in unsynchronised predator prey cycles with their hosts. In addition, the relationship between phage and host may be non-linear and lagged.
Thus in the absence of more detailed measurements of viruses and their hosts we utilised multiple statistical approaches to identify significant relationships between total virus numbers and various measures of biomass, community structure and function. This is a crude and probably insensitive approach. However, if we can find significant relationships with relatively weak tests, this does imply that viruses play a very strong role in this system. Likewise, though the underlying basis of the observed correlations is uncertain, and could be indirect, they could provide the impetus for the more detailed and challenging work required to obtain a mechanistic understanding of how bacteriophage affect engineered biological systems.
All four lines of enquiry we used suggested a role for viruses (regression and structural equation modelling to determine an impact on biomass and function, effluent quality, and CCA and local similarity analysis to evaluate the effect of viruses on community structure). A detailed consideration of every statistically significant relationship would be tedious and speculative. Nevertheless some  ¼ 102 (A and B). ML ¼ mixed liquor, D 1 ¼ exponential of Shannon diversity and D 2 ¼ inverse of Simpson diversity. P < 0.1, *P < 0.05, **P < 0.01, ***P < 0.001. commentary is warranted.
Viruses emerged as having a significant, and positive correlation with the variation in bacterial and AOB biomass over time. More biomass (prey) meant more viruses (predator), which is plausible. In individual predator prey-cycle's the predator lags the prey but will still be positively correlated. It would appear that when many such cycles are aggregated this correlation is still detectable. The overall fit of the regression models were modest and the mechanistic basis for many of the other statistically significant variables is not immediately apparent. However, some have an obvious physical basis that lends credence to the exercise. For example pH and the divalent cation magnesium were associated with virus numbers: both regulate surface charge and electrostatic interactions and would cause viruses to adsorb/desorb from the biomass and so decrease/increase their abundance (Michen and Graule, 2010;Nap et al., 2014). Likewise viruses in the mixed liquor were influenced by viruses in the influent, despite the latter constituting only a small fraction of the former. The difference between the number of viruses in the influent and the mixed liquor can only be accounted for by the lysis of biomass in the mixed liquor. Yet though the time series of mixed liquor virus abundance could be modelled quite well there was no statistical relationship between the number of viruses and the amount of bacterial biomass. This reinforces the limitations of a purely statistical approach.
In principle, if viruses affect mixed liquor AOB and bacterial biomass, this in turn should affect the effluent COD and ammonia concentration and so the system's function. The structural equation modelling was an attempt to show this chain of causality. It partially succeeded. Though it is clear that viruses affected the effluent concentrations directly and indirectly it was not possible to demonstrate that they did this through a linear effect on bacterial concentrations. However, predation remains the most plausible mechanism by which viruses could affect effluent quality. Indeed key functional groups are known to have signatures of viral infection (prophages and CRISPR sequences) in their genomes (Kunin et al., 2008;Choi et al., 2010;Motlagh et al., 2015).
Viruses appear to have an important, possibly very important, role in community structure being the third most important factor in the canonical correspondence analysis (CCA) after temperature and potassium. The importance of temperature is well established and to some extent obvious (Wells et al., 2009(Wells et al., , 2011, as is the significance of potassium (which constitute about 1% of cell biomass, e.g. Brdjanovic et al. (1996) and Barat et al. (2005)). It follows that the prominent role of viruses is credible though largely ignored in both research and practice and poorly understood in detail. Unfortunately though a CCA can tell us what is important: it cannot tell us why.
Some preliminary insights can be garnered from the local similarity analysis of individual taxonomic groups. Significant associations were found between viruses and some of the most abundant taxa, which is consistent with the Killing the Winner hypothesis (KtW, Thingstad (2000); Winter et al. (2010)). However, given viruses of the most abundant taxa should constitute the bulk of the virus count such relationships are perhaps expected. Thus it is in some ways more remarkable that total viruses were seen to correlate with Microthrix, Thiothrix and Tetrasphaera genera, problematic blooming filamentous bacteria (Blackall et al., 1996;Martins et al., 2004;Guo and Zhang, 2012) that cause foaming and bulking and whose abundance is a matter of great interest and importance to practitioners. A more prosaic explanation could be that some deterministic factor, for example pH, could be affecting both viruses and these problematic bacteria, leading to spurious correlations. However, our observations are consistent with findings elsewhere (Kotay et al., 2011;Petrovski et al., 2012;Khairnar et al., 2014;Liu et al., 2015b). The time lag in some of the relationships we observe probably reflect the asynchrony in the underlying dynamics.

Conclusion
We conclude that this paper provides plausible evidence that bacteriophage are key players in the ecology of activated sludge, affecting not only the dynamics of the biomass and the community structure but the function of the system: carbon and ammonia removal. We posit that bacteriophages cause failure: either directly through effluent concentrations or indirectly through their interactions with filamentous organisms. There seems no reason why this finding will not extend to other functions (especially P removal) and all other biological treatment systems. We caution that all findings are based on statistical associations and their basis is not yet certain. Clearly, deeper mechanistic research is needed to P < 0.1, *P < 0.05, **P < 0.01, ***P < 0.001. Inf ¼ influent, ML ¼ mixed liquor and Eff ¼ effluent.
determine the detailed, species specific, mechanisms. This endeavour will be strengthened by the ongoing revolution in viral metagenomics and single cell analysis (Dang and Sullivan, 2014;Brum and Sullivan, 2015). However, practitioners need not wait for this detailed work to be undertaken. The quantification of bacteriophage is cheap and simple using flow cytometry and total virus counts can, as we have shown, be empirically related to plant performance. We would suggest that operators initiate the use of flow cytometry to count viruses (or VLP) in treatment plants now.