Functional ecology of bacteriophages in the environment

Bacteriophages are as ubiquitous as their bacterial hosts and often more abundant. Understanding how bacteriophages control their bacterial host populations requires a number of different approaches. Bacteriophages can control bacterial populations through lysis, drive the evolution of bacterial immunity systems through infection, provide a conduit for horizontal gene transfer and alter host metabolism through the carriage of auxiliary metabolic genes. Understanding and quantifying how bacteriophages drive these processes requires both technological developments to take measurements in situ and laboratory-based studies to understand mechanisms. Technological advances have allowed quantification of the number of infected cells in situ, revealing far lower levels than expected. Relating observations made under laboratory conditions to what occurs in the environment, and experimentally confirming the function of phage genes predicted from environmental omics data, remain challenging.


Introduction
At the most basic level, bacteriophages (phages) are simply nucleic acids encased in a protein capsid [1]. However, these small biological entities are Machiavellian masters of the microbial world. The sheer abundance of phages means that lysis of their bacterial hosts has an important role in the ecology of some environments. For example, in the oceans, phages are estimated to kill 20% of bacteria per day and to divert the flow of carbon and other nutrients away from higher trophic levels via the viral shunt (see [2] for review). Phages also have other roles: they mediate lateral gene transfer, drive the evolution of bacterial immune systems, alter host metabolism and shape bacterial populations through their continual interaction. Within this review, we discuss recent advances in viromics that are further informing our understanding of phage diversity. We also review recent laboratory studies on model systems that have pertinent implications for phage functional ecology, and pose important outstanding questions for phage ecologists.
The study of viruses through viromics, whether by identifying viruses in bulk metagenomes or by enriching virions before sequencing, has revolutionised our understanding of phage diversity compared with culture-based approaches [3], vastly expanding our knowledge of the diversity and distribution of phages in multiple environments. Studies using long-read sequencing technologies have shown that between 22% and 160% more viral Operational Taxonomic Units (vOTUs) can be detected when long reads are used compared with short reads alone [4-6]. Thus, we are likely missing much of the diversity of even the most abundant phages in many environments. As the requirement for large quantities of DNA for long-read sequencing is overcome by approaches such as VirION [6,7] or multiple displacement amplification [5,8], further phage diversity will likely be uncovered in already well-sampled environments.
While viromics has produced new data, it is culture-based approaches that have unveiled further biases in viromic datasets. Several culture-based studies have found that modification of phage DNA makes it recalcitrant to standard DNA library preparation techniques [20][21][22]; such phages will therefore be missing from most viromes. How much we are missing is yet to be quantified. In a study that used metatranscriptomic data and DNA metagenomes to identify dsDNA phages, ∼50% of phages were not found in the DNA fraction [9]. What proportion of the phages detected only in the metatranscriptomic data were missed because DNA modification prevented their detection in DNA samples is unknown. However, it does suggest that we could still be missing a large proportion of dsDNA phages, and that the use of multiple omic technologies will provide better estimates of the diversity of phages within an environment.
The recent utilisation of transcriptomic datasets [9][10][11] and specific sequencing of virions containing RNA [12,13] has vastly expanded the known diversity of RNA viruses, with a doubling in the number of phyla [11]. Of genome-sequenced phage isolates, only ∼0.1% are RNA phages. To understand the role of RNA phages in the environment, efforts are needed to bring more of them into culture and develop model systems.

Population control
Despite several decades of work [14], directly assessing the rates of stress imposed by phages has remained challenging. Phages span from being obligately virulent (i.e. antagonistic) through to temperate, which in many cases may confer a fitness advantage (i.e. mutualistic). For virulent phages, it has been relatively straightforward to enumerate free-phage particles from samples. More challenging has been converting these numbers to mortality rates of susceptible hosts. This requires knowing who infects whom and the number of cells infected at any time. The iPolony method allows direct quantification of the number of infected cells and thus deduction of mortality rates [15]. When applied to marine cyanobacteria, these data suggest relatively low levels of infection in situ (<1% of cells), despite high abundances of free phages within samples. Reconciling these apparent infection rates with those predicted from encounter theory is a conceptual challenge for phage ecologists. Three mechanisms have been proposed to explain this paradox: a large fraction of free phages is not infectious, rates of adsorption are lower in situ than measured in the laboratory, or a high proportion of cells in the environment is resistant.
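To make the paradox concrete, the following is a minimal sketch of a mass-action encounter estimate and of how each of the three proposed mechanisms would scale it down. The function name, the adsorption constant, the phage concentration and the three scaling fractions are illustrative assumptions, not values or methods taken from the studies cited here.

```python
import math


def daily_infection_fraction(adsorption_rate_ml_per_day, phage_per_ml,
                             infectious_fraction=1.0,
                             adsorption_efficiency=1.0,
                             susceptible_fraction=1.0):
    """Fraction of host cells expected to be newly infected in one day.

    Assumes a simple mass-action model in which each susceptible cell is
    infected at a constant rate k * P (adsorption constant times phage
    concentration). All parameters are hypothetical illustrations.
    """
    for name, value in (("infectious_fraction", infectious_fraction),
                        ("adsorption_efficiency", adsorption_efficiency),
                        ("susceptible_fraction", susceptible_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")

    # Mechanism 1: only part of the free-phage pool is infectious.
    infectious_phage = phage_per_ml * infectious_fraction
    # Mechanism 2: adsorption in situ may be slower than in the laboratory.
    effective_rate = (adsorption_rate_ml_per_day * adsorption_efficiency
                      * infectious_phage)
    # Probability that a susceptible cell is infected within one day.
    per_cell = 1.0 - math.exp(-effective_rate)
    # Mechanism 3: resistant cells cannot be infected at all.
    return susceptible_fraction * per_cell


# Naive prediction versus one with all three mechanisms applied.
naive = daily_infection_fraction(1e-6, 1e5)
adjusted = daily_infection_fraction(1e-6, 1e5,
                                    infectious_fraction=0.1,
                                    adsorption_efficiency=0.5,
                                    susceptible_fraction=0.2)
print(f"naive: {naive:.4f}, adjusted: {adjusted:.4f}")
```

Note that this estimates new infections per day, whereas a method that counts infected cells measures the standing fraction at one time point; relating the two would additionally require an assumption about the length of the infection cycle.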
The relative contributions of these processes to the observed discrepancy are currently unclear. Laboratory experiments show that antagonistic coevolution between host and phage is rapid [16], but how quickly these scenarios play out in nature remains poorly resolved. In one study, Pseudomonas fluorescens clones isolated from soil were broadly resistant to phages isolated from past time points and, similarly, 'future' phages were broadly infectious against past bacteria [17]. Thus, at least phenotypically, coevolution played out over timescales of several weeks. Meanwhile, wild strains of Flavobacterium columnare and their phages underwent antagonistic coevolution within a ten-year time series [18]. In both these examples, infectivity and resistance increase linearly over time, indicative of 'arms-race' dynamics. In contrast, time-series infection networks of Vibrio crassostreae and their phages in an oyster farm setting show that phages are 'locally-adapted' in time [19]. That is, they broadly infect host clones isolated from the same time but are less likely to infect hosts from the past or the future. This argues for a 'negative-frequency dependent selection' mechanism operating in situ [16].
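The two patterns described above can be told apart with a time-shift summary of an infection matrix. The following is a minimal sketch, assuming a hypothetical data layout (tuples of phage sampling time, host sampling time and whether infection occurred) and a deliberately crude classification rule. It is not the analysis used in the cited studies.

```python
from collections import defaultdict


def infectivity_by_time_shift(assays):
    """Mean infection success grouped by time shift.

    `assays` is an iterable of (phage_time, host_time, infected) tuples.
    The time shift is phage_time minus host_time, so positive values mean
    'future' phages tested against 'past' hosts.
    """
    totals = defaultdict(lambda: [0, 0])  # shift -> [infections, trials]
    for phage_time, host_time, infected in assays:
        shift = phage_time - host_time
        totals[shift][0] += int(infected)
        totals[shift][1] += 1
    return {shift: hits / trials
            for shift, (hits, trials) in sorted(totals.items())}


def classify_pattern(profile):
    """Crude, illustrative classification of a time-shift profile."""
    shifts = sorted(profile)
    values = [profile[s] for s in shifts]
    if len(values) < 3:
        return "undetermined"
    # Arms race: infectivity never falls as phages get more 'future'.
    non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
    if non_decreasing and values[0] < values[-1]:
        return "arms-race-like"
    # Local adaptation in time: infectivity peaks for contemporary pairs.
    if (0 in profile and profile[0] == max(values)
            and profile[0] > values[0] and profile[0] > values[-1]):
        return "locally-adapted-like"
    return "undetermined"


# Toy data: three sampling times, every phage tested against every host.
times = [0, 1, 2]
arms_race = [(p, h, p >= h) for p in times for h in times]
local = [(p, h, p == h) for p in times for h in times]
print(classify_pattern(infectivity_by_time_shift(arms_race)))
print(classify_pattern(infectivity_by_time_shift(local)))
```

Real time-series data are noisy and unevenly sampled, so a statistical model of infectivity against time shift would be more appropriate than the strict comparisons used here.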
With respect to temperate phages, around two-thirds of genomes from isolated cultures have at least one prophage-like signal (Thomas Sicheritz-Ponten pers. comms.). This may depend on environmental context, with reports of only 18% of genomes from marine bacteria containing a prophage [20]. In contrast, single-amplified genomes from a sample of >5000 marine bacteria all contained at least one prophage-like signal [21]. However, it is currently unclear whether these prophage prediction tools are accurate [21,22], and a major focus of the community should be validation of these tools in the lab. Nevertheless, these results pose interesting questions for phage ecologists. For example, why are not all bacteria lysogens? Why are some bacteria more susceptible to prophage than others? Recent work has described fitness costs and benefits of prophage carriage in varying environmental contexts. In competition assays with prophage-free strains, lysogens show either a fitness cost or a benefit under ordinary laboratory conditions [23]. The fitness peaks then shift depending on environmental conditions (e.g. the presence of antibiotics). Moreover, the direct effect on fitness depends on the genes encoded by the prophage in addition to prophage carriage itself [23].
Beyond the laboratory, researchers have sought to correlate metadata with observations of prophage carriage to better understand the evolutionary basis for lysogeny. The Piggyback-the-Winner hypothesis stipulates that a temperate mode is selected for when host population density is high [24]. Correlational analysis supports this theory, yet there is some doubt over whether these correlations provide a valid mechanistic link between cell density and the lysis-lysogeny decision [25]. This mechanism has been explored using modelling approaches [26], which have recently been adapted to suggest a non-linear trend between cell density and frequency of lysogeny [27], whereby lysogeny is promoted at both high and low cell densities. This relationship arises from interactions between cell density, the fitness benefits of lysogenisation and the energetic status of the bacterial cell.
It is clear that experimental studies on the rates of population control by phages are disparate, have produced contrasting results and have not provided a unifying theory of phage control of bacterial populations. Indeed, one may not exist. However, future experimental approaches should build on the time-series studies and integrate isolation, genomics, estimates of the frequency of infected cells and modelling.

Evolution of bacterial immunity
High-throughput, forward genetic screens are expanding the diversity of described bacterial immune systems (for review see [28]). Yet the relevance of such systems for phage-host interactions in the environment is perhaps understudied. That is, can we predict who infects whom from the complement of immune and immune evasion systems? In the case of clustered regularly interspaced short palindromic repeats (CRISPR), decades of research have unmasked the myriad ways to achieve CRISPR immunity (for review see [29]). Thus, there have been many attempts to use the sequence information contained in spacers to predict phage-host relationships [30][31][32]. Yet these attempts are conceptually flawed, not least because the spacer contains a record of past infections, not current ones. More importantly, in recent years, there has been a rapid expansion in the catalogue of ways that phages subvert CRISPR [33][34][35][36]. Interestingly, many of these anti-CRISPR systems are encoded by phages or plasmids [37,38]. This extends to many other phage defence systems. For instance, a functional genetic screen of Escherichia coli strains discovered an overwhelming propensity of novel defence systems within intact prophages and other mobile elements [39]. Therefore, phages may be important in controlling bacterial populations indirectly through phage-phage antagonism.
It is also true that phages rapidly evolve counter-defences against non-CRISPR systems. Phage T4 rapidly evolves segmental amplification mutations to evade the ToxIN toxin-antitoxin defence system [40]. Interestingly, these amplifications result in deletions elsewhere in the phage chromosome, presumably to maintain packaging fidelity. The deleted regions happen to encode other counter-defence systems that are relevant when infecting other hosts. Thus, this experiment provides the first mechanistic explanation of why arms races may come at a cost and therefore not continue ad infinitum [16].
Time-series metagenomics of a biological wastewater treatment plant has sought to resolve how bacterial immune systems shape the dynamics of phage-host interactions in situ [31]. Using CRISPR as a model, it is clear that CRISPR spacers are extremely dynamic in time, despite microbial communities remaining relatively constant. This implicates CRISPR as important in structuring phage-host networks at fine-scale taxonomic resolution (i.e. strain level). Similar conclusions are drawn when attempting to model the temporal behaviour of host and virus immunity [41] and are likely to extend to other defence systems beyond CRISPR.
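Spacer-based inference of phage-host links, as discussed in this section, can be illustrated with a minimal sketch. It assumes exact matching of spacers against phage genomes on either strand, with hypothetical function names and toy sequences; real pipelines typically use alignment tools that tolerate mismatches. As noted above, a match records past exposure and does not show that the phage can currently infect that host.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predict_hosts_from_spacers(host_spacers, phage_genomes):
    """Link phages to hosts that carry a spacer matching the phage genome.

    host_spacers: dict mapping host name -> list of spacer sequences.
    phage_genomes: dict mapping phage name -> genome sequence.
    Returns a dict mapping phage name -> sorted list of hosts with at
    least one exact spacer match on either strand.
    """
    predictions = {}
    for phage, genome in phage_genomes.items():
        genome = genome.upper()
        hosts = set()
        for host, spacers in host_spacers.items():
            for spacer in spacers:
                spacer = spacer.upper()
                if spacer in genome or reverse_complement(spacer) in genome:
                    hosts.add(host)
                    break  # one matching spacer is enough for this host
        predictions[phage] = sorted(hosts)
    return predictions


# Toy example with made-up sequences.
phages = {"phage1": "ATGCGTACGTTAGC"}
spacers = {
    "hostA": ["CGTACG"],   # matches the forward strand
    "hostB": ["GCTAAC"],   # matches the reverse strand
    "hostC": ["AAAAAA"],   # no match
}
print(predict_hosts_from_spacers(spacers, phages))
```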

Active metabolism
The advent of genome sequencing revealed phages could potentially alter the metabolism of their host by the carriage of genes termed auxiliary metabolic genes (AMGs) (see [42] for a review). With increased (meta)genomic data, the number of putative AMGs involved in nitrogen and phosphorus cycles has increased [43][44][45].
The possibility that some putative phage AMGs result from misidentified phage contigs or assembly artefacts cannot be fully ruled out until they are observed in phage isolates. Greater confidence in predictions is gained when related AMGs are repeatedly found on predicted phage contigs (containing phage hallmark genes) from multiple environments. Such is the case for AMGs associated with dissimilatory sulphur metabolism, which have been found on ∼200 phage contigs from a variety of environments [46]. Comparison of phage:host transcription ratios of these key AMGs demonstrated that they are also active and expressed at high levels [46]. While transcriptomics clearly suggests these genes are important, without cultured phage-host systems it is difficult to decipher the mechanism or consequence of phages altering host sulphur metabolism.
The very nature of metagenomics, and the limited number of phages in culture, means that testing the function of most AMGs remains elusive. However, heterologous expression systems can be used to infer the function of AMGs. One such example is the expression of a phage-encoded CAZyme that cleaves the β-1,4-linked mannose units in galactomannan and glucomannan [47], suggesting that phages may contribute to the breakdown of plant-derived polymers. While the function of one phage CAZyme has been confirmed, a plethora of different CAZyme types have been identified in a range of environments, including mangroves [48] and agroecosystems used for rice cultivation [49]. Both the abundance and type of phage-encoded CAZymes differ between bulk soil and the rhizosphere, and with the type of cropping system used for cultivation of rice [49,50]. However, whether the CAZymes are all functional, and how much they contribute to the breakdown of complex carbon and the release of simple sugars into the environment as predicted, remains unknown. Without cultured phage-host systems, such contributions to the carbon cycle remain difficult to quantify.
Even when phage-host systems are available, it is not a simple task to determine the functional role of AMGs. The study of photosynthetic genes found in marine cyanophages exemplifies how crucial it is to confirm predicted function. With the discovery of psbAD in cyanophages, it was hypothesised that cyanophages carry these genes to allow a repair cycle to operate in PSII, maintaining host photosynthetic activity and accompanying oxygen evolution [51]. Studies utilising a model phage-host system found that the proposed hypothesis is only partly true. Phage versions of the proteins D1 and D2 are produced and the photosynthetic electron transport function of the host is maintained [52][53][54]. Unexpectedly, however, the fixation of CO2 was halted during cyanophage infection, decoupling the light and dark reactions of photosynthesis [54]. As a consequence of this decoupling, cyanophages are estimated to reduce CO2 fixation on a global scale by between 0.02 and 5.39 Pg C per year [54].

Lateral gene transfer
Phages play an important role in the horizontal movement of genes between bacterial taxa. Through transduction, they are thought to alter host cell function and therefore evolutionary trajectories. In addition to the traditional modes of transduction (generalised and specialised), a distinct mechanism, lateral transduction, has recently been identified in phages of Staphylococcus and Salmonella [55,56]. The key difference here is the timing of events post induction, with replication occurring before excision. Importantly, this leads to the packaging of large (hundreds of kilobases) portions of the host chromosome. Notably, rates of lateral transduction are thought to exceed those of other classical mechanisms (generalised transduction and conjugation) [57]. Thus, this mechanism has been implicated in the formation of bacterial genomic islands [56]. These distinct portions of bacterial chromosomes have disproportionate impacts on acclimation to environmental conditions [58,59] and on resistance to phages themselves [60]. Regardless of the mechanism of transduction, rates in the environment have been virtually impossible to measure.

Conclusions
Technological advancements continue to enhance our understanding of the functional ecology of phages. The iPolony approach allows quantification of the number of infected cells [15], and such approaches are required in a greater number of studies to gain a broader understanding of how phages contribute to cell lysis. Developments in sequencing technologies and methods have further increased our understanding of phage diversity, in particular of RNA phages, and of the range of AMGs that phages may encode. More laboratory model systems are required to confirm the function and understand the mechanism of these AMGs. Current laboratory models continue to produce novel insights into the molecular complexity of phage-host interactions. The next frontier is to integrate sequence data with laboratory studies to provide an improved understanding of processes important for phage ecology. To this end, we provide several outstanding questions that need to be critically addressed by phage ecologists (Figure 1). We believe answering these questions will have broad importance for understanding the role phages play in controlling Earth's biogeochemical cycles, for the spread of bacterial pathogenicity and for the therapeutic use of phages in controlling disease.

Conflict of interest statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
No data were used for the research described in the article.