Phages actively challenge niche communities in the Antarctic soils

By modulating the structure, diversity and trophic outputs of microbial communities, phages play crucial roles in many biomes. In oligotrophic polar deserts, the effects of katabatic winds, constrained nutrients and low water availability are known to limit microbial activity. Although phages may substantially govern trophic interactions in cold deserts, relatively little is known regarding the precise ecological mechanisms. Here, we provide the first evidence of widespread antiphage innate immunity in Antarctic environments using metagenomic sequence data from hypolith communities as model systems. In particular, immunity systems such as DISARM and BREX are shown to be dominant systems in these communities. Additionally, we show a direct correlation between the CRISPR-cas adaptive immunity and the metavirome of hypolith communities, suggesting the existence of dynamic host-phage interactions. In addition to providing the first exploration of immune systems in cold deserts, our results suggest that phages actively challenge niche communities in Antarctic polar deserts. We provide evidence suggesting that the regulatory role played by phages in this system is an important determinant of bacterial host interactions in this environment.

Importance
In Antarctic environments, the combination of both abiotic and biotic stressors results in simple trophic levels dominated by microbiomes. Although the past two decades have revealed substantial insights regarding the diversity and structure of microbiomes, we lack mechanistic insights regarding community interactions and how phages may affect these. By providing the first evidence of widespread antiphage innate immunity, we shed light on phage-host dynamics in Antarctic niche communities. Our analyses reveal several antiphage defense systems, including DISARM and BREX, which appear to dominate in cold desert niche communities. In contrast, our analyses revealed that genes encoding antiphage adaptive immunity were under-represented in these communities, suggesting lower infection frequencies in cold edaphic environments. We propose that by actively challenging niche communities, phages play crucial roles in the diversification of Antarctic communities.


Introduction
[...] identified (18,19). These include adaptive immunity elements, such as the CRISPR-Cas systems, and innate immunity mechanisms, such as restriction-modification (RM) and toxin-antitoxin abortive infection (Abi) systems (18). Recent pangenomics studies have also identified novel defense systems that are widely distributed across bacterial taxa and are thought to play a role in anti-phage resistance (20-23). These include the bacteriophage exclusion (BREX) system, encoded by a 4-8 gene cluster, which provides resistance to Siphoviridae and Myoviridae tailed phages by inhibiting phage DNA replication (21), and other less well characterized systems such as the Thoeris, Shedu and Gabija elements that increase bacterial host resistance to specific groups of phages (22).

Building on existing evidence of phage diversity and prevalence in polar desert soils, we hypothesize that phage-host interactions play an important role in shaping the structure of edaphic microbial communities in these environments. To test our hypothesis, we assess the known bacterial defense systems in metagenomic sequence data derived from a niche Antarctic hypolith community. We were able to link some of these data to specific phage genomes and propose that phages play an active role in shaping the immunity of Antarctic soil microbial communities.

The distribution of anti-phage defense mechanisms shows an abundance of innate immunity genes
The distribution of antiphage defense systems in the metagenome was determined by mapping defense genes against the taxonomically assigned contigs. In total, 24,941 defense genes were detected, comprising 1.2% of the entire metagenome gene count. Approximately 40% of these were found in contigs attributed to unknown phyla. The general distribution of defense genes across known phyla was consistent with the relative abundance of each phylum in the metagenome (Figure 1A, Table S4). Proteobacteria harbored the highest number of anti-phage genes (5289 genes, 1.1% of total gene count for this phylum), followed by Actinobacteria (3808, 0.9% total gene count) and Bacteroidetes (2128, 1.08% of total contig count). RM, DISARM and BREX systems were the most abundant systems in the metagenome, contributing 67.6% of the total gene hits for anti-phage defense systems. At the other end of the spectrum, the defense systems Shedu, Hachiman and CRISPR-type 2 were present at relatively low abundances, and therefore had little apparent contribution to the global defense system distribution. The average contribution of defense genes to the total gene count per phylum was 1.8%, with Deferribacteres and Candidatus Tectomicrobia as outliers. However, it is important to note that these phyla represent a very small portion of the metagenome, and therefore the possibility that the high percentage of defense genes is biased by the low gene count for these phyla cannot be disregarded.

Analysis of the relative contribution of each defense system within each phylum also showed that genes belonging to the RM, DISARM, and BREX systems were the main contributors across the majority of phyla (Figure 1B). [...]

A total of 3758 genes for the DISARM system were identified. These included the Class I marker gene drmD (449 counts, 11.9% of DISARM genes), which encodes the SNF2-like helicase (23), as well as the Class II marker gene drmA (1020, 17.1% of DISARM genes), which encodes a protein with a putative helicase domain (23). Similarly, a total of 4598 genes representing all BREX types were identified in the metagenome. Interestingly, the most abundant gene from this system found in the metagenome, pglW (2640, 57.4% of BREX genes), which codes for a serine/threonine kinase, is specific to the type 2 BREX system, also called the Pgl system (21). By comparison, of the 7908 RM genes found in the metagenome, the most abundant was hsdM (1423, 18% of RM genes), which encodes a type I DNA methylase responsible for the protection of host DNA (24). In fact, more than 50% of RM defense genes were attributed to type I RM systems.

The third non-canonical system representing more than 10% of the anti-phage defense systems in a subset of the phyla, the Zorya system, included a total of 2411 genes in the metagenome. The majority of these were homologous to the two genes that make up a proton channel, zorA and zorB. This is a common feature in all types of Zorya system and is thought to cause depolarization of the membrane upon infection (22).

Type I CRISPR-Cas genes comprise the bulk of anti-phage adaptive immunity genes
In total, 2234 CRISPR-cas genes were identified in 1601 contigs by searching for shared sequence similarities against the CDD database. A substantial proportion of all classified CRISPR-cas loci (71.4%) belonged to type I CRISPR-Cas systems, followed by type III (18.5%) and type II (10.2%) (Table S5). While the abundance of Cas I-B loci sequences in the public databases suggests that the Cas-I mechanism is the most common in both bacteria and archaea (20 and 30% of total CRISPR loci) (25), less than 3% of these loci were present in our composite metagenome (Table S5, Figure 2). Surprisingly, CRISPR-cas loci linked to Types I-C and I-E were the most prevalent, at 24.1% and 12.9% of classified CRISPR-cas loci, respectively. Another subtype identified at higher relative abundances than previously reported (25) was I-U, at 10.76% of classified cas loci. This subtype is characterized by the marker GSU0054 domain, which was the fourth most abundant cas CDD overall (108 occurrences) after cas4, cas1, and cas2.
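
The within-system shares quoted in the preceding paragraphs are simple ratios of a marker gene's count to the total gene count of its defense system. As a minimal check, the sketch below recomputes three of them from the counts given in the text. The dictionary layout and the function name are illustrative assumptions and are not part of the original analysis.

```python
# Counts taken from the text: (marker gene count, total genes in that system).
# The data structure and function name are illustrative assumptions.
COUNTS = {
    "drmD (DISARM)": (449, 3758),
    "pglW (BREX)": (2640, 4598),
    "hsdM (RM)": (1423, 7908),
}


def share_percent(gene_count: int, system_total: int) -> float:
    """Return the gene's share of its defense system, in percent (1 decimal)."""
    return round(100.0 * gene_count / system_total, 1)


# Share of each marker gene within its own system.
SHARES = {name: share_percent(n, total) for name, (n, total) in COUNTS.items()}

if __name__ == "__main__":
    for name, pct in SHARES.items():
        print(f"{name}: {pct}%")
```

The same function could be applied to per-phylum totals, provided the gene counts per phylum are available (for example, from Table S4).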

Phage presence in the niche community is correlated with the CRISPR arrays
CRISPR arrays represent the history of infection by invading DNA (e.g. phages, plasmids) (26,27), and a study of their composition and frequencies provides insights into phage-host interactions in an ecological context (28). A total of 878 CRISPR arrays harboring 10,292 spacers were identified in the metagenome, with an average length of 36 protospacers per array (Figure S1A). CRISPR array sizes ranged from 2 to 249, with the majority (83.5% of total arrays) falling between 2 and 18 protospacers per array (Figure S1B).

The distribution of CRISPR array sizes in the metagenome was compared with data collected from a ground-water microbiome (29) to contrast array size distributions from environments with potentially different phage-host dynamics (11). The results show that CRISPR arrays in the hypolith metagenome exhibited a smaller and narrower size range, compared to the ground-water community [...]

[...] (12, 13). Evidence, albeit limited, that Antarctic soil phages exist predominantly in a lysogenic rather than lytic lifestyle (14), has led to suggestions that the functional role of phages in this spatially restricted, water-constrained desert soil niche may be limited (11).
The results presented in this study provide the first evidence of interaction between phage and hosts in this psychrophilic edaphic environment. This is most evident in the correlation between the metavirome of the hypolith community and the CRISPR arrays, which suggests the active evolution of the adaptive immune system against local viral threats. This idea of community adaptation to local phage threats is further implied by the positive correlation between the CRISPR arrays and viruses extracted from local soils. In fact, a previous study (37) [...] (29). Together, these results imply a model for viral-host interactions in hypoliths that follows the 'static-step-static' development model suggested by Pointing et al. (38), driven by the stochastic and intermittent nature of rain events in such water-limited ecosystems.

A surprising result from this study is the prevalence of non-canonical innate immunity systems, the most prominent of which are the BREX and DISARM systems. While these two systems have been shown to be widespread in bacteria using a pan-genomic dataset (21, 23), the present study represents the first evidence for the prevalence of these systems in ecological samples. As such, this result implies that non-canonical innate immunity is more important for anti-phage microbial community defense than previously thought and should therefore be a focus of future studies into innate immunity in the ecological context. There are also indications from the hypolith metagenome that the prevalence of non-canonical innate immunity over traditional RM and Abi systems for defense against phages is related to the adaptation of the hypolith communities to specific local viral populations. For instance, the Zorya system, the third most prevalent non-canonical immunity system in the metagenome, is hypothesized to operate similarly to the Abi system (22). In turn, Zorya systems provide resistance against a limited range of phages, including the ssDNA family Microviridae (22), which has been shown to be prevalent in Antarctic aquatic and soil niches (39).

Conclusion
Together, these results are not consistent with the suggestion that the constraints of the environment, such as low temperatures, low water activity (a_w) and the resulting very limited capacity for inter-particle diffusion, lead to extremely localized phage-host interactions (11). Rather, the data are suggestive of a dynamic and continual interaction between host and phage. Nevertheless, inter-particle communication and exchange may be limited to brief periods when bulk liquid water is present, after snow melt, for example. Furthermore, the low metabolic rates (the inevitable consequence of Arrhenius effects (temperature dependence of reaction rates) in cold environments) should also limit the rates at which phages can replicate and propagate, further limiting the frequency of interactions with their hosts (40). We suggest that the localized nature of host-phage interactions in the hypolith niche and the limited inter-particle communication, where bacterial hosts are not frequently challenged by novel phage threats, lead to a reliance of microbial communities on innate immunity as the primary defense against phage infection. The smaller sizes of CRISPR arrays in the Antarctic soil metagenome sequences compared to those from a temperate aquatic environment, and the under-representation of CRISPR systems, lend further credence to the idea of temporally sporadic interaction between phages and their hosts. Nevertheless, the correlation between the metavirome and the CRISPR-cas arrays, together with the presence of bacteriophage evasion genes in the metavirome, suggests that phage-host interactions within the hypolith community are a dynamic process that leads to co-evolution of both phages and hosts. We therefore suggest that phages play a hitherto underestimated role in driving the evolution of [...]

Metagenome assembly and taxonomical annotation
Metagenomic DNA sequence data were quality-filtered by Trimmomatic version 0.36 using a Phred cut-off > 30 (42). The assembly of high-quality reads from the metagenome sequence dataset was conducted using the IDBA-UD tool (43) and contig lengths were extended (scaffolded) using SSPACE Basic (43). The statistics for the assembly of the metagenome are presented in Table S1. Contigs were taxonomically assigned using the MEGAN v6 pipeline (44) with the NCBI taxonomy database for taxon ID assignment.
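
To make the Phred cut-off concrete: a Phred score Q corresponds to a base-calling error probability of 10^(-Q/10), so Q = 30 means roughly one error in 1,000 bases. The sketch below decodes a FASTQ quality string and applies a simple mean-quality rule. It is an illustration only: the Phred+33 encoding and the mean-quality rule are assumptions, and the code does not reproduce Trimmomatic's own trimming steps, which are not specified in the text.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def passes_cutoff(quality: str, cutoff: int = 30) -> bool:
    """Illustrative rule only: keep a read if its mean Phred score exceeds cutoff.

    This mean-quality rule is an assumption made for illustration; it is not
    Trimmomatic's algorithm.
    """
    scores = phred_scores(quality)
    return bool(scores) and sum(scores) / len(scores) > cutoff
```

For example, under Phred+33 the character `I` encodes Q = 40 and `5` encodes Q = 20, so a read with quality string `IIII` would pass this rule while `5555` would not.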

Detection of the innate and adaptive defense systems
Metagenomic contigs were used for functional gene predictions using Prodigal v2.50 with the -meta parameter (45). Predicted genes were subsequently screened for domain similarity with known defense systems against the conserved domain database (CDD) of clusters of orthologous groups (COGs) and protein families (Pfams) using rps-blast (E value < 1e-02) (33). These results were manually filtered for the identification of phage-specific defense systems, which include restriction-modification (RM), bacteriophage exclusion (BREX), abortive infection (Abi), defense island system associated with restriction-modification (DISARM), and other recently identified systems, using a refined list of COG and Pfam position-specific score matrices (PSSMs) for marker genes in these systems (21-23, 46). A list of the marker genes used in this study can be found in Table S2. Additionally, defense genes that could not be clustered into a specific system were classified as ambiguous and were not considered for subsequent analysis (Table S3).

ORFs predicted using Prodigal v2.50 were queried against the CDD database for the presence of putative CRISPR-cas genes (47), using delta-blast at a cutoff E value of 1e-03. Multi-gene cas modules were identified as those having multiple cas annotated genes with ≤5 ORF spacings. Type and subtype classifications were assigned following the updated classification set by Makarova et al. (25).
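
The multi-gene cas module rule can be sketched as a simple grouping step. The example below is a minimal illustration, not the authors' implementation: it assumes each cas hit is given as a (contig ID, ORF index) pair, and it reads "≤5 ORF spacings" as meaning that neighbouring cas ORFs on the same contig differ in index by at most 5. Both the input format and that interpretation are assumptions.

```python
from collections import defaultdict


def cas_modules(cas_hits, max_gap=5):
    """Group cas-annotated ORFs into multi-gene modules.

    cas_hits: iterable of (contig_id, orf_index) pairs, where orf_index is the
    ordinal position of the ORF on its contig (hypothetical input format).
    Two cas ORFs join the same module when their indices differ by <= max_gap.
    Returns a list of (contig_id, [orf_index, ...]) with at least two genes each.
    """
    by_contig = defaultdict(list)
    for contig, idx in cas_hits:
        by_contig[contig].append(idx)

    modules = []
    for contig in sorted(by_contig):
        current = []
        for idx in sorted(set(by_contig[contig])):
            # Close the running module when the gap to the previous cas ORF is too large.
            if current and idx - current[-1] > max_gap:
                if len(current) >= 2:
                    modules.append((contig, current))
                current = []
            current.append(idx)
        if len(current) >= 2:
            modules.append((contig, current))
    return modules
```

Isolated cas hits are dropped because a module requires at least two genes; a stricter or looser reading of the spacing rule only changes `max_gap`.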

Phage genome identification and CRISPR spacer matching
Antarctic hypolith phage genomes were identified from the assembled metagenome using VirSorter (30) on the iVirus platform hosted by CyVerse (48), using the virome database and the microbial decontamination option. Only predictions of categories 1, 2, 4 and 5 were used (phages and prophages identified with the "pretty sure" and "quite sure" qualification). Additional environmental phage contigs were downloaded from the IMG/VR database version 2018-07-01_4 (32) and used for the network construction. Taxonomic assignment of assembled contigs was performed by using the DIAMOND blastx function with a viral database downloaded from the NCBI Viral Genomes Resource and e-value set to 1e-5. ORFs of VirSorter contigs were predicted using Prodigal v2.50 (31, 49) with the virus genomes setting and annotated using eggNOG-mapper v1 (50) with the DIAMOND option and the EggNOG v4.5.1 database (51). Annotations were visualized with the ApE v2.0.55 plasmid editor (http://jorgensen.biology.utah.edu/wayned/ape/).

The CRISPR recognition tool (CRT) v1.2 was used with the default settings to search for CRISPR arrays in the hypolith metagenome (52). The identified spacers in the arrays were matched with the VirSorter phage database and the IMG/VR database using blastn of the BLAST+ suite with the following parameters: -qcov_hsp_perc 80 -task blastn -dust no -soft_masking false (53). Spacer matches of > 90% sequence identity for the VirSorter genomes and > 95% identity for the IMG/VR genomes were exported and visualized as a network in Cytoscape (54), where the nodes are spacers (grey) or genomes (blue = IMG/VR; red = VirSorter) and the edges are blastn matches.
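
The identity filter applied to the spacer matches can be illustrated with a short parsing step. The sketch below assumes the blastn results were written in the standard 12-column tabular layout (-outfmt 6: qseqid, sseqid, pident, ...); the output format actually used is not stated in the text, and the function name and database labels are hypothetical. It returns spacer-genome pairs that could serve as the edge list of the network.

```python
import csv
import io

# Identity thresholds from the text (percent); the database labels are ours.
MIN_IDENTITY = {"VirSorter": 90.0, "IMG/VR": 95.0}


def spacer_edges(blast_tsv: str, database: str):
    """Return sorted (spacer_id, genome_id) edges from blastn tabular output.

    Assumes tab-separated columns in the default -outfmt 6 order, so that the
    first three fields are query ID, subject ID and percent identity.
    """
    threshold = MIN_IDENTITY[database]
    edges = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        spacer, genome, pident = row[0], row[1], float(row[2])
        if pident > threshold:  # strictly greater, as stated in the text
            edges.add((spacer, genome))
    return sorted(edges)
```

Using a set removes duplicate edges when a spacer matches the same genome through several high-scoring pairs. The resulting pairs could be written to a two-column file for import into a network tool.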