Genomic Characterization of Two Novel RCA Phages Reveals New Insights into the Diversity and Evolution of Marine Viruses

ABSTRACT Viruses are the most abundant living entities in marine ecosystems, playing critical roles in altering the structure and function of microbial communities and driving ocean biogeochemistry. Phages that infect Roseobacter clade-affiliated (RCA) cluster strains are an important component of marine viral communities. Here, we characterize the genome sequences of two new RCA phages, CRP-9 and CRP-13, which infect RCA strain FZCC0023. Genomic analysis reveals that CRP-9 and CRP-13 represent a novel evolutionary lineage of marine phages. They both have a DNA replication module most similar to those in Cobavirus group phages. In contrast, their morphogenesis and packaging modules are distinct from those in cobaviruses but homologous to those in HMO-2011-type phages. The genomic architecture of CRP-9 and CRP-13 suggests a genomic recombination event between distinct phage groups. Metagenomic data sets were examined for metagenome-assembled viral genomes (MAVGs) with similar recombinant genome architectures. Fifteen CRP-9-type MAVGs were identified from marine viromes. Additionally, 158 MAVGs were identified containing HMO-2011-type morphogenesis and packaging modules with other types of DNA replication genes, providing more evidence that recombination between different phage groups is a major driver of phage evolution. Altogether, this study significantly expands the understanding of diversity and evolution of marine roseophages. Meanwhile, the analysis of these novel RCA phages and MAVGs highlights the critical role of recombination in shaping phage diversity. These phage sequences are valuable resources for inferring the evolutionary connection of distinct phage groups.
IMPORTANCE Diversity and evolution of phages that infect the relatively slow-growing but dominant Roseobacter lineages are largely unknown. In this study, RCA phages CRP-9 and CRP-13 have been isolated on a Roseobacter RCA strain and shown to have a unique genomic architecture, which appears to be the result of a recombination event. CRP-9 and CRP-13 have a DNA replication module most similar to those in Cobavirus group phages and morphogenesis and packaging modules most similar to those in HMO-2011-type phages. HMO-2011-type morphogenesis and packaging modules are found in combination with distinct types of DNA replication genes, suggesting compatibility with various DNA replication modules. Altogether, this study contributes toward a better understanding of marine viral diversity and evolution.

they have successfully obtained a large amount of genetic information of double-stranded DNA (dsDNA) metagenome-assembled viral genomes (MAVGs) (5)(6)(7)(8). Despite the enormous number of available viral genomes retrieved from metagenomic studies, the majority of MAVGs have limited similarity to phage isolates, and most open reading frames (ORFs) remained unannotated. In addition, the evolutionary history and functional capacity of marine viruses are still poorly understood.
Marine viral communities are mostly composed of phages that infect bacteria (1)(2)(3). Phages are known to have pervasively mosaic genomes, resulting from complex evolutionary processes during a long period of time (9)(10)(11). Current knowledge of the genetic diversity of phages suggests that genomic evolution of bacteriophages is mainly driven by extensive lateral gene transfer via genetic recombination and fast mutation (12)(13)(14). Pairwise genome comparison of over 2,000 phage genomes suggested that phages evolved with low or high gene flux modes, depending on the host, lifestyle, and genomes of phages (15). The most influential theory regarding lateral gene transfer is the theory of modular evolution (16), which proposes that evolution proceeds mainly at the level of functionally interchangeable units (modules). Phage evolution should thus be considered acting on functional modules. In the case of marine bacteriophages, successful (survivable) interchange of genomic modules must entail the continued integrity of the phage function, positive function execution, and functional compatibility (16,17). The sequences of phage genomes can provide useful information about how the phages have evolved. Therefore, the increasing number of MAVGs is important to understand phage evolution. In recent decades, phages infecting major marine bacteria groups have become a hot spot in marine virology due to the ecological and functional importance of their hosts. Among major bacteria groups, the Roseobacter group dominates coastal waters and the polar oceans (18)(19)(20). The Roseobacter group has diverse pathways for the metabolism of a variety of organic compounds (18)(19)(20)(21)(22). Among diverse Roseobacter lineages, RCA, CHAB-I-5, SAG-O19, and NAC11-7 dominate, but they are difficult to isolate and cultivate in the lab (23). 
As the largest lineage of the Roseobacter group, the RCA cluster comprises 35% of all planktonic bacteria in the southern ocean and up to 20% to 40% of all roseobacters in temperate and polar oceans (24)(25)(26). To date, more than 40 roseophages have been reported (27)(28)(29)(30). However, due to difficulties in host cultivation, researchers have only begun to explore the diversity and ecology of phages infecting dominant Roseobacter lineages. In 2019, Zhang et al. first reported phages infecting RCA strains (30). These RCA phages are genetically and evolutionarily diverse, belonging to four distinct phage groups (30). Furthermore, phage groups represented by RCA phages have been shown to exhibit wide distribution patterns and high relative abundance compared to other known roseophage groups (30).
Here, we report two novel phages, CRP-9 and CRP-13, that infect Roseobacter RCA strain FZCC0023. Interestingly, the genomes of CRP-9 and CRP-13 seem to contain genomic modules of distinct origins. Genes associated with viral replication are most similar to those in the Cobavirus genomes. Conversely, the morphogenesis and packaging-related genes in CRP-9 and CRP-13 are most similar to those in HMO-2011-type phages. Moreover, many similar recombinant MAVGs were identified from marine viromic databases. Our results not only expand the understanding of the roseophages but also have broader implications for understanding the evolution of marine phages.

RESULTS AND DISCUSSION
Morphology of CRP-9 and CRP-13. Negative staining electron microscopy reveals that both CRP-9 and CRP-13 have short tails and isometric heads of similar size in diameter (Fig. 1A), about 70 nm, which are larger than that of the cobaviruses (28,31) but similar to that of HMO-2011-type RCA phage CRP-1 (30).
General genomic characteristics of CRP-9 and CRP-13. The genomic size of CRP-9 is 56,157 bp with 73 predicted ORFs and a tRNA-Arg (TCT) gene (Table 1). CRP-13 is similar to CRP-9 with a genome size of 55,015 bp, encoding 74 ORFs and a tRNA-Arg (TCT) gene. The G+C content of CRP-9 and CRP-13 is 41.6% and 44.2%, respectively, lower than in their host FZCC0023 (53.8%). Overall, CRP-9 and CRP-13 are closely related, sharing approximately half their genes (39 ORFs with 35% to 85% amino acid identity) and having a highly similar genome arrangement (Fig. 1B). Fifty ORFs in CRP-9 and 49 ORFs in CRP-13 have recognizable homologs in NCBI RefSeq; only 19 ORFs in CRP-9 and 17 ORFs in CRP-13 can be assigned to putative biological functions based on sequence homology. The ORFs with known function in CRP-9 and CRP-13 are mostly involved in the essential functions of the phage life cycle, including DNA replication and metabolism, phage morphogenesis, DNA packaging, and cell lysis (Fig. 1B).
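As an illustration of how a G+C percentage such as those reported above is computed from a genome sequence, the following is a minimal sketch. The function name `gc_content` is hypothetical, and excluding ambiguous bases (for example N) from the denominator is an assumption of this sketch, not a statement of how the values in this study were calculated.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence.

    Assumption of this sketch: only unambiguous bases (A, C, G, T)
    count toward the denominator; anything else is ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    # Toy sequence, not taken from either phage genome.
    print(round(gc_content("ATGCGCNNAT"), 1))
```

Applied to a full genome sequence read from a FASTA file, the same function would return a single percentage comparable to the values in the paragraph above.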
Modules present in CRP-9 and CRP-13 genomes. The CRP-9 and CRP-13 genomes each consist of a DNA replication module, a DNA metabolism module, a morphogenesis module, and a DNA packaging module (Fig. 1B). In the DNA replication and metabolism modules, ORFs encoding thymidylate synthase, DNA primase, DNA polymerase, DNA methylase, endonuclease, and ribonucleotide reductase (RNR) were identified. Most genes in this region are similar to their counterparts in the Cobavirus genomes and are arranged in the same order as in cobaviruses (Fig. 1B), suggesting that CRP-9 and CRP-13 share a similar DNA replication strategy with cobaviruses. Among these genes, two DNA replication genes in CRP-9 and CRP-13, the DNA primase and DNA polymerase genes, are most similar to those in cobaviruses (28.0% to 60.2% amino acid identity). DNA polymerase phylogeny reveals that CRP-9 and CRP-13 cluster with cobaviruses, on the same branch as CRP-5, CRP-4, and ICBM2 (Fig. 2). Although the DNA replication modules in CRP-9 and CRP-13 are conserved with those of cobaviruses, the morphogenesis and packaging modules in CRP-9 and CRP-13 display no significant similarity to cobaviruses except for a few homologous ORFs. CRP-9 and CRP-13 genes in the morphogenesis module are most similar to those in HMO-2011-type phage genomes (35% to 99% amino acid identity) (Fig. 1B). The essential proteins encoded by this module include tail structure protein, tail fiber protein, major capsid protein, and portal protein. The terminase large and small subunits (TerL and TerS) in their DNA packaging modules are also most similar to those of HMO-2011-type phages (39% to 93% amino acid identity) (Fig. 1B). We also observed that some DNA metabolism genes in CRP-9 and CRP-13 share identity with those in HMO-2011-type phage genomes.
CRP-9 and CRP-13 can be classified roughly at the genus level based on the phage genus definition criterion (namely, sharing >40% of genes) (32). Here, we refer to this group as the CRP-9-type phage group, characterized as harboring Cobavirus-type DNA replication genes and HMO-2011-type morphogenesis and packaging modules.
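The shared-gene criterion can be checked against the counts reported in this article (39 shared ORFs; 73 ORFs in CRP-9 and 74 in CRP-13). The sketch below is illustrative only: the function name is hypothetical, and computing the fraction relative to each genome separately is an assumption, since the exact formula used in reference 32 is not restated here.

```python
def shared_gene_fractions(n_shared: int, n_genes_a: int, n_genes_b: int):
    """Return the shared-gene fraction relative to each of two genomes.

    Assumption of this sketch: the fraction is n_shared divided by the
    total ORF count of each genome, reported separately for A and B.
    """
    return n_shared / n_genes_a, n_shared / n_genes_b


GENUS_THRESHOLD = 0.40  # ">40% of genes", as stated in the text

# ORF counts reported in this article for CRP-9 and CRP-13.
frac_crp9, frac_crp13 = shared_gene_fractions(39, 73, 74)
same_genus = frac_crp9 > GENUS_THRESHOLD and frac_crp13 > GENUS_THRESHOLD

if __name__ == "__main__":
    print(f"CRP-9: {frac_crp9:.1%}, CRP-13: {frac_crp13:.1%}, "
          f"above threshold: {same_genus}")
```

Under this assumption both fractions are a little over one half, in line with the statement that the two phages share approximately half their genes.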
The HMO-2011-type and Cobavirus groups are two dominant phage groups in the ocean, exhibiting broad distribution patterns and high relative abundance (28,30,33). To date, all cobaviruses were isolated from marine roseobacters (28,30,31,34). The HMO-2011-type group currently has four members; one infects the SAR116 bacterium IMCC1322, and the other three infect RCA roseobacters (30,33). The genomes of CRP-9 and CRP-13 seem to be a combination of HMO-2011-type and Cobavirus genomes. This novel module combination in the CRP-9 and CRP-13 genomes is of great interest. As the modular theory proposes, functional modules can be shuffled between phage genomes by recombination, resulting in new combinations of modules and thus creating potentially novel and viable phages (16). Phage genomic modules can be exchanged with other phages by recombination at specific locations between the modules. It has been suggested that groups of exchanged genes must function together (16). It is currently difficult to track the evolutionary history of this phage type or to identify the boundaries where recombination has occurred, because these phages share limited similarity with other types of phages at the nucleotide level. Lateral gene transfer between phages mainly occurs during coinfection, that is, when more than one phage infects the same host. Coinfection has been identified in some marine bacterial genomic data (35,36). Such recombination may have occurred between different phages that coexisted within the same host or between a resident prophage and an infecting phage. It is noteworthy that all HMO-2011-type phage isolates encode an integrase gene, implying the ability to establish a lysogenic life cycle (30,33).
AMGs identified in CRP-9 and CRP-13. Auxiliary metabolic genes (AMGs) are a class of phage-encoded metabolic genes speculated to have functions similar to those of their respective host metabolic genes, regulating host metabolism and improving phage adaptability (37,38). AMGs are usually classified by function into class I and class II. AMGs encoding class I proteins are present in the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways, while AMGs encoding class II proteins are not present in KEGG metabolic pathways (38). AMGs are frequently identified in marine roseophage genomes (39). In roseophage genomes, the most prevalent AMGs are those involved in metabolism and nucleotide synthesis (39,40). This study identified several AMGs in CRP-9 and CRP-13, including the phosphate starvation-inducible protein gene (phoH), the nucleoside triphosphate pyrophosphohydrolase gene (mazG), and the glutaredoxin gene. The phoH genes detected in numerous marine phages have previously been described with diverse function, which includes participating in phospholipid metabolism, RNA modification, and fatty acid beta-oxidation (41,42). The identification of phoH from CRP-9 may suggest that CRP-9 is more successful during infection when the host is undergoing low-phosphate conditions.
The mazG gene identified in CRP-9 is a highly conserved gene prevalent in bacterial genomes and has also been identified from many marine phage genomes (30,33,39,43), including all known RCA phages (30). It has been suggested that MazG protein is important for maintaining the starved host metabolism and therefore benefits the propagation of infecting phages (44,45). In Escherichia coli, MazG has been reported to interfere with the function of the MazEF toxin-antitoxin system by regulating the cellular level of (p)ppGpp (46). However, a recent study showed that a cyanophage MazG has no binding or hydrolysis activity against alarmone (p)ppGpp but has high hydrolytic activity toward dGTP and dCTP, and it was speculated to play a role in hydrolyzing the high-G+C host genome for phage replication (47). The G+C content of CRP-9 is significantly lower than that of its host. It is possible that MazG protein in CRP-9 has similar hydrolytic activity to facilitate phage genome replication.
FIG 2 Unrooted maximum-likelihood phylogenetic tree of phage DNA polymerases constructed with conserved polymerase domains. Black and white circles indicate nodes with 90% to 100% and 70% to 89% bootstrap support, respectively. Roseophages, cyanophages, and pelagiphages are shown in red, green, and blue, respectively. Roseophages isolated in this study are marked with red asterisks. Scale bar represents amino acid substitutions per site.
Prevalence of HMO-2011-type morphogenesis and packaging modules in MAVGs. We next sought to investigate the prevalence of the HMO-2011-type morphogenesis and packaging modules in marine MAVGs. A total of 318 MAVGs were retrieved from the Global Ocean Virome 2.0 (GOV 2.0). Genes encoding major capsid, tail protein, portal protein, and terminase in these MAVGs all share high amino acid identity with HMO-2011-type phages, organized in the same order, which suggests close evolutionary relationships. In addition, there are very few insertions and deletions within the HMO-2011-type morphogenesis and packaging modules. Among these 318 MAVGs, 145 contain an HMO-2011-type DNA replication module, thus belonging to the HMO-2011-type phage group. The remaining 173 MAVGs contain DNA replication genes with distinct evolutionary origins (Data Set S1). These results confirm that the HMO-2011-type morphogenesis and packaging modules are present in various phage groups in combination with distinct DNA replication genes. We used the DNA polymerase, capsid, and TerL sequences of these 173 MAVGs for phylogenetic analyses (Fig. 3A, B and C). The capsid and TerL phylogenies also show that all 173 MAVGs cluster with HMO-2011-type phages ( Fig. 3B and C).
A conserved DNA polymerase domain (PF00476)-based phylogenetic analysis separated the 173 MAVGs into four major groups (I to IV), distinct from the HMO-2011-type group (Fig. 3A). Among these 173 MAVGs, 15 are associated with CRP-9 and CRP-13 in the DNA polymerase tree (Fig. 3A, group I). Genomic comparison shows that all these 15 MAVGs are largely syntenic to CRP-9 and CRP-13 throughout the whole-genome region (Fig. 4A). Therefore, all 15 MAVGs belong to the CRP-9-type phage group.
Twelve MAVGs cluster with RCA phage CRP-6 in the DNA polymerase tree (Fig. 3A, group II). CRP-6 represents a novel phage type, sharing limited homology with other known phage isolates (30). Genomic comparison shows that these 12 MAVGs and CRP-6 have a very similar DNA replication module but are distinct in morphogenesis and packaging modules (Fig. 4B). All 12 MAVGs contain a set of DNA replication genes most similar to those of CRP-6, including DNA primase, DNA polymerase, and a few other associated genes of unknown function (Fig. 4B). We noticed that the majority, 10 of these 12 MAVGs, contain an integrase gene homologous to that in HMO-2011-type phages (36.5% to 42.7% amino acid identity). In contrast, an integrase gene was not identified in other groups. In HMO-2011-type phages, the integrase gene is located upstream of DNA replication genes (30,33). In other MAVGs, the integrase genes may have been lost during recombination, or their ancestor did not contain the integrase gene.
We found that the remaining 146 MAVGs are more closely related to HTVC103P-type pelagiphages in the DNA polymerase tree and can be further divided into two groups (Fig. 3C, group III and IV). As in CRP-9 and CRP-13, a remarkable feature of the HTVC103P-type group is that these pelagiphages all harbor a set of morphogenesis and packaging genes similar to their counterparts in HMO-2011-type genomes (33). Genomic comparison reveals remarkable synteny among group III MAVGs and HTVC103P-type pelagiphages (Fig. 4C). They share 33.3% to 97.7% of their ORFs with HTVC103P-type pelagiphages and have conserved DNA replication, morphogenesis, and packaging modules with HTVC103P-type pelagiphages. Thus, they all fall within the HTVC103P-type phage group and may infect SAR11 bacteria. In addition, the G+C content of group III MAVGs ranged from 30.8% to 34.0%, similar to the G+C content of all known pelagiphages. There is an obvious but more distant relationship between group IV MAVGs and HTVC103P-type pelagiphages (Fig. 3C; Fig. 4D). Compared with group III genomes, they share a relatively lower percentage of their ORFs (20.9 to 65.9%) with HTVC103P-type pelagiphages and have a wider range of G+C content (31.9 to 44.4%). Due to the lack of cultured representatives in this group, the hosts of group IV remain to be further explored.
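The MAVG counts given in this section can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in this article; the variable names and group labels are illustrative, and groups III and IV are combined because only their joint total (146) is reported.

```python
# Counts stated in the text.
mavgs_with_polymerase_domain = 318
hmo2011_type_replication = 145

# MAVGs whose DNA replication genes are distinct from HMO-2011-type phages.
distinct_replication = mavgs_with_polymerase_domain - hmo2011_type_replication

# Group sizes from the DNA polymerase phylogeny (labels are illustrative).
group_sizes = {
    "I (CRP-9-type)": 15,
    "II (CRP-6-like)": 12,
    "III + IV (HTVC103P-related)": 146,
}

# MAVGs with HMO-2011-type structural modules but non-CRP-9-type replication genes.
non_crp9_type = distinct_replication - group_sizes["I (CRP-9-type)"]

assert distinct_replication == 173
assert sum(group_sizes.values()) == distinct_replication
assert non_crp9_type == 158  # the figure quoted in the abstract

if __name__ == "__main__":
    print(distinct_replication, non_crp9_type)
```

The totals are mutually consistent: 318 = 145 + 173, 173 = 15 + 12 + 146, and the 158 MAVGs mentioned in the abstract correspond to groups II to IV.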
The analysis of these MAVGs reveals that similar HMO-2011-type morphogenesis and packaging modules can exist in combination with various DNA replication modules. In each case, we found that the DNA polymerase gene is located close to the helicase gene, and each DNA polymerase-helicase combination was exclusive to each group. In phage genomes, DNA polymerase interacts with DNA helicase in the DNA replication process (48). Clearly, a particular DNA polymerase-helicase combination is vital for a functional DNA replication module. Due to the intimate interactions of DNA replication genes, some phages that undergo DNA replication module replacement survive to infect a host.
The novel module combination of CRP-9, CRP-13, and these MAVGs supports the modular nature of phage genomes. Similar observations were previously reported for several phage genomes (10,12,49-54). Collectively, these results indicate that genetic recombination plays a major role in the diversification of phage genomes, producing novel combinations of genomic modules. Many phage types have arisen from recombination between two phage ancestors. Phage genomes generally have a higher frequency of recombination than of mutation (55). The recombination rate is also highly variable across genomic regions. The horizontal exchange of sequences among phage genomes is thought to be rampant and randomly distributed over the genomes (9). Only a minor fraction of sequence exchanges, those that do not interrupt module function, can produce functional and competitive recombinants. Based on our observation, it can be speculated that DNA replication modules were transferred between phage genomes, which resulted in new combinations of functional modules and production of new phage types. The identification of these MAVGs suggests that HMO-2011-type morphogenesis and packaging modules are functionally compatible with various DNA replication modules. Existing phages with these types of combinations have survived natural selection. We also searched for MAVGs containing the HMO-2011-type DNA replication module and the Cobavirus-like morphogenesis and packaging modules. A total of 8 corresponding MAVGs were retrieved from the GOV 2.0 database (Fig. 5). These 8 MAVGs all contain an HMO-2011-type DNA replication module with DNA polymerases having a conserved DnaJ domain (box in Fig. 5) (33). The discovery of this MAVG genotype further supports the idea that recombination plays a critical role in generating novel phage genotypes.
Due to the prominent role of recombination in the diversification of bacteriophage genomes, the assessment of phylogenetic relationships among different phage groups is challenging. Many phage genome sections display different evolutionary histories. In these cases, classification of these phages is challenging, phylogenetic analysis based on a single hallmark gene can be problematic, and network analysis based on gene content may require caution. In some studies, whole-genome nucleotide comparisons have shown that some "hybrid" phage gene modules have a high level of nucleotide identity with other types of phage gene modules, suggesting that the exchange events occurred relatively recently (10,56). However, in this study, it was difficult to reconstruct the modular recombination history of these phages; the precise boundaries of these "modules" have remained elusive using current genomic data because CRP-9-type genomes lack nucleotide-level similarity with other types of phages.
Conclusion. In this study, a novel type of marine RCA phage has been identified, providing more insights into the diversity and evolution of marine RCA phages. CRP-9 and CRP-13 both possess a Cobavirus-type DNA replication module, suggesting that they use the same DNA replication strategy as cobaviruses. However, their morphogenesis and packaging modules share common ancestry with HMO-2011-type phages. This is the first case of Cobavirus-type DNA replication genes being found outside the Cobavirus group. These two RCA phages and recombinant forms of MAVGs confirm that the recombination of modules among phages is an important evolutionary process that shapes the structure of marine phages. The global marine phage population may harbor a much higher diversity of genomic architectures than has been previously recognized.

MATERIALS AND METHODS
Host strain and growth condition. Roseobacter RCA strain FZCC0023 was isolated from coastal water of Pingtan Island, China, in 2017 by the dilution-to-extinction method (30). The detailed phylogenetic information on FZCC0023 has been described in an earlier study (30). FZCC0023 was grown at 23°C in modified natural seawater-based medium amended with 1 mM FeCl3, excess vitamins, 1 mM NH4Cl, 100 mM KH2PO4, and mixed carbon source (57).
Source waters and isolation of CRP-9 and CRP-13. A seawater sample was collected from the coast of Pattaya Beach, Thailand (latitude, 12°56′N; longitude, 100°53′E) in March 2018. The other seawater sample was collected from the German Bight, North Sea (latitude, 53°56′N; longitude, 7°48′E) in March 2019, by the RV Heincke during the cruise HE526 (58) (Table 1). Water samples were filtered through a 0.1-µm syringe filter (Pall Gelman Laboratory, USA) and stored at 4°C until use. The phages were isolated according to the procedure described in previous reports (30). Briefly, 0.1-µm-filtered seawater was added to exponentially growing FZCC0023 cultures. The cultures were incubated at 23°C until cell lysis was detected by using a Guava easyCyte flow cytometer (Merck Millipore, Billerica, MA). The presence of phage particles was also confirmed by epifluorescence microscopy. To obtain the pure phage clone, the extinction dilution procedure (59) was repeated three times. The purity of the RCA phages was examined by genome sequencing.
Transmission electron microscopy. CRP-9 and CRP-13 lysates were filtered through 0.1-µm-pore-size filters and then concentrated by Amicon Ultra centrifugal filters (30 kDa; Merck Millipore). Next, CRP-9 and CRP-13 were further concentrated by an ultracentrifuge (Beckman Coulter, USA) at 50,000 × g for 2 h. The concentrated phage samples were adsorbed onto a copper grid in the dark and stained with 2% uranyl acetate for 2 min. A Hitachi transmission electron microscope was used to observe the samples at 80 kV.
Phage DNA extraction. The preparation and concentration of phage lysates were carried out as described by Zhang et al. (30). Briefly, 250 ml of host culture was infected with each phage at a phage-to-host ratio of approximately 3:1. After host lysis, the phage lysates were filtered through 0.1-µm-pore-size filters and then further concentrated by Amicon Ultra centrifugal filters (30 kDa; Merck Millipore). Phage DNA was extracted using formamide treatment followed by phenol-chloroform extraction (60). Subsequently, the phage genomes were sequenced with paired-end sequencing by Mega Genomics (Beijing, China) using the HiSeq 2500 sequencing system (Illumina). The Illumina raw reads obtained were quality controlled, adaptor trimmed, and assembled using CLC Genomic Workbench 11.0.1 software (Qiagen, Hilden, Germany) with default settings.
Retrieval of MAVGs. A total of 515,588 MAVGs from the following metagenomic data sets were downloaded for analysis: (i) the Med-DCM fosmid library (5), (ii) the Global Ocean Viromes (GOV) (7), and (iii) the Global Ocean Virome 2.0 (GOV 2.0) (8). The following search strategy was used to retrieve the MAVGs. First, HMO-2011-type capsid and TerL sequences (including sequences from HMO-2011-type phages, CRP-9, CRP-13, and HTVC103P-type pelagiphages) were searched against genomes of MAVGs using TBLASTN (E value < 1E-3, coverage ≥ 80%), resulting in 1,242 MAVGs that contain both HMO-2011-type capsid and TerL genes. After this initial search, a search was conducted to identify DNA polymerase genes in the MAVGs. Among the 1,242 MAVGs, 318 MAVGs had a DNA polymerase domain (PF00476) and were retained for further analysis. Next, 145 MAVGs containing an HMO-2011-type DNA polymerase were excluded from further analysis. The remaining 173 MAVGs all contain the HMO-2011-type morphogenesis and packaging modules, but their DNA replication modules are distinct from those of HMO-2011-type phages (Data Set S1). All 173 MAVGs were recovered from GOV 2.0, and no MAVGs were obtained from the Med-DCM fosmids or GOV data sets. The above processes were applied to retrieve MAVGs containing HMO-2011-type morphogenesis and packaging modules plus other types of DNA replication modules.
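The first filtering step of this search strategy (keeping hits with E value < 1E-3 and coverage ≥ 80%, then retaining MAVGs hit by both capsid and TerL queries) could be sketched as follows. This is a minimal illustration, not the authors' script: the record layout (query ID, subject ID, E value, coverage percentage), the query naming scheme "marker|phage", the MAVG identifiers, and all function names are assumptions made for the example, and how coverage was defined in the original analysis is not stated in the text.

```python
from collections import defaultdict

E_VALUE_MAX = 1e-3    # threshold from the text: E value < 1E-3
COVERAGE_MIN = 80.0   # threshold from the text: coverage >= 80%


def passing_subjects(rows):
    """Group subject (MAVG) IDs by marker gene for hits that pass both cutoffs.

    rows: iterable of (query_id, subject_id, evalue, coverage_percent).
    Hypothetical convention: query IDs look like "capsid|CRP-9" or "TerL|CRP-13".
    """
    hits = defaultdict(set)
    for query_id, subject_id, evalue, coverage in rows:
        if evalue < E_VALUE_MAX and coverage >= COVERAGE_MIN:
            marker = query_id.split("|")[0]
            hits[marker].add(subject_id)
    return hits


def with_both_markers(rows):
    """Return MAVG IDs that have passing hits for both capsid and TerL."""
    hits = passing_subjects(rows)
    return hits["capsid"] & hits["TerL"]


# Toy records for illustration only; IDs and values are invented.
example_rows = [
    ("capsid|CRP-9", "mavg_1", 1e-20, 95.0),
    ("TerL|CRP-9", "mavg_1", 1e-30, 90.0),
    ("capsid|CRP-9", "mavg_2", 1e-10, 85.0),
    ("TerL|CRP-9", "mavg_2", 1e-2, 99.0),    # fails the E-value cutoff
    ("TerL|CRP-13", "mavg_3", 1e-8, 60.0),   # fails the coverage cutoff
]

if __name__ == "__main__":
    print(sorted(with_both_markers(example_rows)))
```

In this toy input only one MAVG passes both cutoffs for both marker genes. The subsequent steps described above (screening for the PF00476 domain and excluding HMO-2011-type DNA polymerases) would be applied to the resulting set.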
Data availability. The genome sequences of CRP-9 and CRP-13 have been deposited in the GenBank database under accession numbers MW514246 and MW514247.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, XLSX file, 0.02 MB.