Comparative analysis of prophages in Streptococcus mutans genomes

Prophages are considered genetic units that are intimately associated with novel phenotypic properties of their bacterial hosts, such as pathogenicity and genomic variation. Little is known about the prophages in the genome of Streptococcus mutans, a major pathogen of human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of prophage sequences revealed that the prophages could be classified into three main clusters (Cluster A, Cluster B, and Cluster C), and the prophages within each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared sequence similarity with the previously reported S. mutans phages M102, M102AD, and φAPCM01. The genomes were organized into seven major functional modules according to the putative functions of the predicted open reading frames (ORFs): packaging and structural, integrase, host lysis, DNA replication/recombination, transcriptional regulatory, other protein, and hypothetical protein modules. Moreover, an integrase gene was identified only in the phismuNLML9-1 prophage.


INTRODUCTION
A prophage is a temperate bacteriophage genome integrated into the host bacterial chromosome, where it is maintained in a lysogenic state and replicated vertically with the host (St-Pierre & Endy, 2008). Prophages are an important source of virulence factors and other determinants of bacterial pathogenesis. Whole-genome sequencing projects and comparative genomic analyses have revealed that prophage sequences are widespread in bacterial genomes, including those of Moraxella catarrhalis (Ariff et al., 2015), Enterococcus spp. (Duerkop, Palmer & Horsburgh, 2014), Lactococcus spp. (Ventura et al., 2007), Mycobacterium spp. (Fan et al., 2014), and Streptococcus suis (Tang et al., 2013). Yet very little is known about Streptococcus mutans prophages.
Dental caries is the most prevalent dental disease and an important public health problem worldwide (Kidd & Fejerskov, 2013). Carious lesions develop through a dynamic process mediated by acid produced by cariogenic bacteria, such as Streptococcus mutans, Streptococcus sobrinus, and Lactobacillus spp., which eventually results in demineralization and destruction of the tooth structure (Argimon et al., 2014; Ericson et al., 2003). S. mutans is a Gram-positive, biofilm-forming bacterium that adheres to the tooth surface and contributes to dental plaque, and it is the major pathogen responsible for dental caries in humans (Freires et al., 2017; Motegi et al., 2006). To the best of our knowledge, no reports have described S. mutans prophages, and only five S. mutans phages have been isolated. Three of them, M102, M102AD, and φAPCM01, have been sequenced (Dalmasso et al., 2015; Delisle et al., 2012; Van der Ploeg, 2007). The other two, f1 and e10, were isolated and characterized for host range and morphology but have not been sequenced (Delisle & Rostkowski, 1993). At the time of this study, 171 S. mutans genome sequences were available in the National Center for Biotechnology Information (NCBI) database. These sequences make it possible to identify prophages and to compare prophage sequences and genome organization.
In this study, we screened all available complete S. mutans genome sequences and identified 35 prophage-like elements in them. We also describe the functional features of the intact prophages in comparison with the sequenced S. mutans phage M102AD. Comparative genomic and genome content analyses of S. mutans prophages were performed, and their genetic information was analyzed.

Data collection and prophage sequence analyses
In total, 171 S. mutans genomes were obtained from NCBI. Several tools have been published for prophage identification, including PhiSpy (Akhter, Aziz & Edwards, 2012) and VirSorter (Roux et al., 2015), which are described as fast and relatively straightforward to use. We detected putative prophage sequences using the previously reported PHAge Search Tool Enhanced Release (PHASTER) web server (http://phaster.ca/), which identifies and annotates putative prophage sequences in bacterial genomes (Arndt et al., 2016; Fan et al., 2016).
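PHASTER labels each predicted region as intact, questionable, or incomplete from a completeness score. The minimal Python sketch below assumes the cut-offs described for PHAST/PHASTER (score above 90: intact; 70 to 90: questionable; below 70: incomplete). The `Region` record, the `classify_region` helper, and the example coordinates are hypothetical illustrations, not PHASTER output or part of its interface.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical record for one predicted prophage region."""
    name: str
    start: int   # 1-based start coordinate in the host genome
    end: int     # 1-based inclusive end coordinate
    score: int   # completeness score as reported by the prediction tool

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def classify_region(score: int) -> str:
    """Map a completeness score to a category (assumed PHAST/PHASTER cut-offs)."""
    if score > 90:
        return "intact"
    if score >= 70:
        return "questionable"
    return "incomplete"


if __name__ == "__main__":
    # Hypothetical regions for illustration only; not values from Table 1.
    regions = [
        Region("region_1", 100_001, 140_500, 120),
        Region("region_2", 500_001, 504_700, 40),
        Region("region_3", 900_001, 925_000, 80),
    ]
    for r in regions:
        print(r.name, r.length, classify_region(r.score))
```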

Phylogenetic analysis
Alignments of S. mutans phage and prophage genome sequences were performed using MEGA version 7.0 (Tamura et al., 2007). Pairwise distances were computed, and a phylogenetic tree was reconstructed with the neighbor-joining (NJ) method and visualized, all using the same software.
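MEGA performs the neighbor-joining computation internally. To illustrate the method itself, the following Python sketch implements the standard NJ agglomeration on a symmetric distance matrix and returns an unrooted tree in Newick format. It is not MEGA's implementation; the matrix in the usage example is a textbook five-taxon matrix rather than prophage data, and the choice of pairwise distance (for example, p-distance or a substitution model) is left open.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Return an unrooted neighbor-joining tree as a Newick string.

    labels: taxon names; dist: symmetric matrix (list of lists) with zero diagonal.
    """
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(labels)                          # Newick text of each current cluster
    d = [[float(x) for x in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]               # total distance from each cluster
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(range(n), 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))   # branch from i to new node
        lj = d[i][j] - li                                    # branch from j to new node
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to each remaining cluster k.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three clusters remain: connect them through one central node.
    a, b, c = nodes
    dab, dac, dbc = d[0][1], d[0][2], d[1][2]
    la = (dab + dac - dbc) / 2
    lb = (dab + dbc - dac) / 2
    lc = (dac + dbc - dab) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    D = [[0, 5, 9, 9, 8],
         [5, 0, 10, 10, 9],
         [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3],
         [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, D))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Ties in Q are broken by taking the first pair in index order, so different tools can return different but equally valid topologies when ties occur.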

Prophages are prevalent in S. mutans genomes
Data for all 171 available whole-genome S. mutans sequences were downloaded from the NCBI website and analyzed (Table S1). The PHASTER web server was used to identify and annotate putative prophage regions in all S. mutans genomes. Thirty-five prophage-like elements were identified in 24 S. mutans genomes (14.04%) (Table 1). The S. mutans prophages ranged in size from approximately 4.7 to 68.2 kb, and their GC content ranged from 35.62% to 44.56%. Only three prophages (phismuNLML9-1, phismuN66-1, and phismu24-1) appeared to represent complete phages with intact genomes; the remaining prophages were incomplete or questionable. The genomes of S. mutans NG8, R221, M230, N29, NLML9, N66, and 24 were polylysogenic, that is, each carried more than one prophage-like element. Because many S. mutans genomes carry prophages, and clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) proteins act as a prokaryotic immune system that confers resistance to foreign genetic elements such as phages (Barrangou et al., 2007), we hypothesize that CRISPR-Cas systems may be present in S. mutans genomes.
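The prevalence, polylysogeny, and GC-content figures above are simple counts. The Python sketch below shows one way to compute them from a list of (genome, prophage) hits and a nucleotide sequence. The function names and toy hits are hypothetical, and computing GC content over unambiguous A/C/G/T bases only is an assumption about how ambiguous bases are handled.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def prevalence_percent(carriers: int, total: int) -> float:
    """Share of genomes carrying at least one prophage, rounded to two decimals."""
    return round(100.0 * carriers / total, 2)


def lysogeny_summary(hits, total_genomes: int):
    """hits: (genome_id, prophage_id) pairs.

    Returns (number of lysogenic genomes, prevalence %, sorted polylysogenic genomes).
    """
    per_genome = Counter(genome for genome, _ in hits)
    polylysogens = sorted(g for g, count in per_genome.items() if count >= 2)
    return len(per_genome), prevalence_percent(len(per_genome), total_genomes), polylysogens


if __name__ == "__main__":
    # Toy hits for illustration only (not Table 1).
    toy_hits = [("G1", "p1"), ("G1", "p2"), ("G2", "p1")]
    print(lysogeny_summary(toy_hits, total_genomes=10))  # (2, 20.0, ['G1'])
    # Counts stated in the text: 24 lysogenic genomes out of 171.
    print(prevalence_percent(24, 171))  # 14.04
    print(gc_content("GGCCAATT"))  # 50.0
```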

Comparative analysis between M102AD and S. mutans prophages
The S. mutans phage M102AD, which has a 30,664-bp genome and was isolated at the University of Maryland, was chosen as the reference phage because it has been sequenced and well annotated (Delisle et al., 2012). The prophages phismuN66-1, phismuNLML9-1, and phismu24-1 all shared sequence similarity with M102AD (Fig. 4). The linear genomic comparison showed that phismu24-1 shared two major regions of similarity with M102AD (1,199 and 466 bp), with 84% and 83.2% nucleotide identity, respectively. BLASTn comparison of phismuNLML9-1 and M102AD revealed three major similar regions (666, 454, and 423 bp) with 85.6%, 82.9%, and 82.7% nucleotide identity, respectively. PhismuN66-1 shared three major regions (749, 473, and 124 bp) with the M102AD genome, with 85.6%, 83.6%, and 84.2% identity, respectively.
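The identity values above come from BLASTn local alignments. As a hedged illustration of what such a percentage measures, the Python sketch below computes percent identity from two already-aligned sequences as identical columns divided by all alignment columns, gap columns included. This is one common definition and, as we understand it, matches how BLAST's tabular `pident` field is usually described; the `percent_identity` helper itself is hypothetical and does not perform the alignment.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent of alignment columns with identical, non-gap residues.

    Both inputs are rows of an existing pairwise alignment ('-' marks a gap);
    the denominator is the full alignment length, gap columns included.
    """
    if not aln_a or len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must be non-empty and of equal length")
    a, b = aln_a.upper(), aln_b.upper()
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * identical / len(a)


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))       # 90.0
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))     # 83.3
```

Because gap columns count in the denominator, two alignments with the same number of mismatches can report different identities depending on how many gaps the aligner opened.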

Summary of features of S. mutans prophage genomic sequences
Three intact prophages (phismuNLML9-1, phismuN66-1, and phismu24-1) were identified in S. mutans; all three belong to Cluster B and closely resemble the M102AD genome. All ORFs of the prophages were predicted and annotated using PHASTER, GeneMarkS, and BLASTp. PhismuNLML9-1, phismu24-1, and phismuN66-1 exhibited the characteristic modular arrangement of phage M102AD, comprising packaging and structural, integrase, host lysis, DNA replication/recombination, transcriptional regulatory, other protein, and hypothetical protein modules. The integrase gene was identified only in prophage phismuNLML9-1 (Fig. 5). A total of 37 ORFs were identified in the genome of phismu24-1 (Table S2), 20 of which were assigned a putative function. No transfer RNA (tRNA) genes were found in the phismu24-1 genome. The host lysis module comprises ORFs 34–36. ORF 34 encodes a putative holin, a protein that forms pores in the cytoplasmic membrane; holins can also mediate the release of toxins and other proteins or contribute to biofilm formation (Saier & Reddy, 2015). ORF 35 and ORF 36 encode putative endolysins, which digest the bacterial cell wall to release phage progeny and may have potential as antibacterial agents (Fenton et al., 2010; Fischetti, 2008). The products of ORF 20 and ORF 25 were identified as the major head and major tail proteins, respectively. ORF 16 and ORF 21 are predicted to belong to the DNA packaging module; their products are similar to the putative terminase large subunit and a putative DNA packaging protein.
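GeneMarkS predicts genes with a self-trained statistical model; the Python sketch below uses a much simpler technique, a naive six-frame scan for ATG-initiated open reading frames, only to illustrate what an ORF is at the sequence level. It ignores alternative start codons (GTG, TTG) and coding statistics, and it is not how PHASTER or GeneMarkS produced the ORFs in Tables S2 to S4. The function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _orfs_one_strand(seq: str, min_codons: int):
    """Yield (start, end) for ATG-to-stop ORFs in the three frames of one strand.

    Coordinates are 0-based with an exclusive end; the stop codon is included
    in both the interval and the codon count. Internal ATGs are not reported
    separately, so each ORF starts at the first ATG after the previous stop.
    """
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None:
                if codon == "ATG":
                    start = pos
            elif codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    yield start, pos + 3
                start = None


def find_orfs(seq: str, min_codons: int = 100):
    """Naive six-frame scan; returns (strand, start, end) in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", s, e) for s, e in _orfs_one_strand(seq, min_codons)]
    for s, e in _orfs_one_strand(reverse_complement(seq), min_codons):
        hits.append(("-", n - e, n - s))  # map the reverse-strand interval back
    return sorted(hits, key=lambda h: (h[1], h[2], h[0]))


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"  # hypothetical 7-codon ORF
    print(find_orfs(toy, min_codons=5))  # [('+', 2, 23)]
```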
The prophage phismuNLML9-1 contains 54 phage-related genes (Table S3). The phismuNLML9-1 prophage region is flanked by a 13-bp repeat, indicative of attL and attR sites (Fig. 5). Temperate phages rely on an integrase, which mediates integration into and excision from the host chromosome, to establish and exit lysogeny (Smith & Thorpe, 2002). An integrase gene was identified at ORF 6 of phismuNLML9-1, suggesting that this prophage can enter a lysogenic cycle. No putative tRNA or transfer-messenger RNA (tmRNA) genes were detected. A putative holin is encoded by ORF 34, and putative endolysins by ORF 35 and ORF 36. ORFs 9, 18, and 20 are predicted to form the replication module.
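Integration by a site-specific integrase typically leaves the short core sequence of the attachment site duplicated at both prophage ends (attL and attR), which is why a flanking direct repeat is taken as evidence of integration. The hedged Python sketch below looks for that signature by intersecting the k-mers found near each boundary of a candidate region. The function name, the 13-bp default (taken from the repeat length reported above), the window size, and the toy genome are assumptions for illustration, not necessarily the procedure used to annotate Fig. 5.

```python
def shared_flank_kmers(genome: str, start: int, end: int, k: int = 13, window: int = 500):
    """Return k-mers found near both boundaries of a candidate prophage region.

    start/end are 0-based (end exclusive). Each search window extends `window`
    bp on either side of a boundary, so a repeat straddling the integration
    point can be found. If the region is shorter than 2 * window, the two
    windows overlap and every k-mer in the overlap is trivially shared.
    """
    genome = genome.upper()
    left = genome[max(0, start - window):start + window]
    right = genome[max(0, end - window):end + window]
    left_kmers = {left[i:i + k] for i in range(len(left) - k + 1)}
    right_kmers = {right[i:i + k] for i in range(len(right) - k + 1)}
    return sorted(left_kmers & right_kmers)


if __name__ == "__main__":
    repeat = "ACGTTGCAGGCTA"  # hypothetical 13-bp core, not the phismuNLML9-1 sequence
    genome = "C" * 20 + repeat + "A" * 60 + repeat + "G" * 20
    print(shared_flank_kmers(genome, start=20, end=106, window=15))  # ['ACGTTGCAGGCTA']
```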
The phismuN66-1 prophage genome contains 31 ORFs (Table S4). ORF 30 encodes an ABC transporter/permease protein that may function as a multiple sugar metabolism transporter, a proposed target for antimicrobial strategies against S. mutans (Nagayama et al., 2014). ORF 25 encodes a host specificity protein, and ORF 29 encodes a lysin-holin protein. In addition, the packaging and structural modules comprise ORFs 12, 13, 15, 23, 24, and 26, and the transcriptional regulatory module is encoded by ORFs 3, 6, and 11.
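Assigning ORFs to the seven functional modules depends on the annotation text returned by PHASTER and BLASTp. One hedged way to illustrate that bookkeeping is a keyword lookup, sketched below in Python. The keyword lists, their order of precedence, and the `assign_module` helper are hypothetical and will misplace some proteins; for example, these rules do not place a host specificity protein, such as the product of ORF 25, in any structural module.

```python
# Hypothetical keyword rules, checked in order; the first matching module wins.
MODULE_RULES = [
    ("integrase", ("integrase",)),
    ("host lysis", ("holin", "lysin")),
    ("packaging and structural",
     ("terminase", "portal", "capsid", "head", "tail", "packaging")),
    ("DNA replication/recombination",
     ("replication", "helicase", "primase", "recombinase", "single-strand")),
    ("transcriptional regulatory", ("repressor", "regulator", "transcription")),
]


def assign_module(annotation: str) -> str:
    """Map one ORF annotation string to one of seven functional modules."""
    text = annotation.lower().strip()
    if not text or "hypothetical" in text:
        return "hypothetical protein"
    for module, keywords in MODULE_RULES:
        if any(word in text for word in keywords):
            return module
    return "other protein"


if __name__ == "__main__":
    # Toy annotations, loosely modelled on the ORF descriptions in the text.
    for ann in ["phage integrase", "lysin-holin protein", "terminase large subunit",
                "ABC transporter permease", "transcriptional regulator",
                "hypothetical protein"]:
        print(f"{ann!r:28} -> {assign_module(ann)}")
```

Substring matching keeps the sketch short but is crude; a curated mapping from annotation identifiers would be more robust than free-text keywords.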
Horizontal gene transfer plays an important role in the adaptation and evolution of prokaryotes, and bacteriophages, as mobile genetic elements, mediate such transfer. In our study, ORF 4 of phismuNLML9-1 and ORF 30 of phismuN66-1 both encode an ABC transporter/permease protein; in Staphylococcus aureus, a protein of this type has been described as a virulence protein associated with the development of spontaneous resistance to compound 103 (Morisaki et al., 2016). The many hypothetical proteins of unknown function may play an important role in the acquisition of specialized gene sets via prophages and horizontal transfer in S. mutans.

CONCLUSIONS
In conclusion, our analyses of available genome sequences identified 35 prophage-like elements in S. mutans genomes, all of which are reported here for the first time. Genomic analysis showed that prophages belonging to the same cluster share sequence similarity. The genomes and genetic information of the phismuNLML9-1, phismu24-1, and phismuN66-1 prophages were analyzed, and putative ORFs and functional regions were identified.
To the best of our knowledge, this is the first systematic analysis of S. mutans prophages.