Comparative analysis of prophage-like elements in Helicobacter sp. genomes

Prophages are regarded as one of the factors underlying bacterial virulence, genomic diversification, and fitness, and are ubiquitous in bacterial genomes. Information on Helicobacter sp. prophages remains scarce. In this study, sixteen prophages were identified and analyzed in detail. Eight of them are described for the first time. Based on a comparative genomic analysis, these sixteen prophages can be classified into four different clusters. Phylogenetic relationships of Cluster A Helicobacter prophages were investigated. Furthermore, genomes of Helicobacter prophages from Clusters B, C, and D were analyzed. Interestingly, some putative antibiotic resistance proteins and virulence factors were associated with Helicobacter prophages.


INTRODUCTION
Prophages, phages that integrate into and persist in a bacterial genome, play an important role in the genomic diversification of bacteria and in bacterial fitness in the infected host. As a class of genetic elements, some prophages mediate horizontal gene transfer during the evolution of bacterial genomes (Lang, Zhaxybayeva & Beatty, 2012). Because they carry virulence genes, some prophages contribute substantially to bacterial pathogenesis (Penadés et al., 2015), and some also affect the fitness of bacteria in the infected host (Fan et al., 2013). It is therefore essential to search bacterial genomes for prophages and to analyze them. To date, studies have identified prophages in a diverse range of hosts, such as Moraxella catarrhalis (Ariff et al., 2015), Lawsonia intracellularis (Vannucci, Kelley & Gebhart, 2013), Bifidobacterium spp. (Lugli et al., 2016; Ventura et al., 2009), Lactococcus spp. (Ventura et al., 2007), Mycobacterium spp. (Fan, Abd Alla & Xie, 2015; Fan et al., 2014), Streptococcus spp. (Tang et al., 2013), and some plant-pathogenic bacteria (Varani et al., 2013). However, a systematic investigation of the genomic information and function of Helicobacter prophages is largely lacking.
Helicobacter is a genus of Gram-negative bacteria, most frequently found in the upper gastrointestinal tract of mammals. One well-known species of the genus is Helicobacter pylori, a carcinogen identified by the World Health Organization (Uemura et al., 2001). H. pylori infection may be associated with gastritis, peptic ulcer, and gastric cancer (Peek & Blaser, 2002). Other non-pylori Helicobacter species such as H. suis, H. felis, H. bizzozeronii and H. salomonis have been reported and also exhibit carcinogenic potential in animals (O'Rourke, Grehan & Lee, 2001). Previous research suggests that Helicobacter phages and prophages are unusual (Canchaya, Fournous & Brüssow, 2004). Information on Helicobacter prophages is becoming increasingly available. Two prophage-like elements were detected in Helicobacter acinonychis str. Sheeba (Eppinger et al., 2006). One prophage-like element was found within Helicobacter felis ATCC 49179 (Arnold et al., 2011). One prophage, phiHP33, which can be induced by UV irradiation, was found in H. pylori B45 (Lehours et al., 2011). Luo and colleagues (2012) found that the H. pylori str. HP1961 chromosome contains a full-length prophage, 1961P. The same authors also found that the chromosomes of H. pylori Cuz20, H. pylori India7, H. pylori B38, H. pylori F16, and H. pylori Gambia94/24 each contain a prophage-like element (Luo et al., 2012). In addition, two potential prophages were described in H. pylori str. Egypt (Abdel-Haliem & Askora, 2013). These findings suggest that prophages are common within Helicobacter genomes. Vale et al. (2015) have demonstrated that prophages play a role in the diversity of H. pylori. The function of Helicobacter prophages is nonetheless ill-defined. Some researchers suggest that it is possible to use Helicobacter phages to control some diseases caused by H. pylori (Abdel-Haliem & Askora, 2013).
However, if virulence factors and antibiotic resistance genes are found associated with Helicobacter phages or prophages, it is worth reconsidering phage therapy as treatment of H. pylori infections. As of 1 Oct 2015, eighty-one Helicobacter species genomes have been sequenced and assembled. These comprise an essential dataset for researching the presence of Helicobacter prophages.
As mentioned above, it is important that "hidden" Helicobacter prophages are identified. In this study, we screened all the available complete Helicobacter sp. genome sequences deposited in GenBank for the presence of prophages. We here report the results of our comparative genomic analysis, genome content analysis, and prophage-encoded virulence and antibiotic resistance gene analysis of Helicobacter prophages.

Data collection and identification of Helicobacter prophages
Eighty-one complete Helicobacter genomes were downloaded from NCBI (the National Center for Biotechnology Information). Helicobacter prophages were identified using a previously reported method (Fan et al., 2014). First, we used PHAST (http://phast.wishartlab.com/index.html) to analyze the bacterial genomes and find candidate prophages. Next, we screened the candidate prophage genomes for an integrase gene to eliminate false positives. Finally, we retained as Helicobacter prophages those candidates whose ORFs (open reading frames) showed significant homology to known phage genes.
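The filtering logic of this three-step screen can be sketched as follows. This is a minimal illustration, not the published pipeline: the `CandidateRegion` record, its fields, and the `min_phage_homologs` threshold are hypothetical, and the sketch assumes that candidate regions and their ORF annotations have already been exported from a prediction tool such as PHAST.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CandidateRegion:
    """A candidate prophage region (hypothetical record, not a PHAST format)."""
    name: str
    # Product annotations of the ORFs predicted in the region.
    orf_products: List[str] = field(default_factory=list)
    # Number of ORFs with significant homology to known phage genes.
    phage_homolog_orfs: int = 0


def has_integrase(region: CandidateRegion) -> bool:
    """True if any ORF product annotation mentions an integrase."""
    return any("integrase" in product.lower() for product in region.orf_products)


def screen(regions: List[CandidateRegion],
           min_phage_homologs: int = 1) -> List[CandidateRegion]:
    """Keep regions that carry an integrase and enough phage-homologous ORFs.

    The threshold is an assumption for illustration; the source does not
    state a numeric cutoff.
    """
    kept = []
    for region in regions:
        if not has_integrase(region):
            continue  # step 2: drop candidates without an integrase gene
        if region.phage_homolog_orfs < min_phage_homologs:
            continue  # step 3: require homology to known phage genes
        kept.append(region)
    return kept
```

Applying the two filters in sequence mirrors the order described above; because each filter only removes candidates, the order does not change the final set.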

Genomic and comparative genomic analyses of Helicobacter prophages
Prophage flanking sites attL and attR were identified using DNAMAN. Prophage genes were annotated using Glimmer (Delcher et al., 2007). Dot plot comparisons of Helicobacter prophage genomes were carried out with Geneious software (Kearse et al., 2012). Global genome comparison was performed using BLASTn at NCBI (http://blast.ncbi.nlm.nih.gov/Blast.cgi), and the results were visualized with ACT software. Default settings were used for all software.
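Because attL and attR are short direct repeats flanking an integrated prophage, one simple way to illustrate their detection is to look for the longest exact sequence shared by the two boundary regions. The sketch below does this with a standard longest-common-substring dynamic program. It is not the algorithm used by DNAMAN, and the function name and the idea of passing pre-cut flanking windows are assumptions made for illustration.

```python
def longest_shared_repeat(left_flank: str, right_flank: str) -> str:
    """Return the longest exact substring shared by the two flanking windows.

    left_flank and right_flank are DNA strings cut around the left and right
    prophage boundaries (window choice is left to the caller).
    """
    left = left_flank.upper()
    right = right_flank.upper()
    best_len = 0
    best_end = 0  # exclusive end index of the best match within `left`
    # prev[j] holds the length of the common suffix of left[:i-1] and right[:j].
    prev = [0] * (len(right) + 1)
    for i in range(1, len(left) + 1):
        curr = [0] * (len(right) + 1)
        for j in range(1, len(right) + 1):
            if left[i - 1] == right[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return left[best_end - best_len:best_end]
```

The run time grows with the product of the two window lengths, so this is only practical for short windows around the predicted boundaries. Real att sites may also contain mismatches, which an exact-match search will not tolerate.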

Prophages in Helicobacter sp. genomes
Eighty-one complete Helicobacter sp. genomes (Table S1) were retrieved. Thirteen prophages (Table 1) were detected using a previously reported method (Fan et al., 2014); eight of them were novel, and five had been described in the literature (Luo et al., 2012). Moreover, seven reported prophages (Table 1) from Helicobacter genomes were not detected in the screen (Arnold et al., 2011; Eppinger et al., 2006; Lehours et al., 2011; Luo et al., 2012). Two of them, contained in the genomes of H. acinonychis str. Sheeba and H. felis ATCC 49179, have not been designated. We named them phiHac_1 and phiHFELIS_1, respectively. It is worth noting that phiHac_1, phiHFELIS_1, and two other prophages from H. pylori str. Egypt, HPE1 and HPE2, all lack sequence information. The original papers in which these prophages were identified did not provide the sequence information, and we could not retrieve it from the corresponding genomes using our screening method. We therefore excluded them from follow-up analyses. In total, sixteen prophages were analyzed.
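The counts in this paragraph can be reconciled with a short tally. The sketch assumes that the four prophages lacking sequence information are counted among the seven reported prophages that the screen did not detect; this is one reading of the text that yields the stated total, not something the text states outright.

```python
# Counts taken from the paragraph above.
novel = 8                     # detected in the screen, not previously described
described_and_detected = 5    # detected in the screen, already in the literature
detected_in_screen = novel + described_and_detected

reported_not_detected = 7     # reported in the literature, missed by the screen
lacking_sequence = 4          # phiHac_1, phiHFELIS_1, HPE1, HPE2 (assumed subset of the 7)

# Prophages carried forward into the follow-up analyses.
analyzed = detected_in_screen + reported_not_detected - lacking_sequence

# Of those, the ones that had already been reported elsewhere.
previously_reported = described_and_detected + (reported_not_detected - lacking_sequence)
```

Under this reading, the analyzed set splits evenly into novel and previously reported prophages.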
The size of all Helicobacter prophage genomes varies between 5.5 kb and 39.3 kb. Based on the presence of predicted prophage proteins and the length of the prophage genomes, nine sequences were designated as full-length prophages, and seven sequences were labeled prophage-like elements.

Helicobacter phage Cluster A
Based on the similarities of their genomes, Helicobacter Cluster A phages were divided into four subclusters. Phages belonging to one subcluster are more closely related to each other than to phages in the remaining subclusters (Figs. S1 and S2). Some subcluster A1 phages (phiK750_1, Sheeba, KHP30, KHP40, 1961P, phiHP33, Cuz20 and India7) possess 70.57% identity with each other, as determined by multiple genomic sequence alignments in DNAMAN. In addition, a BLASTn comparison of phiNY40_1 and phiK750_1 revealed one major sequence segment (8,953 bp) with 81% identity and three segments (3,550 bp, 3,039 bp, and 1,997 bp) with identity greater than 76%. Based on the multiple genomic sequence alignments, all subcluster A2 phages displayed 82.79% identity between each other.
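For readers unfamiliar with how an identity percentage is derived from an alignment, the following sketch computes it for two already-aligned sequences. It assumes one common definition (identical columns divided by all columns in which at least one sequence has a residue, so that a gap opposite a base counts as a mismatch). DNAMAN and BLASTn may define and normalize identity differently, so values from this function are not expected to reproduce the figures above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a residue is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared
```

For a multiple alignment, a single summary figure could be obtained by averaging this value over all sequence pairs, although that is only one of several possible conventions.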

Notes.
NM, data not mentioned. a, prophages detected in the screen. b, prophages described in the literature. c, prophages lacking sequence information. d, full-length prophages. e, attR absent from the junction.

Different subclusters in Helicobacter phage Cluster A possess segments of DNA similarity. Phages of subclusters A2, A3, and A4 all shared sequence similarity with subcluster A1 phages (Fig. 2). Phages of subclusters A2, A3, and A4 are remnant prophage-like elements that have lost sequence segments during evolution. Subcluster A2 prophages retained an upstream region containing many of the virion-associated genes of the subcluster A1 prophages. The subcluster A3 prophage (prophage B38) retained only an incomplete upstream region (5.5 kb) of the subcluster A1 and A2 prophages. The subcluster A4 prophage (prophage F16) retained a downstream region containing many of the DNA metabolism genes of the subcluster A1 prophages. The genome organization of most Cluster A phages has been reported (Luo et al., 2012).

Helicobacter phage Cluster D
PhiHBZC1_1 is found in Helicobacter bizzozeronii CIII-1. It belongs to Cluster D and does not share any similarities with other Helicobacter phages. As a full-length prophage, the genome size of phiHBZC1_1 is 39.3 kb. There are fifty-eight ORFs in this genome (Fig. 3), spanning a region from HBZC1_17420 (DNA invertase-encoding) to HBZC1_17990 (site-specific recombinase integrase-encoding). The prophage is flanked by two 14 bp attL and attR sites (Table 1). Sequence alignment analysis indicated some level of similarity between thirty ORFs of prophage phiHBZC1_1 and other known phage genes. Of these, twenty-eight ORFs could be assigned biological functionalities (Table S4).
The genome of phiHBZC1_1 can be divided into several functional modules. The lysis module includes HBZC1_17600 and HBZC1_17620, which encode a holin and a lysozyme protein, respectively. The DNA packaging and virion-associated modules consist of HBZC1_17440, coding for a phage terminase large subunit; HBZC1_17470, encoding a phage tail protein; HBZC1_17480, HBZC1_17490, and HBZC1_17500, encoding phage tail tape measure proteins; HBZC1_17530, HBZC1_17540, HBZC1_17550, HBZC1_17630, HBZC1_17640, and HBZC1_17660, encoding phage tail proteins; HBZC1_17560, encoding a phage tail sheath-like protein; HBZC1_17670, encoding a phage baseplate protein; HBZC1_17740 and HBZC1_17750, encoding capsid proteins; HBZC1_17860, encoding a portal protein; HBZC1_17880, encoding a phage terminase large subunit; and HBZC1_17900, encoding a phage baseplate assembly protein V. The DNA metabolism module comprises three genes (HBZC1_17420, HBZC1_17570, and HBZC1_17830), whose predicted protein products are phage DNA invertase, DNA methyltransferase, and DNA polymerase, respectively. The transcriptional regulatory module is composed of HBZC1_17460 (encoding a phage late control D family protein), HBZC1_17930 (coding for the repressor LexA), and HBZC1_17970 (encoding a YcfA family protein). The lysogeny module appears to be limited to HBZC1_17990, whose predicted protein product is a phage integrase.
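The module assignments listed above can be captured in a small lookup structure, which makes it easy to check which module a given locus tag belongs to. The locus tags and module names are taken from the text; the dictionary layout and the `module_of` helper are illustrative assumptions.

```python
from typing import Optional

# Functional modules of phiHBZC1_1, keyed by module name (locus tags from the text).
MODULES = {
    "lysis": ["HBZC1_17600", "HBZC1_17620"],
    "DNA packaging and virion": [
        "HBZC1_17440", "HBZC1_17470", "HBZC1_17480", "HBZC1_17490",
        "HBZC1_17500", "HBZC1_17530", "HBZC1_17540", "HBZC1_17550",
        "HBZC1_17560", "HBZC1_17630", "HBZC1_17640", "HBZC1_17660",
        "HBZC1_17670", "HBZC1_17740", "HBZC1_17750", "HBZC1_17860",
        "HBZC1_17880", "HBZC1_17900",
    ],
    "DNA metabolism": ["HBZC1_17420", "HBZC1_17570", "HBZC1_17830"],
    "transcriptional regulation": ["HBZC1_17460", "HBZC1_17930", "HBZC1_17970"],
    "lysogeny": ["HBZC1_17990"],
}


def module_of(locus_tag: str) -> Optional[str]:
    """Return the module a locus tag is listed under, or None if unlisted."""
    for module, tags in MODULES.items():
        if locus_tag in tags:
            return module
    return None


# Total number of genes explicitly assigned to a module in the text.
assigned_genes = sum(len(tags) for tags in MODULES.values())
```

Genes that the text discusses outside these modules, such as HBZC1_17700, are deliberately left unassigned and return `None`.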

Putative antibiotic resistance genes and virulence factors associated with Helicobacter prophages
With the exception of phiHBZC1_1, none of the characterized Helicobacter prophages contains known antibiotic resistance genes. The protein encoded by HBZC1_17700 shows high similarity to multidrug resistance protein D (emrD) of Salmonella enterica subsp. enterica serovar Infantis (Table 2). Multidrug resistance protein D, which belongs to the major facilitator superfamily, facilitates the transport of a variety of antibiotics (Shaheen et al., 2015).
A range of phage-encoded virulence genes was identified within the Helicobacter prophage sequences (Table 2). A DNA methyltransferase-encoding gene was identified in most of the analyzed Helicobacter prophages. DNA methyltransferase is thought to contribute to the specificity of bacterium-host interactions or H. pylori virulence (Vitkute et al., 2001). Furuta and colleagues (2015) found that DNA methyltransferase genes are rapidly evolving in H. pylori genomes, which facilitates H. pylori adaptation to a new host. A protein encoded by phiNY40_1 (NY40_0553) displayed 23% identity with a serine/threonine kinase of Thiorhodococcus drewsii. Phosphorylation of proteins usually occurs during interactions between bacterial cells and host cells and plays a role in bacterial pathogenesis (Cozzone, 2005). Serine/threonine kinases are considered to affect cell survival pathways and contribute to H. pylori pathogenesis (King & Obonyo, 2015). A putative glycosyltransferase is encoded by phiHCD_1. Glycosyltransferases are involved in biosynthesis of LPS (Luke et al., 2010) that can promote proliferation of gastric cancer cells (Tomoda, Kamiya & Suzuki, 2015). An antitoxin component RelB of the addiction toxin-antitoxin (TA) module system RelBE was identified in phiHBZC1_1. The protein plays a role in cell survival (Park, Son & Lee, 2013).

CONCLUSIONS
In brief, we present here sixteen Helicobacter prophages. Eight of them were identified for the first time by mining the sequenced Helicobacter sp. genomes, and the other eight had been reported in the published literature. Based on comparative genomic analyses, the sixteen phages were sorted into four clusters (Clusters A-D). Cluster A was further divided into four subclusters (A1-A4), which displayed similarity to each other. Subcluster A1 phages are full-length prophages, whereas subcluster A2, A3 and A4 phages are remnant prophage-like elements. The genomes and genetic information of the Cluster B, C and D phages were analyzed. Interestingly, several genes encoding putative antibiotic resistance proteins and virulence factors were found within various prophage genomes. These results highlight an important issue that needs to be resolved before proceeding with phage therapy for the treatment of H. pylori infections. To our knowledge, this is the first systematic analysis of Helicobacter prophages. As more Helicobacter genome sequences become available, more Helicobacter prophages will be identified, and the role of prophages in the evolution, adaptation and physiology of Helicobacter sp. will be clarified.