Introduction

The recent increase in phage genome sequences available in public databases (828 in the order Caudovirales as of June 20, 2013), along with the development of bioinformatic tools, has enabled the classification of bacteriophages on the basis of their genetic relationships. Recent approaches led to the creation of the subfamilies Autographivirinae and Picovirinae in the family Podoviridae and the subfamily Tevenvirinae within the family Myoviridae [1-3]. To date, a large number of Staphylococcus aureus phages have been described, but a complete genetic and functional characterization has been performed only on a limited number. Previous attempts to classify S. aureus phages were based on genome sizes and resulted in three classes [4]. Class I included phages with a genome size <20 kb (P68, 66, and 44AHJD); class II, 20-40 kb (42e, 3A, 47, 187, 69, 53, 85, 2638A, 77, 37, 96, 71, 55, 29, 52A, 88, 92, EW and X2); and class III, 40-125 kb (Twort, K, and G1). Morphological differences among these classes were also observed. Class I phages belong to the family Podoviridae (C1 morphotype) with a short, non-contractile tail and an isometric head, whereas class II phages belong to the family Siphoviridae. The majority of class II phages show a B1 morphotype with a long, non-contractile tail and an isometric head. In this group, phages 42e, 3A, and 47 display a B2 morphotype with a long, non-contractile tail and an elongated head. Finally, the three phages from class III are structurally defined as members of the family Myoviridae.

Recently, three novel temperate Siphoviridae phages (StB12, StB27, and StB20) from the coagulase-negative species (CoNS) Staphylococcus hominis and Staphylococcus capitis were isolated and characterized [5]. Based on the comparative analysis of these phages with prophage sequences, the authors proposed a novel classification for staphylococcal phages in nine different clusters.

The aim of the current work is to find a rational scheme for staphylococcal phage classification, which condenses the extensive data derived from genetic and proteomic studies to give some clues about the evolutionary relationships among these phages. Thus, we propose the creation of three new siphovirus genera named “3alikevirus”, “77likevirus” and “Phietalikevirus”, in which 46 staphylococcal phages have been grouped.

Comparative genomics

Bioinformatic analysis of 46 staphylococcal phages indicated relatively minor differences in genome size and G+C content, which range from 40 to 45.8 kb and from 33 to 38 %, respectively. We first turned to comparative analysis employing BLASTN (http://blast.ncbi.nlm.nih.gov/), EMBOSS Stretcher (http://www.ebi.ac.uk/Tools/psa/emboss_stretcher/) and progressiveMauve (http://gel.ahabs.wisc.edu/mauve/), along with protein content comparisons using CoreGenes 3.0 [6-8]. Before the global alignments could be performed, the genomes were manually colinearized, placing the arbitrary starting point for the sequence 150 bp in front of the gene for the small terminase subunit. The progressiveMauve alignments reveal a similar organization of the phages in modules that fit the general structure of most double-stranded DNA bacteriophages [9]. These modules include packaging, structure/morphogenesis, host lysis, lysogeny and replication/regulation (Supplementary Figures 1-5). Taking into account phage morphology, genome size and organization, gene synteny, and a shared protein content of 40 %, three groupings can be described, for which we propose the status of genus, consistent with other Caudovirales groupings (Table 1): “3alikevirus” (3A, Φ12, 47, phiIPLA35, ΦSLT and 42e), “77likevirus” (77, ΦPVL108 and Φ13), and “Phietalikevirus” (ΦETA, 52A, 80, 29, 71, 55, ΦMR11, ΦETA3, 88, 92, X2, ΦNM4, 96, ΦETA2, Φ11, 53, 69, 80α, 85, ΦMR25, ΦNM1, ΦNM2, phiIPLA88, 187, SAP-26). Phages 2638A, TEM123, SpaA1, StB12, StB20, and StB27 did not reveal a clear relationship to the others and remain as orphan phages. Using the Needleman-Wunsch rapid global alignment algorithm of EMBOSS Stretcher, nucleotide similarities with the type phage in each genus were computed. Values over 53 % for DNA sequence identity were observed within each proposed genus (Table 1). Between genera these values were lower than 50 % (data not shown). Focusing on the percentage of proteins shared with the type phage, values over 42 % were observed within each genus, while lower values were determined between genera (data not shown).
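To illustrate how a percent-identity value can be derived from a global alignment, the following minimal Python sketch implements a textbook Needleman-Wunsch alignment and counts identical columns. It is not the EMBOSS Stretcher implementation: the scoring values (match, mismatch and a linear gap penalty) and the function names are assumptions chosen for illustration, and the quadratic memory use makes it suitable only for short toy sequences, not for whole phage genomes.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical columns as a percentage of the alignment length."""
    al_a, al_b = needleman_wunsch(a, b)
    if not al_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * matches / len(al_a)


# Toy sequences (hypothetical, not phage data)
print(percent_identity("ACGT", "ACGA"))  # 75.0
```

Note that the denominator here is the full alignment length including gap columns; other tools may normalize differently, so values from this sketch are not directly comparable to those in Table 1.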

Table 1 Features shared by members of the proposed genera “3alikevirus”, “77likevirus” and “Phietalikevirus”

Features of the bacteriophages belonging to the proposed genus “3alikevirus”

The proposed genus “3alikevirus” includes six of the staphylococcal phages. All of these viruses were reported to possess a prolate capsid (siphovirus B2 morphotype) that is 100 nm long and 50 nm wide, and a tail of 300-400 nm (for references, see Table 1). Comparative proteomic analysis using CoreGenes 3.0 and BLASTP indicates the presence of several group-specific proteins. Among the replication-associated proteins, there are a helicase-like protein and an A-type DNA polymerase. In the morphogenesis module, a unique capsid protein containing a pfam05065 motif (phage capsid family) and a major_cap_HK97 (TIGR01554; phage major capsid protein, HK97 family) motif can be distinguished. The tail module contains two specific proteins; the first shows a Sipho_tail (pfam05709) motif, while the second corresponds to a predicted prophage endopeptidase tail motif (pfam06605). Another protein that is exclusive for these phages is RinA, whereas the RinB homolog can be found in other phages.

Comparative analysis of DNA sequences using progressiveMauve alignments also indicates a particularly high degree of conservation in a region spanning the DNA-packaging proteins (small and large terminase subunits, portal and scaffolding proteins) (Supplementary Figure 1). In addition, all phages belonging to the genus “3alikevirus” share a region encoding a putative nuclease (HNH superfamily).

A relevant characteristic of the genus “3alikevirus” is the presence of peptidoglycan hydrolytic domains included in their tape measure proteins (TMPs). Specifically, a lytic transglycosylase SLT domain and a peptidase_M23 domain are detected in all of them (Table 2).

Table 2 Peptidoglycan hydrolytic domains and tail proteins associated with staphylococcal phages

Features of the bacteriophages belonging to the proposed genus “77likevirus”

The proposed genus “77likevirus” includes three S. aureus phages with a similar genome size and morphology (B1 morphotype). No data on the size of the viral particles have been published. These phages share characteristics with the other genera (“3alikevirus” and “Phietalikevirus”), such as the presence of nucleases in phages 77 and Φ13, similar to those from the “3alikevirus” genus, and a common morphotype with “Phietalikevirus”. Among the peptidoglycan hydrolase domains, a lytic transglycosylase SLT domain and a peptidase_M23 domain, similar to those found in “3alikevirus”, were detected in the tail tape measure proteins (TMPs) of the three phages belonging to the genus “77likevirus” (Table 2) (Supplementary Figure 2). No other structural proteins with peptidoglycan catalytic domains were detected in these phages.

Features of the bacteriophages belonging to the proposed genus “Phietalikevirus”

The proposed genus “Phietalikevirus”, which includes 31 staphylococcal phages, can theoretically be divided into three subgroups (Supplementary Figures 3-5) based on DNA similarity. Genome size is very similar across these phages. Virion morphology is nearly identical, with isometric heads of about 50 nm in diameter and a tail 175 nm long. All phages belonging to this genus show a B1 morphotype, with the exception of phage EW, which was described as having an A morphotype [4]. Given its high homology with B1 bacteriophages, we conclude that this description is incorrect.

A particularly interesting region of the “Phietalikevirus” phages is the sequence of genes in the tail morphogenesis module (Fig. 1). Five arrangements with small differences were observed. In most of the phages from subgroup 1 and subgroup 2 (ΦETA, 52A, 80, 29, 71, 55, ΦMR11, ΦETA3, 88, 92, X2, ΦNM4, 96, ΦETA2, Φ11, 53, 69, 80α, 85, ΦMR25, ΦNM1, ΦNM2, and phiIPLA88), this region is composed of a protein belonging to the SGNH hydrolase superfamily, a hypothetical protein, a tail protein, a virion-associated peptidoglycan hydrolase and a tail fiber protein (Fig. 1A). For most phages belonging to subgroup 3 (37, CNPH82, PH15, phiIPLA5 and phiIPLA7), a pectin-lyase-encoding gene can be observed downstream of the SGNH protein (Fig. 1B and C). In all these phages, with the exception of phages phiIPLA7 and PH15, a tail protein is also present. Notable differences can be seen in the makeup of this region for phages 187 and EW, which lacks the pectin lyase. No tail protein could be identified in phage 187 (Fig. 1D and E). The SGNH hydrolases containing lipase and esterase domains are present in all phages within the genus, including phage 187, which has a truncated protein (Fig. 1D). Moreover, peptidoglycan hydrolytic activities are present in all phages as virion-associated proteins containing two catalytic domains: a CHAP domain and a glucosaminidase domain (Fig. 1; Table 2). All phages except phiIPLA7, PH15 and phage 187 encode tail fiber proteins with a collagen helix domain downstream from the peptidoglycan hydrolases. This domain consists of a triple helix formed by repetitions of the amino acid sequence glycine-X-Y. A shorter sequence of this protein is also observed for phages phiIPLA5 and CNPH82. The members of subgroup 3, with the exception of phage 187 (Fig. 1B, C and D), have a pre-neck appendage protein, endowed with a pectin lyase and a peptidase domain, located between the SGNH hydrolase and the tail protein.

Fig. 1 Morphogenetic tail region in members of the proposed genus “Phietalikevirus”. (A) Structure for phages ΦETA, 80, 52A, 29, 55, ΦMR11, ΦETA3, ΦNM4, 96, Φ11, ΦNM1, ΦNM2, 69, 80α, 53, ΦETA2, ΦMR25, 85, phiIPLA88, 92, X2, 88, 71. (B) Structure shared by phiIPLA5, CNPH82 and phage 37. (C) Structure for phiIPLA7 and PH15. (D) and (E) Structure for phage 187 and EW, respectively. A dashed arrow represents a region with a variable number (from 3 to 5) of conserved hypothetical proteins

Orphan phages infecting Staphylococcus within the family Siphoviridae

Six siphophages infecting Staphylococcus have no clear homology to members of the previously proposed genera. For instance, regarding the peptidoglycan hydrolytic activities specific to the other genera, the orphan phages, with the exception of 2638A and StB20, lack lytic domains associated with the TMPs. Phage 2638A has a TMP containing only the peptidase_M23 domain but not a transglycosylase SLT domain. An interesting structure can be observed in S. capitis phage StB20, which encodes a TMP containing the peptidoglycan hydrolytic domains (a peptidase_M23 domain and a transglycosylase SLT domain) and, additionally, a truncated virion-associated peptidoglycan hydrolase bearing a CHAP domain (Table 2). Complete peptidoglycan hydrolases are present in phages TEM123 and StB27. Other phages, like StB12, encode potentially truncated peptidoglycan hydrolases with a glucosaminidase domain. In phage SpaA1, no peptidoglycan lytic activities could be detected, either in the TMP or in any other structural protein.

Bacteriophage TEM123 shares characteristics with members of the proposed genus “Phietalikevirus”, such as the presence of structural proteins related to host interaction (SGNH and tail fiber protein) but lacks overall homology to these viruses (Table 2). Phage 2638A seems to be more related to the members of “3alikevirus”, sharing nuclease and helicase proteins (data not shown). Phages StB20 and StB27 may have a putative relationship to “Phietalikevirus” due to the presence of host-interaction proteins; StB27 has SGNH hydrolase and tail fiber protein, and StB20 also shows a tail fiber protein in the structural module (Table 2).

Phylogeny

Typing of microbial pathogens is often performed using multi-locus sequence typing, in which fragments of several housekeeping genes are sequenced and concatenated and phylogenetic trees are constructed [10]. In this regard, using ClustalX [11], we have aligned the complete, colinearized genomes of all phages and generated a phylogenetic tree that was visualized with FigTree (http://tree.bio.ed.ac.uk/software/figtree/). Figure 2 shows the whole-genome tree of the staphylococcal phages, which clearly reveals the three proposed genera, “3alikevirus”, “77likevirus” and “Phietalikevirus”; the three subgroups belonging to the last of these genera are also recognizable. This tree also shows that phage TEM123 is related to the phietalikeviruses and shares over 40 % of its proteins with members of subgroup 2 of the genus (data not shown), but, despite the protein content it shares with ΦETA, the homology between the two is only 28.8 % as calculated with CoreGenes. For these particular phages, the results of whole-genome phylogenetic tree building corroborate those of comparative genomic analysis, and this method can be a good tool in phage taxonomy.

Fig. 2 Phylogenetic tree of the complete genome sequences of the staphylococcal phages, aligned with ClustalX and drawn with FigTree. Genomes that were re-oriented to start with the small terminase subunit gene have been designated as _RO
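The re-orientation step can be sketched as a simple rotation of the sequence. The Python example below is a hypothetical illustration, not the procedure actually used (genomes were colinearized manually). It assumes the genome can be treated as circular, that the small terminase gene lies on the forward strand, and that its start coordinate (0-based) is already known; the function name and parameters are invented for this sketch.

```python
def colinearize(genome: str, terS_start: int, upstream: int = 150) -> str:
    """Rotate a genome so it begins `upstream` bp before the terS gene.

    genome      -- nucleotide sequence, treated as circular (assumption)
    terS_start  -- 0-based index of the first base of the small terminase
                   gene on the forward strand (assumption)
    upstream    -- number of bases to keep in front of the gene
    """
    if not genome:
        raise ValueError("genome must not be empty")
    if not 0 <= terS_start < len(genome):
        raise ValueError("terS_start is outside the genome")
    # Wrap around the end of the sequence if the gene is near the start.
    origin = (terS_start - upstream) % len(genome)
    return genome[origin:] + genome[:origin]


# Toy example: 16-bp "genome", gene starting at index 2, 4 bp kept upstream
toy = "AAAACCCCGGGGTTTT"
rotated = colinearize(toy, terS_start=2, upstream=4)
record_name = "toy_phage" + "_RO"  # suffix used for re-oriented genomes
print(record_name, rotated)
```

A genome whose terminase gene lies on the reverse strand would first need to be reverse-complemented so that all sequences share the same orientation before alignment.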

Phylogenetic analysis of some conserved proteins was also carried out. Homology between phage large terminase subunits was previously shown to be indicative of different functional classes [12]. We determined that there is a high degree of sequence identity between large terminase subunits in the genus “3alikevirus”. This protein possesses the typical domains from the terminase 1 superfamily, which are shared by 2638A, StB20, and phages belonging to “77likevirus” (Fig. 3A). Phages belonging to “Phietalikevirus” (with the exception of phage SAP-26) and the orphan phages SpaA1, StB12, StB27 and TEM123 show large terminase subunits associated with the terminase 3 superfamily. These data indicate a different packaging mechanism in phages of the different genera, which would further support the proposed clustering.

Fig. 3 Neighbor-joining tree of (A) large terminase subunit amino acid sequences encoded by staphylococcal phages and (B) major head protein encoded by staphylococcal phages. (*) Orphan phages. S1, subgroup 1; S2, subgroup 2; S3, subgroup 3

This grouping is further supported by phylogenetic analysis of the major head proteins, which shows a clear correlation with the proposed staphylococcal genera and an even higher homology among phages belonging to the same subgroup of the genus “Phietalikevirus” (Fig. 3B). Similar results were observed using other head and tail proteins, such as the portal protein and the TMP (data not shown).

Discussion

The current phage taxonomy, as laid out by the International Committee on Taxonomy of Viruses (ICTV), is based on a multitude of phenotypic and genotypic parameters, including morphology, infectivity and genome organization. Several proposals for phage classification based on sequence information have been made, sometimes with conflicting views. Rohwer and Edwards [13] built a tree based on pairwise dissimilarities between the complete bacteriophage proteomes, Pride and colleagues [14] proposed a phylogeny based on tetranucleotide usage patterns, and Lima-Mendez and colleagues [15] published a framework for a reticulate classification of phages based on gene content, which incorporates more elements of mosaicism and horizontal exchange [16-18]. More disruptive is the non-hierarchical model proposed by Lawrence and colleagues [19], which is not based on viral lineages but instead on the creation of reticulate groups that more accurately represent biological relationships among these phages. Phages could belong to several groups, and phages within a group would share a particular module or phenotypic characteristics, reflecting their mosaic nature. In this work, we confirm the validity of the CoreGenes approach to create coherent phage groups as well as to identify individual phages with special properties. Thus, a detailed analysis of staphylococcal phages within the family Siphoviridae allowed us to propose three genera based on close genetic similarity and on the presence of specific proteins involved in DNA packaging and host interaction. In addition to the high sequence identity across the structural module [20], we have also observed gene synteny in the region comprising the DNA packaging proteins of phages grouped as “3alikevirus”. This feature was previously described for phages 3A, 47 and 42e by Kwan and colleagues [4]. Furthermore, phylogenetic analysis of both the terminase large subunit and the major head protein clearly supports the establishment of the genus “3alikevirus”. In addition, the predicted DNA packaging mechanism used by members of “3alikevirus” is mediated by cohesive ends and a specific large terminase subunit (terminase 1), and it differs from that of the members of “Phietalikevirus”. Phages belonging to the genera “3alikevirus” and “77likevirus” are likely to use a cos packaging mechanism, as has been suggested previously for phage phiIPLA35 and phage ΦSLT [21, 22]. By contrast, phietalikeviruses appear to use a pac site to package DNA into capsids, as has been suggested for some phages such as phiIPLA88 and Φ11 [21, 23]. Clearly, the homology among large terminase subunits with similar enzymatic end-generating functions supports the proposed staphylococcal phage genera.

The presence of HNH superfamily (homing) nucleases seems to be specific to the “3alikevirus”. It has been proposed that the presence of endonuclease genes in phage genomes can be considered analogous to insertion of transposon elements in bacterial genomes [24, 25], which could explain their presence in phages infecting members of genera other than Staphylococcus as well as in orphan phages (StB20 and 2638A) and some 77likeviruses (data not shown). Moreover, highly conserved helicase proteins (DEAD-like helicases superfamily) found in members of “3alikevirus” are also present in phage 2638A.

Close examination of the differences between phage genera reveals two distinct arrangements in the peptidoglycan hydrolytic domains. Genomes of bacteriophages belonging to the genera “3alikevirus” and “77likevirus” encode peptidoglycan hydrolytic domains (peptidase_M23 and transglycosylase (SLT), as part of TMPs), whereas in “Phietalikevirus”, individual genes encode CHAP and glucosaminidase domains located in virion-associated peptidoglycan hydrolases. Previous analyses have revealed that phage TMPs frequently contain a soluble lytic transglycosylase domain [13, 26], and lytic transglycosylases have been proposed to be involved in phage DNA entry during early stages of infection [27, 28]. Similar catalytic domains were previously described in TMPs from several mycobacteriophages, where they play an essential role in the infection of stationary-phase cells [29]. In other phages, such as phage T5, these catalytic domains are included in a TMP that also carries fusogenic activities [30].

Phages of the proposed genus “Phietalikevirus” have peptidoglycan hydrolytic activity that is also associated with the phage particle, but as an independent protein. Virion-associated peptidoglycan hydrolases facilitate the entry of phage DNA across the bacterial cell envelope during infection [31]. They are also responsible for “lysis from without,” a phenomenon caused by some phages when adsorbed onto the host cell at very high numbers [32, 33]. Activity of these proteins against peptidoglycan was confirmed for those encoded by phages phiIPLA88 and ΦMR11 [34, 35]. Although the peptidoglycan hydrolase activity of these proteins has been demonstrated, its function in the phage lytic cycle is not clear. Likewise, PRD1 and T7 mutant phages without virion-associated peptidoglycan hydrolase activity are able to infect host cells, although the process is significantly delayed [31, 36].

The “Phietalikevirus” genus division is also supported by a more complex host interaction machinery, including an SGNH protein, a pectin lyase, a tail protein and a tail fiber protein. These proteins are probably involved in degradation of extracellular or capsular material, facilitating phage infection. SGNH proteins are involved in the hydrolysis of a wide variety of substrates, including fatty acids, aromatic esters, and amino acid derivatives [37]. Pectin lyase proteins have been described as putative hydrolases against extracellular components from bacterial biofilms [38]. They comprise two domains, including a pectin/lyase domain with a right-handed beta helical fold like those identified in carbohydrate depolymerizing enzymes [39]. These relationships suggest that these proteins might interact with the bacterial cell envelope and therefore might be related to bacterial receptor interaction.

It is interesting to highlight the structure of the orphan phages 2638A, StB12 and StB20. These phages are not clearly related to members of any of the proposed genera on the basis of DNA and protein homology. They have truncated peptidoglycan hydrolytic domains (2638A and StB12) or “duplicate” domains (StB20) that could be seen as intermediaries in exchange of these catalytic domains among phages.

Overall, our results are in accordance with previous attempts at classification of staphylococcal phages and prophages [4, 5]. The comparative nucleotide and protein sequence analyses performed have allowed us to define three novel genera in which phages grouped previously [4] have been included. Of note is the fact that the clusters proposed in the recent classification provided by Deghorain et al. [5] match, with a few exceptions, our classification in genera and subgroups, supporting its validity.