Characterization of Paenibacillus larvae bacteriophages and their genomic relationships to firmicute bacteriophages

Paenibacillus larvae is a Firmicute bacterium that causes American Foulbrood, a lethal disease of honeybees and a major source of global agricultural losses. Although P. larvae phages were isolated prior to 2013, no full genome sequences of P. larvae bacteriophages had been published or analyzed. This report includes an in-depth analysis of the structure, genomes, and relatedness of P. larvae myoviruses Abouo, Davies, Emery, Jimmer1, and Jimmer2 and siphovirus phiIBB_Pl23 to each other and to other known phages. P. larvae phages Abouo, Davies, Emery, Jimmer1, and Jimmer2 are myoviruses with ~50 kbp genomes. The six P. larvae phages form three distinct groups by dotplot analysis. An annotated linear genome map of these six phages displays important identifiable genes and demonstrates the relationship between phages. Sixty phage assembly or structural protein genes and 133 regulatory or other non-structural protein genes were identifiable among the six P. larvae phages. Jimmer1, Jimmer2, and Davies formed stable lysogens resistant to superinfection by genetically similar phages. The correlation between tape measure protein gene length and phage tail length allowed identification of co-isolated phages Emery and Abouo in electron micrographs. A Phamerator database was assembled with the P. larvae phage genomes and 107 genomes of Firmicute-infecting phages, including 71 Bacillus phages. Phamerator identified conserved domains in 1,501 of 6,181 phamilies (only 24.3%) encoded by genes in the database and revealed that P. larvae phage genomes shared at least one phamily with 72 of the 107 other phages. The phamily relationship of large terminase proteins was used to indicate putative DNA packaging strategies.
Analyses from CoreGenes, Phamerator, and electron micrograph measurements indicated that Jimmer1, Jimmer2, Abouo, and Davies were related to Clostridium phage phiC2, Streptococcus phage EJ-1, and Lactobacillus phages KC5a and AQ113, all of which are small-genome myoviruses. This paper represents the first comparison of phage genomes in the Paenibacillus genus and the first organization of P. larvae phages based on sequence and structure. This analysis provides an important contribution to the field of bacteriophage genomics by serving as a foundation on which to build an understanding of the natural predators of P. larvae.

Recent advances in DNA sequencing technology have made it possible to sequence many bacteriophage genomes. When these sequences are analyzed, putative protein functions can be determined. Other studies have used comparative genomics to organize phages into related clusters [13], correlate phage packaging mechanisms with large terminase protein sequences [14], and study gene transfer, phylogenetic relationships, and impacts on host virulence [15,16].
Comparative genomics can be accomplished using software specialized for phage genomes, such as Phamerator [17]. Phamerator incorporates available data for each genome entered into its database, such as bacterial host, gene annotations from GenBank, and conserved domains [18]. Phamerator compares every gene product in the database with every other gene product using BLASTP [19] and ClustalO [20], and the resulting scores are used to create phamilies (phams) of related gene products. Phamerator provides visual tools such as full genome comparison maps and can display the relationships between proteins within a pham using a circular diagram (pham circle). Proteins within each pham must meet or exceed user-defined cutoffs for E-value and percent identity with at least one other gene product in the pham. Strict cutoffs result in phams whose members likely share a similar function and that can be used to predict phylogenetic relationships.
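The pham-building rule described above (a protein joins a pham when it meets the cutoffs with at least one existing member) amounts to single-linkage clustering. The following Python sketch illustrates that idea only; it is not Phamerator's implementation. The function name `build_phams`, the input format, the example proteins and scores, and the 32.5% identity default are assumptions made for illustration; the 1 × 10⁻⁵⁰ E-value default mirrors the pham cutoff quoted later in this report.

```python
def build_phams(proteins, hits, max_evalue=1e-50, min_identity=32.5):
    """Group proteins into phams by single-linkage clustering.

    hits: iterable of (protein_a, protein_b, evalue, percent_identity).
    Two proteins are linked when a hit meets both cutoffs; a pham is a
    connected set of linked proteins. Cutoff defaults are illustrative.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, evalue, identity in hits:
        if evalue <= max_evalue and identity >= min_identity:
            parent[find(a)] = find(b)

    phams = {}
    for p in proteins:
        phams.setdefault(find(p), set()).add(p)
    # Largest phams first; ties broken alphabetically for stable output.
    return sorted(phams.values(), key=lambda m: (-len(m), sorted(m)))


# Hypothetical gene products and pairwise scores (not taken from the study).
proteins = ["A_gp1", "B_gp1", "C_gp2", "D_gp5", "E_gp9"]
hits = [
    ("A_gp1", "B_gp1", 1e-80, 60.0),  # passes both cutoffs
    ("B_gp1", "C_gp2", 1e-55, 40.0),  # passes: links C_gp2 to A_gp1 via B_gp1
    ("A_gp1", "C_gp2", 1e-10, 20.0),  # fails, but C_gp2 is already linked
    ("C_gp2", "D_gp5", 1e-20, 25.0),  # fails: D_gp5 stays alone
]
phams = build_phams(proteins, hits)
orphams = [next(iter(m)) for m in phams if len(m) == 1]
```

In this toy input, three proteins end up in one pham even though one pair among them fails the cutoffs directly, which is the practical consequence of requiring a match to only one other member. Single-member phams correspond to what the text calls orphams.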
In 2013, six P. larvae phages were isolated. These phages were fully sequenced and their genomes published [21,22]. P. larvae siphovirus phiIBB_Pl23 was isolated in Portugal [21] and P. larvae myoviruses Abouo, Davies, Emery, Jimmer1, and Jimmer2 were isolated in Utah [22]. In this report we compare the genomes of the six fully-sequenced P. larvae phages, categorize all published P. larvae phages into three groups based on structural morphology, use Phamerator to analyze previously unexplored Paenibacillus phages, and explore genetic relationships of the P. larvae phages with 107 other phages that infect Firmicute hosts. We identify gene products with conserved domains including a putative bacteriocin, serine recombinase, and antirepressor, and investigate their conservation among P. larvae phages. Results from Phamerator and CoreGenes indicate a relationship between four P. larvae phages and four small genome myoviruses that infect Lactobacillus, Clostridium and Streptococcus. These results show that comparisons can be drawn between phages that infect a phylum and provide a basis for analyzing and comparing newly isolated phages that infect P. larvae.

Results
Bacterial identification, phage isolation, and phage sequencing
Bacterial isolates were collected from spores found in honey samples. All characteristic tests for P. larvae were positive: the isolates grew on PLA plates and were catalase negative and Gram-positive. PCR products from 16S rRNA primers were sequenced using BigDye sequencing. BLAST results from nine of the ten 16S rRNA sequences showed more than 99% similarity with Paenibacillus larvae subsp. pulvifaciens strains DSM 8442 and DSM 8443 as well as the related bacterium Brevibacillus laterosporus. Of nine isolates, PL2 and PL6 were used for phage isolation. Phages Abouo, Davies, and Emery were isolated using PL6, while phages Jimmer1 and Jimmer2 were isolated using PL2.
Each P. larvae phage sample was plaque-purified at least three times, sequenced, and published [22]. Prior to genome sequencing and electron microscopy we were unaware that one phage sample still contained two different phages. Plaque purification did not successfully separate these two phages. However, assembly of the genomes revealed two clearly independent genomes that separated with ease, each with over 100-fold coverage. These co-isolated phages were named Emery and Abouo. Results from sequencing and annotation of the six P. larvae phages are found in Table 1. The genomes varied in length from ~40 kb to ~58 kb. Most of the genes in each genome were located on the forward strand (90% ± 3%). The average G + C content for these phages is 39.48% ± 1.41%. BLAST hits for proteins within these phages included both Paenibacillus and Brevibacillus bacterial strains as well as bacteriophages that infected other Firmicutes.

Phage cross-infectivity, lysogeny, and lysogen superinfection
Plaques from the sample containing Emery and Abouo were clear while plaques from the other three phages were hazy. Phages Jimmer1 and Jimmer2 were isolated independently using PL2 and neither of these phages could infect PL6. Phages Davies and Emery/Abouo were isolated using PL6 and these phages could not infect PL2. None of these phages were able to produce plaques on lawns of B. cereus, B. subtilis, L. acidophilus, S. aureus, or S. epidermidis. Of the four phage samples, three formed stable lysogens: Jimmer1 in PL2, Jimmer2 in PL2, and Davies in PL6. Stable lysogens were identified when phages were not able to superinfect bacteria lysogenic for the same phage. The lysate containing Emery and Abouo induced lysis in the PL6 Davies lysogen. No other superinfection or induced lysis was observed in any other lysogen-phage combinations. The sample containing Emery and Abouo did not form a stable lysogen and no superinfection data were obtained.

Electron microscopy reveals myovirus structure for five P. larvae phages
Electron microscopy revealed that these five P. larvae phages were myoviruses, marked by the presence of a contractile tail sheath (Figure 1). Figure 1A shows tail structures separated from the phage capsids. These tail structures were more abundant than intact phages in all samples submitted for electron microscopy. Figure 1B, 1C, 1E, and 1F show intact phages with contracted tail sheaths, while Figure 1D shows an extended tail sheath.
Because phages Emery and Abouo were not separated, micrographs for these phages were taken from the same copper grid.

Sequence similarities between phages
Gepard dotplots of the full genome sequences for all six P. larvae phages are shown in Figure 2. Diagonal lines within the dotplots indicated that phages Abouo, Davies, Jimmer1, and Jimmer2 were very similar to each other. Host specificity was also reflected in similarities between these four phages. For instance, the PL2 phages Jimmer1 and Jimmer2 shared 99.8% average nucleotide identity and the PL6 phages Abouo and Davies shared 94.9% identity. However, the average nucleotide identity between PL2 phages Jimmer1 and Jimmer2 and PL6 phages Abouo and Davies was only 80.5%. The lack of diagonal lines in the dotplot indicated that phages Emery and phiIBB_Pl23 were very different from the other P. larvae phages examined. While the Emery and Abouo phages were found in the same sample, their sequences assembled independently without conflict, with over 100-fold coverage for each genome [22]. No similarities between the sequences of Emery and Abouo were apparent in the dotplot (Figure 2). However, a black line indicates a small section of homology between Emery and Davies that Davies does not share with Abouo. Because Emery and Abouo were co-isolated and sequenced together, individual reads in these sections of Emery, Davies, and Abouo were scrutinized using Consed [23] to ensure the assemblies were correct. The fold coverage before, throughout, and after these sections of Emery, Davies, and Abouo was at least 80.

Distinguishing between phages Emery and Abouo
In addition to the separation of unique DNA sequences for Emery and Abouo in the same sample, two markedly different phages were present in electron micrographs based on measurements of capsid height and tail length. Putative tape measure protein (TMP) genes were identified in each phage genome. The TMP gene found in Emery (gp16) was 3,000 bp long, while the TMP gene in Abouo (gp20) was 2,055 bp long, showing that the TMP gene in Emery was 1.46 times longer than the TMP gene in Abouo. The positive correlation between TMP gene length and tail length [24] was used to match phages Emery and Abouo in electron micrographs to their respective genome sequences. The average tail length for Emery was 162.2 nm, while the average tail length for Abouo was 113.6 nm, making the tail of phage Emery 1.43 times longer than the tail of phage Abouo (Table 2). This ratio closely matches the 1.46-fold difference in TMP gene length. Based on these data, we matched the long TMP gene and tail length with Emery, and the short TMP gene and tail length with Abouo (Figure 1).
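The ratio comparison used above to assign each genome to a virion can be reproduced in a few lines. This is a minimal sketch that uses only the gene lengths and tail lengths reported in the text; the function and variable names are introduced here for illustration.

```python
def length_ratio(longer, shorter):
    """Return how many times longer the first measurement is than the second."""
    return longer / shorter


# Values reported in the text: TMP gene lengths (bp) and mean tail lengths (nm).
tmp_gene_bp = {"Emery": 3000, "Abouo": 2055}
tail_length_nm = {"Emery": 162.2, "Abouo": 113.6}

gene_ratio = length_ratio(tmp_gene_bp["Emery"], tmp_gene_bp["Abouo"])
tail_ratio = length_ratio(tail_length_nm["Emery"], tail_length_nm["Abouo"])

print(f"TMP gene length ratio (Emery/Abouo): {gene_ratio:.2f}")  # 1.46
print(f"Tail length ratio (Emery/Abouo):     {tail_ratio:.2f}")  # 1.43
```

The two ratios differ by roughly 0.03, which is consistent with the assignment of the longer tail to Emery and the shorter tail to Abouo.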

Measurements from electron micrographs
P. larvae phages were separated into three distinct groups based on structural morphology. The first group of phages were myoviruses (icosahedral capsids and contractile tails) and included phages Abouo, Davies, Emery, Jimmer1, Jimmer2, PBL0.5 [5], and PBL2 [5]. There were two distinct groups of phages that were siphoviruses. The first siphovirus group contained phages BLA [1], PBL1 [4], and PPL1c [7]. These phages had long, noncontractile tails and elongated capsids. The second siphovirus group contained only PBL3, which had a round capsid [6]. Phage phiIBB_Pl23 was also a siphovirus [21], but could not be categorized into one of the two siphovirus groups because images or measurements were not yet available. There was no apparent correlation between the type of phage and where the phage was isolated. Measurements taken from all published electron micrographs of P. larvae phages are shown in Table 2.
Measurements for phages Abouo, Davies, Emery, Jimmer1, and Jimmer2 were taken from at least three different intact phages. Phages grouped into categories by morphology type were found to have similar structural measurements. The structures of the P. larvae myoviruses were similar in size with an average capsid height of 67.2 ± 3.2 nm and an average width of 64.1 ± 2.6 nm. The average tail length was 122.0 ± 27.3 nm and was the most variable of the measured phage features.

Frameshift in P. larvae phage Emery
Phage Emery exhibited a putative ribosomal slippage site in gp4 that encoded for a head morphogenesis protein in the SPP1 gp7 family. This frameshift was identified by the online frameshift finding tool FrameD [25]. The two products in Emery are predicted to be 82.5 kDa following ribosomal slippage and 58.9 kDa if there is no slippage. The presence of both head morphogenesis proteins in the Emery virion has not yet been verified. We were unable to detect a putative frameshift via FrameD in Bacillus phage SPP1 or any protein sequence homology using BLASTP between the morphogenesis proteins in Emery and SPP1.

P. larvae phage genomic comparison using Phamerator software
A database of phage genomes related to the P. larvae phages was assembled for analysis using the phage genome comparison program Phamerator [17]. The finished Phamerator database contained a diverse set of phages that infected Firmicute bacteria, including the 6 P. larvae phage genomes, 71 Bacillus phages, 1 Clostridium phage, 3 Enterococcus phages, 2 Geobacillus phages, 7 Lactobacillus phages, 6 Listeria phages, 1 Paenibacillus glucanolyticus phage, 15 Staphylococcus phages, and 1 Streptococcus phage (Additional file 1). Phages included in the database were selected based on BLAST hits to gene products identified during annotation of the P. larvae phage genomes. Phamerator grouped the 13,697 putative proteins annotated in the 113 phage genomes into 6,181 phamilies (phams). Only 2,233 phams (36.1%) contained two or more members. These 2,233 phams contained 9,749 (71.2%) of the 13,697 putative proteins in the database. The remaining 3,948 (28.8%) putative proteins could not be grouped with other proteins and were designated as "orphams" [17]. The Phamerator database allowed comparison of P. larvae phage genes to each other and to phages infecting other bacteria. A spreadsheet was exported from the Phamerator database to report all phage gene products in the database, the phams to which the gene products are assigned, and the conserved domains found in gene products in those phams (Additional file 2). Table 3 indicates how many putative proteins in each P. larvae phage are orphams, are shared only with P. larvae phages, or are shared with phages infecting other bacterial hosts. Putative proteins from Jimmer1 and Jimmer2 shared phams with 56 non-P. larvae phages, while Abouo shared phams with 52, Davies with 53, Emery with 24, and phiIBB_Pl23 with 57 non-P. larvae phages. Collectively, the six P. larvae phages shared phams with 72 of the 107 (67.3%) non-P. larvae phages included in the database.
Of 562 genes in the six P. larvae phages, only 114 (20.3%) encoded proteins grouped into phams with proteins from other types of phages in the database. Of the remaining genes, 300 (53.4%) encoded proteins that were grouped into phams containing only P. larvae phage proteins and 148 (26.3%) were orphams.
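The breakdown of the 562 P. larvae phage genes can be checked with simple arithmetic. The sketch below uses only the counts given in the preceding paragraph; the helper name `percent` and the dictionary labels are introduced here for illustration.

```python
def percent(part, total):
    """Percentage of total represented by part, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Gene counts reported in the text for the six P. larvae phages.
gene_counts = {
    "phams shared with non-P. larvae phages": 114,
    "phams containing only P. larvae phage proteins": 300,
    "orphams": 148,
}
total_genes = sum(gene_counts.values())

breakdown = {label: percent(n, total_genes) for label, n in gene_counts.items()}
for label, value in breakdown.items():
    print(f"{label}: {value}%")
```

The three categories account for all 562 genes, and the computed percentages (20.3%, 53.4%, and 26.3%) agree with the values stated in the text.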
Pham groupings reflected the genetic relationships of P. larvae phages. The genomic sequence comparison of Jimmer1 and Jimmer2 using ClustalW identified differences in 80 bp of the 54,312 bp genomes (99.85% similar) [22]. All corresponding genes between Jimmer1 and Jimmer2 shared the same phams. Of the 80 P. larvae phage genes in Jimmer1 and Jimmer2, 31 were unique to these two phages and were not found in any other phages in the database. These 31 genes would be orphams if Jimmer2 were not isolated (Table 3). Phamerator identified conserved domains in at least one gene in 1,501 phams (24.3%) of the total 6,181 phams in the database. Although many P. larvae phage genes encoded proteins with significant BLAST hits, less than half of the proteins had a known function. Of all P. larvae phage putative proteins, 86% had a BLAST hit with an E-value less than 1 × 10⁻⁴ (see Table 3), yet only 48% of the proteins returned BLAST hits listing a function. Conserved domains were identified in only 43% of the P. larvae phage putative proteins (Table 3). Phages Emery and phiIBB_Pl23 contained the most orphams, the fewest BLAST hits, and the most putative proteins with no identifiable conserved domains.
Table 2 notes: *Independent electron microscopy data for phages PBL2 and PPL1c were not published; instead, they were reported to be indistinguishable from PBL0.5c and PBL1, respectively. **Measurements were taken from published electron micrographs instead of reported data, which were either not presented or were inaccurate. -Phage measurements were not taken.

P. larvae phages share structural and regulatory genes with similar functions
Conserved domains and BLAST hits matching phage or bacterial proteins were used to assign functions to 234 gene products in the six P. larvae phages, indicating that these genes were not novel and were characteristically found in other phages. The assembly and structural proteins were grouped according to function in Table 4. The regulatory and non-structural proteins are listed in Table 5. The pham assignment for each gene is shown in parentheses. Pham numbers are specific to the Phamerator database used for this analysis.
Most of the functions listed in Tables 4 and 5 describe proteins found in more than one phage. For example, all five of the P. larvae myoviruses contained seven proteins that belonged to the same family or superfamily but not always to the same pham. The function of these proteins includes head morphogenesis, tape measure, baseplate (see Table 4), LysM peptidoglycan binding, peptidoglycan hydrolase, PBSX, and bacteriocin (see Table 5).
Few proteins with known functions were identified as putative virulence factors. BLAST results indicate that gp26 in P. larvae phage phiIBB_Pl23 is a protein that is toxic to insect larvae. No toxin genes were identified in the P. larvae myoviruses. Other host-related proteins include an ABC transporter-like protein found in P. larvae phages Abouo, Davies, Jimmer1, Jimmer2, and phiIBB_Pl23 as well as an XRE-family transcriptional regulator found in all P. larvae phages. The five myoviruses contained between five and ten of these regulators per genome compared to only two in the siphovirus phiIBB_Pl23. Abouo gp51, Jimmer1 gp58, Jimmer2 gp58 and Emery gp40, gp64, and gp65 (Table 5) are the only transcriptional regulators that share a pham with a non-P. larvae phage. All others are only found in P. larvae phages. It is not known what effects these transcriptional regulators have on the host, but they do contain a canonical helix-turn-helix (HTH) domain. Very few of the regulatory genes in these phages have known functions.

Phage genome organization and pham groupings indicate relatedness of four P. larvae phages
A linear genome map of the six P. larvae phages shows that the genes in phages Jimmer1, Jimmer2, Abouo, and Davies are organized similarly (Figure 3). Identically colored genes encode products that share a pham, while white genes encode orpham gene products. There are 58 phams that each contained gene products from phages Abouo, Davies, Jimmer1, and Jimmer2. Proteins in 30 of these phams had identifiable functions based on BLAST hits and are italicized in Tables 4 and 5. Of the 58 conserved phams, 38 did not contain homologs from any other phage in the database. Of the remaining 20 phams that have homologs from other phage types, three of the most populated phams are those containing small terminase (13 members), large terminase (14 members), and portal protein (13 members). Of the 16 other phages that shared one of these phams with the four similar P. larvae myoviruses, only three phages shared all three phams: Staphylococcus phages 37, 88, and PH15.

Phamily relationships of large terminase proteins indicate putative DNA packaging strategies
Phage gene products must meet stringent parameters in order to be grouped into a pham with other genes that encode similar proteins. Because gene products in a pham are highly similar, phylogenetic analysis indicates that these proteins will be more closely related than others with the same function. A neighbor-joining phylogenetic tree grouped large terminase proteins in the Phamerator database by phamily (Figure 4). Amino acid sequences of large terminase proteins can indicate the DNA packaging strategy [14]. Phages Abouo, Davies, Jimmer1, and Jimmer2 likely use headful packaging and have circularly permuted terminal repeats based on close association with the large terminases of well-characterized headful packaging phages P40 [26] and SPP1 [27], which share a pham. Phage phiIBB_Pl23 likely has 3′ cohesive ends based on close association with phage phiSLT [28]. Further analysis of experimental data indicated that no phams generated by Phamerator contained terminases belonging to phages with different packaging strategies (data not shown). The packaging strategy for phage Emery is still undetermined because its large terminase protein is an orpham.
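The inference described above, in which an experimentally determined packaging strategy is extended to uncharacterized phages whose large terminase falls in the same pham, can be sketched as a simple label propagation. This is an illustrative Python sketch, not the procedure used in the study. The function name, the pham labels `pham_X` and `pham_Y`, and the simplification that the four myoviruses sit in the same pham as P40 and SPP1 (the text describes a close association on the tree) are all assumptions for the example.

```python
def infer_packaging(terminase_pham, known_strategy):
    """Propagate experimentally determined packaging strategies within phams.

    terminase_pham: phage name -> pham label, or None if the terminase is an orpham.
    known_strategy: phage name -> experimentally determined strategy.
    Returns a strategy for each uncharacterized phage, or "undetermined" when
    its pham has no characterized member or has conflicting strategies.
    """
    by_pham = {}
    for phage, strategy in known_strategy.items():
        pham = terminase_pham.get(phage)
        if pham is not None:
            by_pham.setdefault(pham, set()).add(strategy)

    inferred = {}
    for phage, pham in terminase_pham.items():
        if phage in known_strategy:
            continue
        strategies = by_pham.get(pham, set())
        inferred[phage] = next(iter(strategies)) if len(strategies) == 1 else "undetermined"
    return inferred


# Placeholder pham labels; memberships are simplified assumptions.
terminase_pham = {
    "P40": "pham_X", "SPP1": "pham_X",
    "Abouo": "pham_X", "Davies": "pham_X", "Jimmer1": "pham_X", "Jimmer2": "pham_X",
    "phiSLT": "pham_Y", "phiIBB_Pl23": "pham_Y",
    "Emery": None,  # large terminase is an orpham
}
known_strategy = {"P40": "headful", "SPP1": "headful", "phiSLT": "3' cohesive ends"}

inferred = infer_packaging(terminase_pham, known_strategy)
```

Returning "undetermined" for a pham with conflicting strategies is a conservative design choice; the text reports that no such conflict occurred in the database.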

P. larvae phages exhibit genetic and structural similarity with other small genome myoviruses
The similar proteins were mostly structural (Table 6) and included the terminase (small subunit), portal, head morphogenesis, minor structural, tail sheath, and baseplate proteins. All of the gene products listed in the table were grouped into the same pham except for three proteins that narrowly missed the pham cutoff values and are marked by asterisks. The tape measure proteins in Abouo, Davies, Jimmer1, and Jimmer2 were somewhat similar to those found in Streptococcus phage EJ-1 (average E-value < 9 × 10⁻¹⁷, average identity = 24%) and Clostridium phage phiC2 (average E-value < 8 × 10⁻¹⁷, average identity = 22%) but were not near the pham cutoff value of 1 × 10⁻⁵⁰.
Although P. larvae phage Emery contained gene products with the same functions as those listed in Table 6, the proteins were all orphams. However, the first five gene products encoded in P. larvae phage phiIBB_Pl23 (small terminase, large terminase, portal protein, protease, and major capsid proteins) all shared a pham with similar proteins from five siphovirus Staphylococcus phages (3A, 47, phi12, phiSLT, and tp310-2).
Figure 4 Neighbor-joining phylogenetic tree of the large terminase gene products from the Phamerator database indicates proposed packaging strategies. Colored boxes indicate proteins belonging to the same pham. Proteins that are not highlighted are orphams. Large terminase proteins grouped into similar phams are closely related on the tree and share a packaging strategy. *Experimentally determined headful packaging, circularly permuted terminal repeats [26,27]. **Experimentally determined 3′ cohesive ends [28]. ***Experimentally determined long direct terminal repeats [29].
The program CoreGenes 3.5 was used to further compare the genes in the P. larvae phages with small-genome myoviruses. Using the default BLASTP threshold of 75, core proteins were identified in the five P. larvae myoviruses with respect to Clostridium phage phiCD119, Streptococcus phage EJ-1, Lactobacillus phage KC5a, and Lactobacillus phage AQ113. The numbers of core proteins shared between comparison and reference genomes are listed in Table 7. The percentages of core proteins with respect to the reference genome are also reported. Clostridium phage phiCD119 was the only one of these phages that belonged to a genus (phiCD119likevirus); the other three are currently unclassified. Previous analyses of the Podoviridae and Myoviridae families grouped phages together when phages share 40% of core proteins with a reference phage genome [36,37]. Based on this cutoff value, Abouo, Davies, Jimmer1 and Jimmer2 formed a new group of small genome myoviruses.
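The 40% core-protein criterion can be expressed as a short calculation. The sketch below is illustrative only: the function names are introduced here, and the protein counts in the example are hypothetical placeholders rather than values from Table 7.

```python
def percent_core(shared_core_proteins, reference_protein_count):
    """Percentage of the reference genome's proteins that are shared core proteins."""
    return 100.0 * shared_core_proteins / reference_protein_count


def meets_grouping_cutoff(shared_core_proteins, reference_protein_count, cutoff_percent=40.0):
    """True when the shared fraction reaches the 40% cutoff cited in the text."""
    return percent_core(shared_core_proteins, reference_protein_count) >= cutoff_percent


# Hypothetical (shared core proteins, proteins in reference genome) pairs.
examples = {"comparison_1": (34, 80), "comparison_2": (20, 80)}
results = {
    name: (percent_core(*counts), meets_grouping_cutoff(*counts))
    for name, counts in examples.items()
}
for name, (pct, grouped) in results.items():
    print(f"{name}: {pct:.1f}% core proteins, grouped with reference: {grouped}")
```

Because the percentage is computed relative to the reference genome, swapping the reference and comparison genomes can change the result when the two genomes encode different numbers of proteins.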

Only one pham includes all five myoviruses, and very few phams are shared between unrelated P. larvae phages
Phages Emery and phiIBB_Pl23 are significantly different from each other and from the four similar P. larvae myoviruses, as is evident from the genome maps in Figure 3. However, Tables 4 and 5 demonstrate that some proteins encoded by these phages grouped into similar phams.
Pham 34 is the only pham in the database that included proteins from all five of the new myovirus P. larvae phages. These gene products are Abouo gp34, Davies gp34, Emery gp29, Jimmer1 gp36, and Jimmer2 gp36 and encode a bhlA/bacteriocin protein (Figure 3). No other gene products in the Phamerator database were grouped into this pham. The conserved domain in these proteins was DUF2762, a putative holin-like protein. When comparing amino acid sequences, these five proteins shared > 87% identity and an E-value less than 5 × 10⁻⁴³.
P. larvae siphovirus phiIBB_Pl23 contained only two proteins that shared a pham with any new myovirus P. larvae phages. The conserved domains in one gene product suggest it encodes a serine recombinase protein (Jimmer1 gp49, Jimmer2 gp49, phiIBB_Pl23 gp33) (Figure 3). The conserved domains in the other gene product suggest it encodes a phage antirepressor protein (Jimmer1 gp19, Jimmer2 gp19, phiIBB_Pl23 gp42) (Figure 3). Antirepressors from 13 other phages are also assigned to this pham (pham 951 in this database), including an antirepressor from Paenibacillus glucanolyticus phage PG1 (gp28). A phamily circle links the 16 phages in the database containing a gene product in pham 951 (Figure 5).
The siphovirus Paenibacillus glucanolyticus phage PG1 also contained three gene products that shared a pham with P. larvae phages Abouo, Davies, Jimmer1, and Jimmer2. Pham 90 encoded an RNA polymerase sigma-70 factor and had 5 members: PG1 gp62, Abouo gp91, Davies gp91, Jimmer1 gp99, Jimmer2 gp99. Pham 75 encoded a [...]
Table 6 notes: No genes in Emery were grouped into any of the phams listed in this table. Genes followed by an asterisk were not grouped into the same phams as the P. larvae phages using Phamerator. The average E-value and percent identity when compared to Abouo, Davies, Jimmer1, and Jimmer2 are reported below: *KC5a gp37 = 1 × 10⁻⁴⁶, 31% identity. **EJ-1 gp41 = 1 × 10⁻³⁹, 31% identity. ***EJ-1 gp46 = 8 × 10⁻¹⁴, 32% identity.

Four P. larvae phages contain duplicated genes
Phages Abouo and phiIBB_Pl23 did not contain any two proteins within the same genome that belonged to the same pham. However, gp52 and gp56 in phages Jimmer1 and Jimmer2 shared 52.1% identity (E-value is 1.39 × 10⁻³⁶), belonged to pham 50, and encode an XRE family transcriptional regulator that contains a helix-turn-helix DNA binding domain. Additionally, gp62 and gp63 in phages Jimmer1 and Jimmer2 shared 40.3% amino acid identity (E-value is 8.32 × 10⁻¹¹), belonged to pham 53, and contain an arc-like DNA-binding domain. Davies gp44 and gp45 and Emery gp53 and gp54 belonged to pham 44 and encoded a putative membrane protein (Figure 6). Comparisons indicated that homologous proteins encoded on the two genomes were more similar than duplicated proteins encoded within one of the genomes. Davies gp44 and Emery gp53 shared 80.1% identity (E-value is 4.69 × 10⁻¹⁰⁷), and Davies gp45 and Emery gp54 shared 82.9% identity (E-value is 3.20 × 10⁻¹¹⁷). However, Davies gp44 and gp45 shared 31.4% identity (E-value is 7.29 × 10⁻²⁸) and Emery gp53 and gp54 shared 35.4% identity (E-value is 3.39 × 10⁻³⁴). Abouo gp44 also belongs to pham 44 but the nucleotide sequence for this gene is different from the genes encoding the four gene products in Emery and Davies. Abouo gp44-46, Davies gp44-47, and Emery gp53-56 are identified in Figure 2A and 2B by the dark line indicating homology between Emery and Davies and the white gap between Abouo and Davies at the same location.

Discussion
Prior to this report, nine P. larvae phages were described but were never analyzed collectively or grouped based on similar characteristics. Structural and morphological characteristics are the only published information for grouping the reported P. larvae phages to date. Therefore, for general comparison, P. larvae phages were identified as myovirus, elongated-capsid siphovirus, round-capsid siphovirus, or unknown siphovirus. The five P. larvae myoviruses characterized in this paper are structurally similar to previously isolated P. larvae myoviruses and may also be genetically similar. Since few phages infecting P. larvae have been sequenced, it is useful to compare structural similarity observed in electron micrographs. Now that sequencing data has been published for six P. larvae phage genomes and sequencing of others is sure to follow, genomic grouping will prevail and clusters will likely emerge as occurred with the mycobacteriophages [13].
The five myoviruses were isolated from three soil samples each from a separate location: Jimmer1 and Jimmer2 were isolated independently from the same sample [22], Emery and Abouo were isolated together, and Davies was isolated separately. P. larvae phage PBL2 was isolated from a different sample than BL2, yet all tests indicated no obvious structural or genetic differences between these phages [5]. Similar host properties and selective pressures can result in isolation of similar phages from different locations [13]. More P. larvae siphoviruses need to be sequenced before further correlations between genome and structural morphology can be drawn. As demonstrated in this work and by others, the sequence of the tape measure protein gene may be used to identify individual phages being studied if co-isolation occurs again in the future [38].
Bacteriophages are often unable to superinfect an existing lysogen if the entering and lysogenic phages are genetically similar [39]. The portion of the genome responsible for superinfection immunity has been determined for some phages [40]. Repressor genes involved in superinfection immunity have been characterized and are known to defend the prophage from premature lysis by silencing genes related to lysis [41]. This system does not work against phages that are always lytic or temperate phages that are not sensitive to the prophage repressor genes.
Lysogens of Jimmer1, Jimmer2, and Davies displayed superinfection immunity when incubated with the same phage. Jimmer1 and Jimmer2 exhibited nearly identical sequences and were also immune to superinfections of each other.
The host specificity and correlating genome similarity between Jimmer1 and Jimmer2 (infect PL2) and Davies and Abouo (infect PL6) reflect common evolutionary ancestry. The high degree of similarity (over 80% average nucleotide identity) between the four phages may indicate that these phages infect a common host that has not yet been isolated or tested or that two phages recently switched hosts as is common in phages [42]. Jimmer1, Jimmer2, Davies, and Abouo likely coevolved.
Many bacteriophages contain genes that affect the virulence of the bacterial hosts. One toxin gene has been identified in phiIBB_Pl23 (gp26), and no toxin genes have been identified in the five P. larvae myoviruses. P. larvae phages Abouo, Davies, Jimmer1, Jimmer2, and phiIBB_Pl23 encode an ABC transporter-like protein. This was characterized as an extracellular protein produced by P. larvae [43], but it is not known how this protein is involved in host virulence. Future experiments involving the many putative XRE transcriptional regulators encoded by these phages may show a correlation with the virulence of P. larvae. Most of the transcriptional regulators found in the six P. larvae phages do not share phams with phages that infect any other bacterial host, indicating that these regulators are both phage- and host-specific. Two of these transcriptional regulators were duplicated in Jimmer1 and Jimmer2. The differences between these genes indicate that they are ohnologous and arose by gene duplication and subsequent divergence [44]. The duplicated genes in Emery and Davies are putative membrane proteins and likely evolved in a similar fashion. Abouo contains only one copy of this gene (Figure 6). BLAST hits for all six of the sequenced P. larvae phages show similarity to many proteins encoded by Paenibacillus and Brevibacillus bacteria. BLAST hits to these bacteria are not surprising because the genera Paenibacillus and Brevibacillus both belong to the family Paenibacillaceae and are closely related [45].
Figure 6 Davies and Emery share a duplicated gene while Abouo has only one copy. This genome map shows two gene products in both Davies and Emery that belong to pham 44 and encode a putative membrane protein. Davies gp44-47 and Emery gp53-56 are more similar to each other than they are to Abouo gp44-46. Gene product numbers are located inside the colored boxes. The numbers above each gene product indicate the pham number specific to this analysis and, in parentheses, the number of members in the pham. Gene products with the same color share a pham.
Analysis of large terminase protein phamilies revealed that Abouo, Davies, Jimmer1, and Jimmer2 likely use the headful packaging mechanism, while phiIBB_Pl23 likely has 3′ cohesive ends. Because of the stringent cutoff values required for inclusion in a pham, these results illustrate one way in which experimentally determined properties of a protein can be inferred for other proteins in the same phamily.
Several gene products in P. larvae phages have similar functions but do not share phamilies. These include head morphogenesis, tape measure, baseplate, LysM peptidoglycan binding, peptidoglycan hydrolase, PBSX, and bacteriocin proteins. The conserved genes either diverged a long time ago or were acquired via convergent evolution. Additionally, the antirepressor protein in P. larvae phages phiIBB_Pl23, Jimmer1, and Jimmer2 shares a pham with antirepressors from 13 other myoviruses and siphoviruses that infect host bacteria in the genera Bacillus, Enterococcus, Geobacillus, Lactobacillus, Listeria, Paenibacillus, and Staphylococcus (Figure 5). The presence of a similar antirepressor among phages of diverse Firmicute hosts may indicate the usefulness of the gene products and their associated conserved domains in regulating production of phage proteins within a diverse set of host bacteria. These data indicate that P. larvae phages have been subjected to multiple evolutionary pressures.
The head morphogenesis protein in phage Emery belongs to the SPP1 gp7 family and contains a ribosomal slippage site that is not found in Bacillus phage SPP1. Although the head morphogenesis gene in SPP1 produces two gene products of 34 kDa and 28 kDa (compared to predicted proteins of 82.5 kDa and 58.9 kDa in Emery), the two SPP1 proteins are thought to result from an alternative start site, not a frameshift caused by ribosomal slippage [46]. The lack of homology between these protein sequences further illustrates that Emery is not closely related to any other known bacteriophages.
Most of the putative encoded proteins in the P. larvae phages are not grouped into phams containing proteins from other phage types. These data indicate that most P. larvae phage genes are novel among currently identified genes of phages or bacteria. More than half of the P. larvae phage proteins have no identified conserved domains or putative functions, illustrating the diversity of bacteriophages and the vast number of unknown genes yet to be explored.
CoreGenes was previously used to verify current taxonomic relationships between phages in the Podoviridae [36] and Myoviridae families [37]. It was also used to analyze other "dwarf" myoviruses and group them based on the similarity of core genes [47]. Analysis of core genes and shared phams indicates that P. larvae phages Abouo, Davies, Jimmer1, and Jimmer2 are distantly related to phages in the phiCD119likevirus family as well as phages EJ-1, KC5a, and AQ113. Because proteins grouped into similar phams are phylogenetically related, these proteins likely share a common ancestry. The structural similarities between phages Abouo, Davies, Jimmer1, Jimmer2, phiC2, KC5a, AQ113, and EJ-1 may correlate with their genetic similarities because the conserved core genes include the structural module of each genome. However, the currently accepted threshold of 40% for a sufficiently strong CoreGenes percentage prevents any of these phages from being grouped taxonomically (except perhaps KC5a and AQ113, which is not within the scope of this paper). The differences in genome lengths may also prevent the formation of a taxonomic family of these phages because the CoreGenes percentage is based on the number of genes, which means that differences in genome length, and the resulting differences in total gene number, can influence the score.
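The effect of gene number on a percentage-based score can be illustrated with a minimal sketch. It assumes, for illustration only, that the score is the number of shared core genes divided by the total number of genes in the genome chosen as the denominator; the exact formula used by CoreGenes is not stated here, and the function name and gene counts below are hypothetical rather than values from this study.

```python
def shared_gene_percentage(shared_genes: int, total_genes: int) -> float:
    """Percentage of a genome's genes that are shared with another genome.

    Assumption: score = shared / total genes of the chosen genome.
    This is an illustrative definition, not the documented CoreGenes formula.
    """
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    if not 0 <= shared_genes <= total_genes:
        raise ValueError("shared_genes must be between 0 and total_genes")
    return 100.0 * shared_genes / total_genes


# Hypothetical counts: the same 20 shared genes give different scores
# depending on the size of the genome used as the denominator.
small_genome_score = shared_gene_percentage(20, 50)  # smaller genome
large_genome_score = shared_gene_percentage(20, 80)  # larger genome
print(small_genome_score, large_genome_score)
```

Under this assumed definition, the same set of shared genes meets a 40% threshold against the smaller genome but falls well below it against the larger one.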
The results of the CoreGenes analysis indicate that P. larvae phages Abouo, Davies, Jimmer1, and Jimmer2 are related phylogenetically. They are also distantly related to phiC2, KC5a, AQ113, and EJ-1 which infect other bacterial hosts. This relationship indicates that these four phages are the closest known phylogenetic relatives to these four P. larvae phages. The conservation of primarily structural genes among the eight small genome myoviruses may indicate that the phages adapted to maintain infectivity as their bacterial hosts diverged, but retained ancestral structural genes that were under less selective pressure.
Genes conserved between different phages may be functionally important. The bacterial hosts PL2 and PL6 are similar (according to the 16S rRNA sequences and physical properties), and similar BhlA/bacteriocin genes such as those found in the shared pham of Jimmer1, Jimmer2, Emery, Abouo, and Davies (pham 34) can likely be used to lyse the bacterial host. It is interesting to note that the only two genes shared between phiIBB_Pl23 and any other P. larvae phage encode a serine recombinase and an antirepressor, shared with the PL2 phages Jimmer1 and Jimmer2. This correlation may indicate similar host interactions, as these genes help regulate the lytic and lysogenic cycles. The PL6 phages do not contain any antirepressor gene products belonging to this pham. Although P. larvae phages Emery and phiIBB_Pl23 do not show significant genetic relatedness to any other sequenced phages, similar genes and phages will likely be discovered in the future. The six newly sequenced genomes of the P. larvae phages compared in this report are an initial foundation for future studies.

Conclusions
This first comparison of P. larvae phage genomes provides insight into the genus Paenibacillus and the important honeybee bacterial pathogen, P. larvae. Although six P. larvae phages show some relatedness to phages that infect other Firmicute bacteria, most P. larvae phage genes do not share phams with non-P. larvae phages and many gene products still have unknown functions. Efforts to characterize these gene products and to isolate, sequence, and analyze new P. larvae phages will help us better understand the genetics of these phages and their bacterial host.

Identification of field isolates
Paenibacillus larvae spores were extracted from local honey samples using the process described by Hornitzky [48]. Pelleted spores were streaked on PLA [49] plates that contained nalidixic acid and pipemidic acid, and the plates were incubated for 48-72 hours at 37°C. Colonies were streaked to purity on PLA plates. Isolates were tested with hydrogen peroxide for the presence of the catalase enzyme [50] and were tested by Gram stain [45,51].
A single colony from each bacterial field isolate was boiled at 98°C for five minutes, and 3 μL of the lysate was used as a PCR template. The 16S rRNA gene region was amplified using universal primers 27F and 907R [52] and the standard protocol for Taq DNA polymerase (New England Biolabs). Following PCR, amplicon size was checked by agarose gel electrophoresis. Samples producing a ~1 kb band were submitted for BigDye (Applied Biosystems, Life Technologies) sequencing to the BYU DNA Sequencing Center. Resulting 16S sequences were analyzed using BLAST [19].

Superinfection of P. larvae lysogens with phage
Phages described by Sheflo et al. [22] were used in lysogenic superinfection studies using a protocol adapted from [53]. Some agar was removed from the center of an isolated plaque, streaked out on an LB plate, and incubated at 37°C for 24 hours to allow any lysogens to grow. One colony was removed, incubated at 37°C in 1 ml of LB broth for two hours, and then plated using the method described above. When the top agar was solid, 5 μL of each phage lysate was placed on the plate. The plate was incubated agar side down at 37°C for 24 hours. Clearing under the spots indicated superinfection had occurred, while no clearing indicated that the lysogenic bacteria were immune to superinfection.

Electron microscopy of P. larvae phages
Electron microscopy was performed at Brigham Young University in the Life Sciences Microscopy Lab using an FEI Tecnai 12 Spirit transmission electron microscope. To prepare the samples for imaging, 20 μL of high-titer phage lysate was placed on a 200-mesh copper carbon type-B electron microscope grid for one minute. The lysate was wicked away and the grids were stained for two minutes using 2% phosphotungstic acid (pH = 7). Residual liquid was wicked away and the grid was allowed to dry before being imaged. Phage structures in electron micrographs were measured using ImageJ [54]. The average and standard deviation for each measurement were calculated from a minimum of three separate measurements.
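The summary statistics described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes the sample standard deviation (n − 1 denominator), which is not specified in the text, and the function name and the tail-length values are hypothetical.

```python
from statistics import mean, stdev


def summarize_measurements(values_nm: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) for repeated measurements.

    Requires at least three measurements, matching the minimum stated
    in the text. The use of the sample (n - 1) standard deviation is an
    assumption made for this sketch.
    """
    if len(values_nm) < 3:
        raise ValueError("at least three measurements are required")
    return mean(values_nm), stdev(values_nm)


# Hypothetical tail lengths in nanometers, not data from this study.
tail_lengths_nm = [100.0, 102.0, 104.0]
average_nm, sd_nm = summarize_measurements(tail_lengths_nm)
print(f"{average_nm:.1f} ± {sd_nm:.1f} nm")
```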

Genomic comparison of sequenced phages
The DNA sequences for the six sequenced P. larvae phages were downloaded from GenBank using reported accession numbers [21,22]. Dotplots of nucleic acid and protein sequences were generated using Gepard [55] and then compared. ClustalW [56] was used to calculate Average Nucleotide Identity (ANI) percentages comparing each of the P. larvae phage genomes. The online tool FrameD [25] was used to search for frameshift mutations. Core genes were identified using the program CoreGenes 3.5 [57,58] with the default BLASTP threshold of 75.
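As a rough illustration of what a nucleotide identity percentage measures, the sketch below computes percent identity from two sequences that have already been aligned (gaps written as "-"). It is a simplified assumption for illustration: ClustalW's own scoring may treat gaps and alignment length differently, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences have the same base.

    Assumptions for this sketch: inputs are pre-aligned and of equal length,
    columns where both sequences have a gap are ignored, and a gap opposite
    a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" and base_b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Toy aligned sequences, not taken from any phage genome.
identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(f"{identity:.1f}% identity")
```

Here 8 of the 10 aligned columns match; the gap column and the single substitution both count as mismatches.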
Phage genes were analyzed using Phamerator [17], an open-source program (GNU General Public License) designed to compare phage genes and genomes. For this study, Phamerator was adapted and stored in a GitHub repository (http://github.com/byuphamerator/phamerator-dev) separate from the original version. Phamerator uses BLASTP [19] and ClustalO [20] to compare each protein encoded by the genes in the database. E-values and percent identity scores are used to sort proteins into groups referred to as phamilies (phams) based on user-defined cutoffs for each score. Conserved domains in each protein are then identified. The Phamerator database used in this study was populated with 71 Bacillus phages, one Clostridium phage, 3 Enterococcus phages, 2 Geobacillus phages, 7 Lactobacillus phages, 6 Listeria phages, 6 P. larvae phages, one Paenibacillus glucanolyticus phage, 15 Staphylococcus phages, and one Streptococcus phage. The non-Bacillus phages were included in the database because proteins from these phages appeared in low E-value (<0.0001) BLAST hits for P. larvae phage proteins. In this Phamerator database, genes with E-values smaller than 1 × 10^−50 or greater than 32.5% identity with at least one other protein were