Complete genome sequence analysis of a lytic Shigella flexneri vB-SflS-ISF001 bacteriophage

Shigellosis is one of the most important acute enteric infections caused by different species of Shigella, such as Shigella flexneri. Despite the use of antibiotic therapy to reduce disease duration, this approach is becoming less effective due to the emergence of antibiotic resistance among Shigella spp. Bacteriophages have been introduced as an alternative for controlling shigellosis. However, the bacteriophages must be without any lysogenic or virulence factors, toxin coding, or antibiotic-resistant genes. In this study, the whole genome sequence of vB-SflS-ISF001, a virulent Siphoviridae bacteriophage specific for Shigella flexneri, was obtained, and a comparative genomic analysis was carried out to identify its properties and safety. vB-SflS-ISF001 genomic DNA was measured at 50,552 bp with 78 deduced open reading frames (ORFs), with 24 ORFs (30.77%) sharing similarities with proteins from the genomes of homologous phages that had been reported earlier. Genetic analysis classifies it under the genus T1virus of the subfamily Tunavirinae . Moreover, comparative genomic analysis revealed no undesirable genes in the genome of vB-SflS-ISF001, such as antibiotic resistance, virulence, lysogeny, or toxin-coding genes. The results of this investigation indicate that vB-SflS-ISF001 is a new species, and confirm its safety for the biocontrol of S. flexneri.


Introduction
Shigella flexneri is a gram-negative, rod-shaped, invasive pathogen for humans and primates that causes inflammation in colonic mucosa (Jennison and Verma, 2004), a causative agent of diarrhea that is frequently bloody. It has been reported as the main cause of endemic shigellosis in developing countries and has resulted in the annual infection of more than 2 million individuals worldwide (Niu et al., 2017).
The first line of drugs to treat shigellosis is antibiotics, but due to the occurrence of antibiotic resistance among Shigella spp., it seems that these drugs are getting less effective over time (Ye et al., 2010). To tackle such an important issue, it is very important to come up with effective new alternatives. Bacteriophage therapy is a promising approach. Bacteriophages are the most common biological entities in the world (Olszak et al., 2017); previous studies have indicated that lytic bacteriophages can control a bacterial population (Wommack and Colwell, 2000). On the other hand, phages that are known as temperate bacteriophages can transfer undesirable genes within a bacterial population, including adhesion and invasion, exotoxin production, and other types of virulence genes (Wagner and Waldor, 2002;. Previous studies have reported a number of Shigella species and Escherichia coli strains susceptible to lysogenic phages (James et al., 2001). Additionally, antigen conversion by phage in S. flexneri has been reported (Gemski et al., 1975). S. flexneri harbors various bacteriophage-mediated virulence genes on its plasmids and chromosomes (Walker and Verma, 2002). Thus, to avoid transmission of such virulence genes to the bacterial host in a lytic bacteriophage product for the biocontrol of S. flexneri, analyzing the genome sequence for such genes is absolutely essential.
vB_SF1S-ISF001, a specific phage for S. flexneri, belongs to the Siphoviridae family. It has been isolated from wastewater; its biological characteristics such as host range, host range, absorption rate, burst size, lytic activity, pH, and thermal and saline stability were reported in our previous study ). In the current study, we aimed to sequence the entire genome of the S. flexneri vB_SflS-ISF001 phage and perform a comparative genomic analysis and phylogenic analysis. Additionally, we have evaluated the safety of vB_SflS-ISF001 phage for use as a biocontrol agent by looking for any undesirable genes such as antibiotic resistance, virulence factors, or lysogeny genes.

Materials and methods
2.1. Bacterial culture S. flexneri [Persian Type Culture Collection (PTCC 1234)] was obtained from the Iranian Research Organization for Science and Technology (IROST), Tehran, Iran, and stored at −80 °C. An overnight culture was prepared by adding 50 μL of the thawed stock suspension of the bacterium to 5 mL of brain heart infusion (BHI) broth (Merck, Darmstadt, Germany), and then incubated at 37 °C for 18 h with constant shaking (220 rpm).

Bacteriophage propagation and concentration
Bacteriophage vB_SflS-ISF001  was used in this study at a primary titer of 10 10 PFU/mL. vB_SflS-ISF001 was propagated using S. flexneri (PTCC 1234) as host according to the method of Sambrook and Russell (2001). One hundred milliliters of sterile BHI broth was inoculated with 1 mL of the overnight culture of the host bacterium and incubated at 37 °C with constant shaking (220 rpm). The biomass production of the host bacterium was routinely checked until it reached an earlylog phase (OD 600nm ≈ 0.2), when it was supplemented with 200 μL of the bacteriophage suspension (10 10 PFU/ mL). The mixture was incubated again at 37 °C for 24 h with constant shaking at 100 rpm. The media was then centrifuged at 10,000 × g for 10 min at 4 °C, and the phagecontaining supernatant was filtered through 0.22 µm syringe filters (Sartorius, Bangalore, India). The phage titer was then determined using the double-layer agar method (Kropinski et al., 2009). A high-titer stock of the phage was prepared using ultracentrifugation in an ultracentrifuge at 105,000 × g, 3 h, and 4 °C (Beckman Optima L-80 XP, TYPE 45 Ti rotor; Beckman Coulter, Brea, CA, USA). The pellet was then resuspended in 1 mL of sterilized SM buffer (100 mM NaCl, 8 mM MgSO4, 2% gelatin, 50 mM Tris-HCl, pH 7.5). This high-titer phage suspension was stored at 4 °C until further use.

Phage genome extraction and the whole genome sequencing
The genomic DNA of the phage was extracted according to Sambrook and Russell (2001). To remove nonphagerelated DNA and RNA, 10 μg/mL DNase I and RNase I (Sigma, Hong Kong, China) were added to the high-titer phage suspension (750 μL) and incubated for 1 h at 37 °C. Then, 78 μL of 20% SDS and proteinase K (20 mg/mL) (Sigma, Hong Kong, China) were added to the mixture, followed by an overnight incubation at 56 °C. DNA was then precipitated by adding 150 μL of 5 M sodium chloride. Subsequently, an equal volume of phenol/ chloroform/isoamyl alcohol solution was added before centrifugation at 13,000 × g for 10 min. The aqueous phase was collected carefully and remixed with an equal volume of phenol/chloroform/isoamyl alcohol solution before centrifugation at 13,000 × g for 10 min. The aqueous phase was then transferred to a new sterile tube. The phage DNA was precipitated by adding 3 M sodium acetate (one-tenth volume of the aqueous phase) and cold pure ethanol (twice volume of the aqueous phase). The sample was mixed well and incubated overnight at -20 °C before centrifugation at 20,000 × g for 20 min. Finally, the DNA pellet was washed twice with ethanol (70%) and then resuspended in RNaseand DNase-free water (Takara, Shiga, Japan). The phage genome DNA was stored at -20 °C until sequencing. DNA libraries were prepared by DNA fragmentation, adapter ligation, and amplification, and then subjected to the whole-genome DNA sequencing with 2 × 300 bp pairedend reads, carried out by the TGS Company (Shenzhen, China) on an Illumina HiSeq. The sequencing data were assembled using default parameters with SOAPdenovo (v2.04), and the sequence was deposited in DDBJ/EMBL/ GenBank under accession number MG049919.

Comparative genomics
CoreGenes 3.5 (http://gateway.binf.gmu.edu:8080/ CoreGenes3.5/) (Turner et al., 2013) was used to find the proteins of vB_SflS-ISF001 that are similar to those of related phages. Mauve was used for the whole genome comparison at a DNA level with other related phages (Darling et al., 2004).

Phage protein analysis
Phage proteins were analyzed using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) as previously described (Ghasemi et al., 2014). The high-titer phage suspension (prepared using ultracentrifugation as described above) was mixed with the loading buffer (YEASEN, China) and heated in a boiling water bath for 10 min. Phage suspension (25-30 μL) was introduced to 12% (w/v) SDS-PAGE gel (YEASEN, China), and the separated protein bands were visualized by staining the gel with Coomassie blue G-250. A PageRuler Prestained Protein Ladder (Thermo Scientific, Waltham, MA, USA) was used as the size standard (10 to 180 kDa).

Phylogenetic analysis
The amino acid sequences of 1 structural ORF (ORF29, the major tail protein) and 1 nonstructural ORF (ORF14, the DNA primase) were selected to construct the phylogenic tree of the vB_SflS-ISF001 phage. The gene sequences of other phages belonging to different genera of Siphoviridae were obtained from GenBank. All sequences were aligned in MEGA 7.0 using MUSCLE, and then the phylogenetic tree was generated using UPGMA (unweighted pair group method with arithmetic mean) with 2000 bootstrap replications (Kumar et al., 2016). Salmonella phage vB_SPuM_SP116 (accession number: KP010413) was used as the outgroup for both analyses.

Genome characterizations
The whole genome sequencing was performed with 12,290,282 total reads (184,354,300 total bases). The sequencing data assembled using default parameters with SOAPdenovo (v2.04) showed that the dsDNA genome of vB_SflS-ISF001 phage had a 50,552 bp size (coverage > 1000×), a G + C content of 45.58%, and included LTRs of 52 bp in both ends of the genome. Bioinformatic analysis revealed that phage vB_SflS-ISF001 genome contained 78 putative ORFs (19 on the forward strand and 59 on the reverse strand) which are fairly similar to other T1virus members (Table 1). ATG was identified as the only start codon for all ORFs (Table 2). According to BLASTP searches in the GenBank database, the function of 24 ORFs (30.77%) were predicted, and the remaining ORFs (54 ORFs, 69.23%) were considered as hypothetical proteins due to their shared similarities with uncharacterized database entries (Table 2). A different range of identified ORFs from 25% (Shfl1) to 31.8% (SH6) was reported in the phages belonging to the T1 virus genus (Table 1). Among the identified ORFs and detected conserved domains of the vB_SflS-ISF001 genome, no sequences related to undesirable genes including antibiotic resistance, virulence, lysogenic mediated, or toxin coding genes were found. In addition, no tRNA-encoding sequences were found in the genome ( Figure 1 and Table 2). The predicted ORFs of phage vB_ SflS-ISF001 were divided into 4 groups according to their function ( Figure 1).

DNA packaging
Terminase complex is composed of 2 separate gene products of ORFs 41 and 42. The product of ORF41 showed 94% similarity to the putative terminase large subunit from Shigella virus Shfl1 and the protein sequence of ORF42 product shared 93% similarity (E value: 3E-114) to the putative terminase small subunit from Shigella virus Shfl1.

Bacterial cell wall lysis
The product of ORF5 showed 90% similarity (E value: 6E-36) to the putative holin of Escherichia virus T1, and the predicted protein of ORF4 showed 90% similarity (E value: 2E-101) to endolysin from Shigella phage SH6.

Comparative genomics analysis
A MegaBLAST search of the phage genome indicated that vB_SflS-ISF001 had 88%-91% sequence similarity with Shigella and Escherichia phages (Table 1). CoreGene analysis demonstrated that vB_SflS-ISF001 shared similarity to 50 proteins of other related phages (score >70), including 22 known (2 bacterial cell wall lysis, 7 DNA replication, modification, regulation protein, 11 structural, and 2 DNA packaging proteins) and 38 hypothetical proteins (Table 3). These amino acid coding sequences were not restricted to any particular region or functional group of genes and were distributed over the phage genome. Moreover, comparison of the genome sequence of phage vB_SflS-ISF001 with other members of the T1virus genus demonstrated that vB_SflS-ISF001 genome sequence, organization, and ORF orientations were generally similar to other members of the genus T1virus ( Figure 2).

Analysis of vB_SflS-ISF001 structural proteins
To further characterize vB_SflS-ISF001, the high-titer phage suspension was subjected to 12% (w/v) SDS-PAGE gel. As shown in Figure 4, at least 11 individual protein bands with molecular masses ranging from 13 to 103.7 kDa were detected. In addition, each of the bands was  attributed to one of the predicted structural proteins of phage vB_SflS-ISF001 based on their molecular weights ( Figure 4).

Discussion
Shigella is one of the most important groups of Enterobacteriaceae which cause enteric infections (Zhang et al., 2013). With the emergence of resistant strains, phage therapy has been introduced as an alternative method and a new generation of antibacterial agents. A candidate phage must be analyzed thoroughly before its use in phage therapy . Therefore, the current study aimed to perform a comparative genomic analysis and phylogenic analysis, and look for any sequences related to antibiotic resistance, bacterial virulence factor, or phage lysogeny genes. According to whole genome sequencing and bioinformatic analysis, the most and the least similarity between the ORFs of vB_SflS-ISF001 and other T1virus phages were observed in SH6 and SH2, respectively. Six out of 24 ORFs (ORFs 4,16,18,19,21,and 67), and 1 out of 24 ORFs (ORF10) of vB_SflS-ISF001 had similarity to ORFs of SH6 and SH2, respectively. In the DNA replication, modification, and regulation group of genes, the function of 7 ORFs were predicted due to their similarity to JMPW2 (1 ORF), vB_EcoS_SH2 (1 ORF), JMPW1 (1 ORF), SH6 (3 ORF), and vB_SsoS-ISF002 (1 ORF). DNA primase/ helicase, which plays a regulatory role in the bacteriophage DNA replication process, is encoded by ORF 14 (Shen et al., 2016). In the structure and morphogenesis group of genes, the function of 13 ORFs were predicted due to their similarity to JMPW2 (1 ORF), vB_EcoS_SH2 (1 ORF), JMPW1 (2 ORF), SH6 (1 ORF), T1 (3 ORF), Shfl1 (3 ORF), and ADB-2 (2 ORF). Terminases are phage-encoded endonuclease enzymes with ATPase activity that act in the headful DNA packaging process during phage assembly (Hamdi et al., 2017). This enzyme, which was classified in the DNA packaging group, is composed of 2 separate units: the small subunit (ORF41) and the large subunit (ORF42). Double-strand DNA (dsDNA) phages employ the holin-endolysin complex to destroy bacterial host cells.
In the genome of vB_SflS-ISF001, ORFs 4 (endolysin) and 5 (holin) were predicted to encode this complex. Holins are hydrophobic proteins that produce holes in the bacterial cytoplasmic membrane by oligomerization and ease the access of endolysins to the cell wall (Fernandes and São-José, 2016). In contrast, endolysins have a crucial role in cleaving the peptidoglycan (murein), the main part of the bacterial cell wall structure (Fernandes and São-José, 2016). Furthermore, the position of predicted ORFs of the lysis group was similar with those of other Siphoviridae phages (Escherichia virus T1, Escherichia phage JMPW1, Shigella phage SH6, Escherichia phage ADB-2, Shigella phage pSf-2, and Shigella virus Shfl1), which were located at the right or left end of the genome (Roberts et al., 2004;Bhensdadia et al., 2013;Jun et al., 2016;Shen et al., 2016;Hamdi et al., 2017). Among the identified ORFs and detected conserved domains of the vB_SflS-ISF001 genome, no sequences related to undesirable genes including antibiotic resistance, virulence, or lysogenic mediated or toxin-coding genes were found. Therefore, vB_SflS-ISF001 can be considered a safe agent for biocontrol applications. Additionally, as with other T1virus phages, no tRNA-encoding sequences were identified in the genome of vB_SflS-ISF001. Genomic comparison showed that the organization, orientations, and distribution of the ORFs were generally similar to those of other members of the genus T1virus. Moreover, MegaBLAST analysis and UPGMA dendrograms revealed that vB_SflS-ISF001 can be classified as a new member of the genus T1virus, subfamily Tunavirinae.
In conclusion, in the current study, genomic characteristics of Shigella flexneri phage vB_SflS-ISF001 were comparatively analyzed. Phage vB_SflS-ISF001 genome is a dsDNA (50,552 bp) with 45.58% G + C content. Seventy-eight distinct ORFs and no tRNA were predicted in the vB_SflS-ISF001 genome. Comparative genomic analysis of vB_SflS-ISF001 demonstrated that this phage could be classified as a new species in the genus T1virus of the subfamily Tunavirinae. Moreover, no undesirable genes, e.g., antibiotic resistance, virulence, lysogenic mediated genes, or toxin-coding genes, were found in the vB_SflS-ISF001 genome sequence. Phylogenetic analysis (based on major tail and DNA primase) of vB_SflS-ISF001 showed a high similarity to other T1virus species, and was further validated through genome and comparative genomic analyses, which not only constitute a much more accurate classification approach, but also a powerful methodology to investigate and certify the safety of phages for potential application as biocontrol agents. Therefore, the data suggest that vB_SflS-ISF001 can be used as a safe agent for phage therapy.