Genomic analysis of 40 prophages located in the genomes of 16 carbapenemase-producing clinical strains of Klebsiella pneumoniae

Klebsiella pneumoniae is the clinically most important species within the genus Klebsiella and, as a result of the continuous emergence of multi-drug resistant (MDR) strains, the cause of severe nosocomial infections. The decline in the effectiveness of antibiotic treatments for infections caused by MDR bacteria has generated particular interest in the study of bacteriophages. In this study, we characterized a total of 40 temperate bacteriophages (prophages) with a genome range of 11.454–84.199 kb, predicted from 16 carbapenemase-producing clinical strains of K. pneumoniae belonging to different sequence types, previously identified by multilocus sequence typing. These prophages were grouped into the three families in the order Caudovirales (27 prophages belonging to the family Myoviridae, 10 prophages belonging to the family Siphoviridae and 3 prophages belonging to the family Podoviridae). Genomic comparison of the 40 prophage genomes led to the identification of four prophages isolated from different strains and of genome sizes of around 33.3, 36.1, 39.6 and 42.6 kb. These prophages showed sequence similarities (query cover >90 %, identity >99.9 %) with international Microbe Versus Phage (MVP) (http://mvp.medgenius.info/home) clusters 4762, 4901, 3499 and 4280, respectively. Phylogenetic analysis revealed the evolutionary proximity among the members of the four groups of the most frequently identified prophages in the bacterial genomes studied (33.3, 36.1, 39.6 and 42.6 kb), with bootstrap values of 100 %. This allowed the prophages to be classified into three clusters: A, B and C. Interestingly, these temperate bacteriophages did not infect the highest number of strains as indicated by a host-range assay, these results could be explained by the development of superinfection exclusion mechanisms. In addition, bioinformatic analysis of the 40 identified prophages revealed the presence of 2363 proteins. In total, 59.7 % of the proteins identified had a predicted function, mainly involving viral structure, transcription, replication and regulation (lysogenic/lysis). Interestingly, some proteins had putative functions associated with bacterial virulence (toxin expression and efflux pump regulators), phage defence profiles such as toxin–antitoxin modules, an anti-CRISPR/Cas9 protein, TerB protein (from terZABCDE operon) and methyltransferase proteins.


Abstract
Klebsiella pneumoniae is the clinically most important species within the genus Klebsiella and, as a result of the continuous emergence of multi-drug resistant (MDR) strains, the cause of severe nosocomial infections. The decline in the effectiveness of antibiotic treatments for infections caused by MDR bacteria has generated particular interest in the study of bacteriophages. In this study, we characterized a total of 40 temperate bacteriophages (prophages) with a genome range of 11.454-84.199 kb, predicted from 16 carbapenemase-producing clinical strains of K. pneumoniae belonging to different sequence types, previously identified by multilocus sequence typing. These prophages were grouped into the three families in the order Caudovirales (27 prophages belonging to the family Myoviridae, 10 prophages belonging to the family Siphoviridae and 3 prophages belonging to the family Podoviridae). Genomic comparison of the 40 prophage genomes led to the identification of four prophages isolated from different strains and of genome sizes of around 33.3, 36.1, 39.6 and 42.6 kb. These prophages showed sequence similarities (query cover >90 %, identity >99.9 %) with international Microbe Versus Phage (MVP) (http:// mvp. medgenius. info/ home) clusters 4762, 4901, 3499 and 4280, respectively. Phylogenetic analysis revealed the evolutionary proximity among the members of the four groups of the most frequently identified prophages in the bacterial genomes studied (33.3, 36.1, 39.6 and 42.6 kb), with bootstrap values of 100 %. This allowed the prophages to be classified into three clusters: A, B and C. Interestingly, these temperate bacteriophages did not infect the highest number of strains as indicated by a host-range assay, these results could be explained by the development of superinfection exclusion mechanisms. In addition, bioinformatic analysis of the 40 identified prophages revealed the presence of 2363 proteins. In total, 59.7 % of the proteins identified had a predicted function, mainly involving viral structure, transcription, replication and regulation (lysogenic/lysis). Interestingly, some proteins had putative functions associated with bacterial virulence (toxin expression and efflux pump regulators), phage defence profiles such as toxin-antitoxin modules, an anti-CRISPR/Cas9 protein, TerB protein (from terZABCDE operon) and methyltransferase proteins.

DATA SummARy
The genome sequences of the temperate bacteriophages

INTRODuCTION
Bacteriophages, i.e. viruses that infect bacteria, are the most abundant biological entities on Earth [1][2][3]. They are found in all environmental niches colonized by bacteria, with an estimated global population of 10 31 viral particles [4,5]. Bacteriophages have very diverse genomes and have been suggested to represent the largest source of gene diversity in the environment, as highlighted by the large number of novel genes of unknown function revealed by genome and metavirome sequencing [6]. Detailed comparative analysis has been made of the bacteriophages that infect various hosts and environments, including Mycobacterium [7], Acinetobacter [8], Pseudomonas [9], Bacillus [10], Lactococcus [11], marine cyanobacterium [12], Salmonella [13] and Vibrio species [14] and also members of the family Enterobacteriaceae [15]. However, the analysis of the families of all sequenced bacteriophages has illustrated how little of the global phage population has been genomically sampled [16,17]. With an almost endless supply of diverse phages readily accessible for isolation and analysis, research programmes will continue to play substantial roles in analysing and describing functional proteins in the phages to understand their ecology and the various clinical, industrial and biotechnological applications [18].
Genomic comparative analysis has highlighted the mosaicism present in the genomes of all the dsDNA tailed bacteriophages. The term mosaicism refers to the fact that bacteriophages harbour in their genomes different regions with different evolutionary history due to the phenomenon of horizontal gene transfer (HGT) [16,17]. The phylogeny and evolutionary relationships between bacteriophages isolated from bacteria

Impact Statement
Klebsiella pneumoniae is a successful multi-drug resistant human pathogen and an important source of hospital infections associated with high morbidity and mortality due to multiple factors. Temperate bacteriophages (prophages), located in the genome of clinical strains of K. pneumoniae, act as mobile elements of horizontal gene transfer, being able to modulate the behaviour of bacteria by providing virulence and phage defence genes. Therefore, the characterization of these prophages can lead to the discovery of new molecular targets for the development of innovative antibacterial treatments.  [19,20], confirming their importance. The phenomenon of HGT, as well as bacteriophages themselves, act as drivers of the evolution and diversification of bacteria, allowing them to acquire virulence factors (e.g. diphtheria or botulinum toxins) and/or genes related to metabolism, antibiotic resistance (e.g. β-lactamases) and adaptation to new environmental niches [21][22][23].
Klebsiella pneumoniae, which belongs to the family Enterobacteriaceae, is a Gram-negative opportunistic bacterial pathogen associated with a wide range of diseases, such as urinary tract infections, pneumonia and septicaemia, as well as infection of wounds and soft tissues [24]. In recent years, different strains of this species have acquired genes for resistance to antibiotics, especially genes that encode enzymes capable of breaking down most β-lactamases [25]. This has led to the generation of multi-drug resistant (MDR) pathogens, which constitute a serious public-health problem [26]. The success of K. pneumoniae as a nosocomial pathogen is due to its intrinsic virulence, attributed to its ability to cause invasive infection via fimbrial adhesins [27] and to the presence of a thick capsule, which acts as a possible antiphagocytic factor [28], and to toxin-antitoxin (TA) systems associated with the stability of mobile elements acquired through HGT, such as vagCD genes, which encode a functional broad-spectrum TA system and are conserved on the large multiple-antibiotic-resistance-conferring plasmids in this species [29].
In this study, we analysed the genomic characteristics of 40 genomes of prophages identified in 16 clinical isolates of carbapenem-producing K. pneumoniae. Moreover, we used comparative bioinformatic tools and microscopic techniques to investigate the phylogeny and evolutionary relationships of the prophages.  to the methods proposed by the Pasteur Institute (https:// bigsdb. pasteur. fr/ klebsiella/ klebsiella. html) [30]. The study highlighted that K. pneumoniae ST258 and ST15 are high-risk clones in the worldwide spread of carbapenemases.

Identification of prophages in the genome of K. pneumoniae isolates
Prophages in the genomes of different strains of K. pneumoniae were identified with the phaster (Phage Search Tool -Enhanced Release) bioinformatics tool (http:// phaster. ca/). Only those temperate bacteriophages identified by the program as intact (score >90) were used for analysis in the present study.
The comparative genomic analysis of the four K. pneumoniae prophages found in different isolates of K. pneumoniae was carried out with the blast Ring Image Generator (brig) program [31], which uses the sequence similarity between the bacteriophage regions and the sequences of the assembled bacterial genome. Focusing on bacteriophage clusters of the Microbe Versus Phage (MVP) database (http:// mvp. medgenius. info) in K. pneumoniae strains, we searched for possible homologies between these groups and the prophages under study. For this purpose, we used the Position-Specific Iterative Basic Local Alignment Search Tool (psi-blast) to analyse the complete prophage genome against the representative sequence (i.e. the longest) of the cluster and applied the following cut off values: query coverage >60 %, identity >89 %.
The MVP database can be used to study phage-host relationships [32], as it provides the user with an important list of interactions in addition to international clusters (sets of viral sequences grouped according to their sequence similarities). We searched the database for possible sequence homologies between the 40 prophages under study and the international clusters. We observed homology with 25

Integration of prophages
In the final step of the study, the integration sites of the prophages were identified with the Standard Nucleotide Basic Local Alignment Search Tool (blastn) by searching for nucleotide sequence similarity between the K. pneumoniae prophages and all sequences in the National Center for Biotechnology Information (NCBI) database. Groups of proteins located before the integrase proteins were considered integration sites.

Phylogenic relationships among the 40 temperate bacteriophages predicted to belong to the order Caudovirales
The dot plot alignment of the nucleotide sequences of the 40 temperate bacteriophages was inferred using Genome Pair Rapid Dotter (gepard) [33], and a fasta file was constructed with all the sequences. Bacteriophages lack a universal marker gene for phylogenetic analysis; however, the use of the terminase large subunit of the tail protein or the major capsid protein is often reported [34]. In the present study, a phylogenetic analysis of the sequences of the terminase large subunit was performed with the ClustalW program (http:// www. ebi. ac. uk/ clustalw/) in mega x software [35]. A tree was generated by multiple alignments, with the neighbour-joining method. The following parameters were applied to produce the tree: (i) the number of differences was established; (ii) gaps/missing data were treated as complete deletions; (iii) the bootstrap consensus tree was inferred from 1000 replicates [36]; and (iv) the condensed tree was displayed with a value of 100 %.

Isolation of the bacteriophages and transmission electron microscopy (TEm) studies
Mitomycin C was used to induce bacteriophage production following the protocol described by López et al. [8].  [37]. Finally, the samples were stored at 4 °C until processed for TEM on a JEOL JEM-1011 electron microscope.

Induced bacteriophage mix spot test
Spot tests were carried out using the mixed induced samples of bacteriophages obtained from the purified PEG preparation used for TEM. All of them contained representative members of each cluster (A, B, C, D and E). The spot test method was a modified version of the protocol described by Raya and Hebert [38]. When the OD 600 reached 0.6 nm, 200 µl culture from the 16 K. pneumoniae isolates was mixed with 4.5 ml soft agar (0.5 % NaCl, 1 % tryptone and 0.4 % agar) and poured onto TA agar plates (0.5 % NaCl, 1 % tryptone and 1.5 % agar). Once the soft medium had solidified, 15 µl bacteriophage mix solutions were added to the TA agar plates. A negative control consisting of SM buffer was included for each plate.

ORF annotation
The putative functions of the ORFs were determined by sequence identity with the Rapid Annotation by Subsystem Technology (rast) server (http:// rast. nmpdr. org/) and the blast-protein tool psi-blast developed by the NCBI (https:// blast. ncbi. nlm. nih. gov/ Blast. cgi). In addition, we used the HHpred tool in the MPI informatics toolkit (https:// toolkit. tuebingen. mpg. de/# hhpred/) [39], which predicts functions through protein structure with a model accuracy that is competitive with that of the best servers in CASP8. The cutoff E value for annotation of proteins was <10 −5 and all ORFs with E values >10 5 were annotated as hypothetical proteins (Critical Assessment of Technique for Protein Structure Prediction) [40].

Predicted prophage genomes
Whole-genome sequencing of the 16 carbapenemaseproducing clinical strains of K. pneumoniae (Table 2) (Table 3). Comparative genomic analysis of these prophages was performed with the brig program according to their sequence similarities (Figs 1-4). Comparison of the sequences with different prophage sequences and clusters from the MVP database (see Table 3 and it was also identical to cluster 4901 (which showed a host range of 57 bacterial strains) (Fig. 2). The sequence of the 39.6 kb prophage was identical to the genomes of three prophages (ST437-OXA48phi4.1, ST512-KPC3phi13.6 and ST258-KPC3phi16.1). In addition, cluster 3499 (which showed a 19 strain host range) had partial sequence identity with the three prophages (Fig. 3). Finally, the sequence of the 42.6 kb temperate bacteriophage (prophage) (ST512-KPC3phi13.1) had an identical sequence in the prophage (ST11-VIM1phi8.1) and the cluster 4280 (Fig. 4).

Integration sites of prophages
In all prophages, the attachment sites (attL and attR) were identified by phaster. Moreover, by studying the integration sites of the prophages, we observed that these were integrated at various sites, as detailed in the following text. In the case of ST15-OXA48phi14, the phage integration truncated the protein of unknown function. (vii) One prophage was integrated after a sensor domain-containing diguanylate cyclase, truncating the latter (ST16-OXA48phi5.4).

Phylogenetic study
To observe the phylogenetic relationships between the temperate bacteriophages under study and the clusters included in the MVP database, we reconstructed a neighbourjoining tree by the dot plot alignment of nucleotide sequences, constructed from a fasta file with all genomes of the 40 prophages studied with the gepard program. This enabled The evolutionary history of the large terminase subunit protein of 40 K. pneumoniae temperate bacteriophages and the representative MVP clusters were inferred using the neighbour-joining method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. The evolutionary distances were computed using the number of differences method. All positions containing gaps and missing data were eliminated. Evolutionary analyses were conducted in mega x [35]. us to define five clusters: A, B, C, D and E (Fig. 5a). Cluster A was subdivided because it included two different groups of temperate bacteriophages (A1 and A2).
Moreover, we reconstructed a neighbour-joining tree (Fig. 5b) with a packaging protein, the terminase large subunit, in the tree with bootstrap values (100 %) [41]. The four temperate bacteriophages, which were included in different strains considered in the study (33.3, 36.1, 39.6 and 42.6 kb, together with their international clusters from MVP database, 4762, 4901, 3499 and 4280), were included in clusters and subclusters: A1, C, A2 and B, respectively (Table 3, Fig. 5b).

TEm studies
TEM images revealed the presence of the three families of bacteriophages in the order Caudovirales. Representative images of the bacteriophages in each cluster are shown in Fig. 6. The typical morphology of the different families was observed, including a long, rigid tail in the Myoviridae bacteriophages ( Fig. 6a-e), a long, flexible tail in the Siphoviridae bacteriophages (Fig. 6g, h) and a small tail in the Podoviridae bacteriophages (Fig. 6f, i).

Induced bacteriophage mix spot test
The lytic potential of induced bacteriophages mixes with one phage representative of each cluster (previously observed by TEM) was assayed by the spot test in all the isolates of K. pneumoniae (Table 4). A spot was observed in one strain for all the bacteriophage mixes almost. The induced bacteriophage mix 3 derived from the strain ST101-KPC2, which harbours ST101-KPC2phi6.1 (48 131 bp; cluster B), ST101-KPC2phi6.2 (11 454 bp; not included in any cluster) and ST101-KPC2phi6.3 (43 942 bp; not included in any cluster), was able to produce a halo with five clinical isolates. Despite a large amount of halo, it is not possible to know whether it is due to the lytic capacity of bacteriophages or due to the phenomenon of lysis from without.

Annotation of the prophages
Using rast, psi-blast and HHprep tools, we obtained functional information for 59.7 % of the prophage proteins analysed (Fig. 7, Supplementary Material).
The most abundant functional category found in the prophage genomes involved genes that constitute the minimum basic unit (universally presented genetic markers) of a Caudovirales bacteriophage, i.e. the genes related to structure and packaging, lysis, lysogenesis, transcription, replication and regulation (Fig. 7). One of the most important functions of the bacteriophage tail, and more precisely of the tail fibre and the baseplate protein, is to recognize the host surface receptors. In addition to these structural genes, the minimum unit of a bacteriophage was also composed of genes involved in packaging. The packaging machine of most bacteriophages contains two proteins called the terminase small subunit and the terminase large subunit [34]. In this study, the terminase large subunit (approx. 1700 bp) was identified and annotated in all genomes of the temperate bacteriophages; however, the same did not occur with the small subunit (approx. 600 bp), which was only identified and annotated in 28 temperate bacteriophage genomes.
We also detected lysis genes involved in the lysis of the bacterial cell, such as the holin gene and lysozyme-encoding gene, in all viral genomes [42]. However, we did not consistently find lysogenic genes, such as integrase, in all of the prophages studied. In fact, 40.47 % of the studied prophages lacked integrase. Intending to study the insertion site of all the bacteriophages included in this study, we used the blastn tool to identify the region of the bacterial genome where this integration took place. This analysis revealed that the integrase was located upstream from what phaster considered to be the initial site in those bacteriophages that appeared to lack integrase, because this program is not able to identify sequences from two different contigs. In addition, the prophage genomes were also composed of genes (ORFs) involved in the phagehost interaction: virulence factors and phage defence genes (Fig. 8) [42].

Strain
Control (SM buffer) Bacterial bacteriophage mix The presence of 32 ORFs of methyltransferase proteins [44] in 25 prophages from these K. pneumoniae clinical isolates (Table 5).

DISCuSSION
In this study, we identified and annotated the genomes of 40 prophages (mainly belonging to the order Caudovirales family Myoviridae) from 16 clinical strains of carbapenemaseproducing K. pneumoniae and determined their phylogenetic relationships. According to the available data about the size of the phage genomes, we observed diverse sizes of temperate bacteriophages, ranging from 11.445 kb (prophage ST101-KPC2phi6.2) to 84.199 kb (prophage ST13-OXA48phi12. 3) and corresponding to the size of dsDNA viruses (i.e. 18 to 500 kb) [51]. Interestingly, prophages of sizes 33.3, 36.1, 39.6 and 42.6 kb were present in several clinical strains of K. pneumoniae, possibly indicating important roles in these isolates. Moreover, these prophages had partial sequence identity with international bacteriophage clusters 4762, 4901, 3499 and 4280 (MVP database), respectively, which are the most prevalent clusters in clinical strains of K. pneumoniae. Interestingly, these temperate bacteriophages were included by phylogenetic relationships in three main clusters (A, B and C). These findings indicate the high frequency of temperate bacteriophages in clinical populations of K. pneumoniae [52,53], as well as in other clinical pathogens such as Acinetobacter baumannii [8] and Pseudomonas aeruginosa [9,54].
Interestingly, analysis of spot tests revealed that the most frequent temperate bacteriophages (33.3, 36.1, 39.6 and 42.6 kb) do not produce halos in the highest number of strains. This is probably due to the superinfection exclusion systems promoted by the prophages in the bacterial genome. Thus, the bacteria become immune to subsequent infection by other bacteriophages that are the same or very similar to the integrated prophages [55].
The integration of the bacteriophages into the bacterial genome is a crucial step in the lysogenic cycle [56]. This event is mediated by the integrase protein, a DNA recombinase encoded by bacteriophages, at a specific bacterial genome attachment site (attB), which is identical to an attachment site (attP) of the bacteriophage genome [56]. According to previous reports, there have been indications that prophages are not randomly distributed in genomes. Indeed, it has been observed that those prophages that encode tyrosine integrase are usually integrated next to a host tRNA [57]. One possible explanation for this finding is the affinity that the temperate bacteriophage may have for palindromic structures, enabling integration [57]. These data corroborate our findings, whereby 35 % of the prophages under study were integrated close to host tRNA, specifically tRNA-ARG, which coincides with the findings of Roszniowski et al. for a Burkholderia cenocepacia prophage sequence [58]. Additionally, the subsequent integration sites that we identified are also consistent with those most frequently observed by these researchers (ABC transporter genes and transcriptional regulators) [58].
Most of the genes that encode bacteriophages (50-80 %) lack a described function to date and, therefore, are currently deposited in public databases, such as the NCBI, as hypothetical proteins [59][60][61]. Thus, several studies have revealed a high percentage of hypothetical proteins in the genomes of specific bacteriophages of Salmonella [36], K. pneumoniae [62] and Escherichia coli [63]. In the present study, we predicted the function of 59.7 % of proteins, by using different bioinformatics tools. Identification of protein functions is a challenge that must be addressed to improve knowledge of bacteriophages and the level of security of their applications [36]. It should be noted that in other studies a better understanding of the bacteriophage's protein function allowed bacteriophage engineering to be undertaken. In fact, knowledge of bacteriophage regulatory proteins has allowed the conversion of a lysogenic bacteriophage into a lytic one in A. baumannii, which presented activity against various clinical strains of A. baumannii [64]. In addition, a cocktail of natural lytic bacteriophages along with engineered bacteriophages has recently been successfully used to treat an infection caused by a drug-resistant Mycobacterium abscessus strain in a patient with cystic fibrosis [65].
We subsequently observed that the next most abundant genes in the prophage genomes were those that constitute the minimum unit of a Caudovirales bacteriophage, i.e. genes related to structure, packaging, lysis, lysogenesis, transcription, replication and regulation [66,67]. In the present study, we predicted virulence factors belonging to secretion systems such as T4SS (type IV secretion system) (involved in bacterial competence) [68], viral coat proteins [69] and the MarR family regulator of efflux pumps (which are associated with virulence factors in many bacteria [46]). For example, SlyA regulates Salmonella pathogenicity island-2 genes and contributes to resistance to oxidative stress, bacterial survival within macrophages and also bacterial survival in infection models. In another example, RovA is a member of the MarR/ SlyA family, which regulates expression of inv (an adhesion and invasion factor) in the enteric pathogens Yersinia enterocolitica and Yersinia pseudotuberculosis, as well as expression of the psa locus of Yersinia pestis, the causative agent of bubonic and pneumonic plague. Members of this transcriptional regulator family are also thought to be important for the adaptation of K. pneumoniae in the mammalian host [70].
Interestingly, several mechanisms associated with phage defence against other viruses or bacteria have been detected. We must highlight the prediction of modules of TA systems in the genome of some prophages from this study. TA systems are small modules consisting of a stable toxin and its unstable cognate antitoxin [71]. These systems are involved in abortive infection (bacteriophage immunity through altruistic suicide), which not only protects bacteria from bacteriophage infection in the cultures grown, but also affects the use of bacteriophage therapy [48]. The use of bioinformatics tools has led to the detection of four TA modules. Two of these modules displayed partial sequence identity with RelBE-like proteins, and the other two displayed partial sequence identity with HigBA TA proteins [47,48].
We also detected a putative anti-CRISPR/Cas9 protein, AcrIIC3. The CRISPR/Cas system is known to play an important role in protecting bacteria against invasion by bacteriophages and other mobile genetic elements. Nevertheless, bacteriophages have evolved different means of evading CRISPR/Cas defence systems, such as point mutations and the actions of small proteins that directly interact with the CRISPR/Cas system and shut it down. The first anti-CRISPR protein was discovered in 2013, in P. aeruginosa prophages [72,73], and since then multiple anti-CRISPR/Cas proteins have been discovered in bacteriophages of various bacterial species, e.g. Moraxella bovoculi, Sulfolobus islandicus, Listeria monocytogenes and Streptococcus thermophilus [74]. The anti-CRISPR/Cas9 AcrIIC3 was discovered in Neisseria meningitis and acts by inhibiting the Cas9 protein, blocking the DNA loading step through binding to a non-conserved surface of the HNH domain and interacting with the rec lobe of Cas9 leading to dimerization of the ArcII3-Cas9 complex [75,76]. We also detected a CRISPR-associated endoribonuclease Cas2 in the genomes of two prophages. Bacteriophages are thought to use such proteins to evade host immunity in an as yet uncharacterized way [77].
Also related to phage infection immunity, we found the TerB protein, despite it being involved in resistance to tellurite when it is part of terZABCDEF operon in bacterial pathogens such as E. coli and K. pneumoniae [78]. Furthermore, TerB has been found to be involved in resistance to infection by various bacteriophages, such as T1 and T5, a mechanism known as phage inhibition (Phi), as well as related with resistance to pore-forming colicins (PacB) [79]. Finally, numerous predicted methyltransferase enzymes were located in 62.5 % of the prophages from K. pneumoniae clinical strains, and were reported to be involved in protecting the viral genome against the host restriction enzymes from the bacteria [53,80].

Conclusion
This study characterized 40 prophages (order Caudovirales) in the genomes of 16 clinical strains of K. pneumoniae belonging to different STs (MLST). The analysis revealed the presence of four size groups of prophages (33.3, 36.1, 39.6 and 42.6 kb), whose sequences were similar to those of the international host-bacteriophage clusters registered in the MVP database, 4762, 4901, 3499 and 4280, respectively. Moreover, these 33.3, 36.1, 39.6 and 42.6 kb prophages were included in three main clusters (A, B and C) by phylogenetic relationships. The annotation of prophages has revealed 40.3 % of prophage genomes encoded genes of unknown function. However, this annotation has also revealed that the second most important group of genes constituted the minimum unit of Caudovirales bacteriophages. Interestingly, we observed virulence factors (compounds of secretion systems or regulators) and components of phage defence against bacteria and also against other bacteriophages (TA modules, anti-CRISPR/Cas9, TerB protein and methyltransferase proteins) in prophage genomes. Future lines of research should focus on obtaining more information about genes of unknown function to provide a better understanding of phage genomes for possible therapeutic applications of bacteriophages, such as phage-derived protein, as well as for engineering phages with activity against MDR pathogens.