Biosynthetic gene cluster signature profiles of pathogenic Gram-negative bacteria isolated from Egyptian clinical settings

ABSTRACT Biosynthetic gene clusters (BGCs) are sets of consecutive genes, present in a variety of organisms, that produce specialized metabolites (SMs). These SMs are becoming a cornerstone for the production of multiple medications, including antibacterial and anticancer agents. Natural products (NPs) also play a pivotal role in enhancing the virulence of ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter spp.), which represent a global health threat. We aimed to sequence and computationally analyze the BGCs present in 66 strains belonging to three ESKAPE pathogenic species: 21 A. baumannii, 28 K. pneumoniae, and 17 P. aeruginosa strains recovered from clinical settings in Egypt. DNA was extracted using the QIAamp DNA Mini Kit, and an Illumina NextSeq 550 was used for whole-genome sequencing. The sequences were quality-filtered with fastp and assembled with Unicycler. BGCs were detected by antiSMASH, BAGEL, GECCO, and PRISM, and aligned using Clinker. The highest abundance of BGCs was detected in P. aeruginosa (590), followed by K. pneumoniae (146), and the lowest in A. baumannii (133). The BGC type most commonly shared among P. aeruginosa isolates was the non-ribosomal peptide synthetase (NRPS) type; among K. pneumoniae isolates, the ribosomally synthesized and post-translationally modified peptide-like (RiPP-like) type; and among A. baumannii isolates, the siderophore type. Most of the isolates harbored non-ribosomal peptide (NRP) BGCs, with a few K. pneumoniae isolates encoding polyketide BGCs. Sactipeptide and bottromycin BGCs were the most frequently detected RiPP clusters. We hypothesize that each species’ BGC signature contributes to its virulence. Future experiments will link the detected clusters with their species and determine whether the encoded SMs are produced and contribute to virulence.
IMPORTANCE Our study analyzes the biosynthetic gene clusters (BGCs) present in 66 assemblies from clinical isolates of the ESKAPE pathogens Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa. We report their sequencing and assembly, followed by the analysis of their BGCs using several bioinformatics tools. We then focus on the most abundant BGC type in each species and discuss its potential role in the virulence of that species. This study provides a basis for future experimental work to decipher the roles of the encoded specialized metabolites (SMs) in virulence, test their possible antibacterial effects, and characterize them. The study highlights the importance of studying the “harmful” BGCs and understanding the pathogenicity and virulence of those species, as well as the possible benefits if the SMs were used as antibacterial agents. This could be the first study of its kind from Egypt, and it sheds light on BGCs from ESKAPE pathogens from Egypt.

The term ESKAPE refers to a group of pathogens that cause alarming infections in both developed and developing countries due to increasing multidrug resistance and virulence. They include Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species (1). According to the 2019 antimicrobial resistance threat report issued by the Centers for Disease Control and Prevention (CDC), all members of the ESKAPE group pose serious threats at both the nosocomial and community levels in the USA (2). In 2017, the World Health Organization published a priority list of bacterial pathogens for which research and development of new antibiotics should be urgently directed. Multidrug-resistant A. baumannii, P. aeruginosa, and Enterobacteriaceae are among the critical pathogens listed for new antibiotic development (3). Thus, all ESKAPE species can be considered superbugs that resist most of the commonly known antibiotics. In addition to being multidrug resistant, ESKAPE pathogens are also reported to be hypervirulent (4).
These pathogens are developing antimicrobial resistance (AMR) against most of the last-line antibiotics such as carbapenems, extended-spectrum beta-lactams, vancomycin, and methicillin (2). Bacterial cells can have intrinsic resistance genes encoded on their chromosomes or can acquire extrachromosomal genes carried on mobile genetic elements such as plasmids and transposons (5). Extended-spectrum beta-lactamases (ESBLs) are capable of degrading third-generation cephalosporins in addition to penicillin, first- and second-generation cephalosporins, and aztreonam. ESBLs are encoded by plasmid-borne resistance genes and occur in different types, of which the most common are sulfhydryl reagent variable (SHV), TEM, CTX-M, and oxacillinases (OXA) (6). While carbapenems are effective against both Gram-positive and Gram-negative bacteria, carbapenem resistance is a major public health threat in infections caused by Gram-negative pathogens. Modifications in penicillin-binding proteins (PBPs), increased efflux pump activity, and reduced cell membrane permeability are among the resistance mechanisms to carbapenems (7). Methicillin-resistant S. aureus expresses a variant PBP, PBP2a, for which methicillin has a lower binding affinity (8).
The healthcare system in Egypt is currently struggling with multidrug-resistant bacteria in both hospital and community settings. The misuse and abuse of antibiotics play a key role in this uncontrolled resistance phenomenon. Individuals have easy access to antibiotics as over-the-counter medications instead of having them prescribed by healthcare specialists (9). Routine infection control measures in hospitals and the reporting of resistance cases are cornerstones for systematically tackling AMR (10). More effort should be invested in routinely isolating, sequencing, and studying highly prevalent nosocomial and community pathogens and in determining how closely related they are to their global counterparts.
Some organisms, including bacteria, fungi, and plants, possess in their genomes biosynthetic gene clusters (BGCs) that encode the synthesis of specialized metabolites (SMs) (11,12). SMs confer benefits on the producing organism such as antagonism (e.g., bacteriocins), communication (e.g., homoserine lactones), and survival in harsh environments (e.g., ectoine) (13). SMs are chemically diverse compounds comprising an array of types including polyketides, peptides, terpenes, and others (14). SMs are also known as natural products (NPs); they are encoded in the host genome and are produced by several organisms, including pathogens (15,16).
A plethora of SMs have antibiotic or anticancer activity, and some were later approved by the FDA as natural product drugs, e.g., erythromycin and doxorubicin, respectively (17). Other SMs contribute to the virulence of their producing organism, e.g., the fungal toxin dihydroxynaphthalene melanin (18).
SMs are more commonly studied and known for their advantageous effects, e.g., as antibacterial agents. Nevertheless, SMs can also have pathogenic effects and can increase the virulence of the pathogens that produce them. Examples include pathogenic Streptomyces strains that produce the SMs geldanamycin and nigericin, which were reported to have toxic effects on plants (19). It is thus of utmost importance to study SMs and their BGCs within pathogenic strains in order to better understand and target the pathogenicity and virulence of such strains. Other pathogenic strains were also shown to produce SMs that harm their insect and arthropod hosts (20). The metabolites of P. aeruginosa clinical isolates and their structures were also recently investigated, and the isolates were found to produce siderophores, rhamnolipids, quinolones, and phenazines (21). Other examples are pathogenic Burkholderia strains and the SMs they produce, such as toxoflavin produced by the pathogenic Burkholderia glumae (22). It remains interesting to probe pathogenic microbes both for SM culprits and for useful SMs that could possibly be antibacterial against other pathogens (23).
To survive in a hospital setting, ESKAPE pathogens express SMs encoded by BGCs (24). These SMs include antibiotic and anti-biofilm compounds that act against certain bacteria in favor of others (24). At the clinical level, SMs may act in synergy with common AMR mechanisms such as efflux systems and thus render conventional antimicrobials ineffective (25). SMs can also act as antioxidants against reactive oxygen species (ROS), the unstable oxygen-containing molecules that cause cell damage (25). One example of an antioxidant SM is staphyloxanthin, produced by S. aureus (26). Pyocyanin is a redox-active phenazine SM produced by P. aeruginosa as a virulence factor in lung infections (27,28). Accordingly, it was suggested that SM analysis be considered in standard antibiotic susceptibility tests (25,29). In A. baumannii, the wee BGC encodes the extracellular polysaccharide matrix. Targeting this cluster may prevent biofilm formation, one of the most important virulence mechanisms of the bacterium (30).
Siderophores are SMs that help bacteria acquire the iron needed for bacterial cell growth (31). Normally, host organisms do not have freely available iron; rather, the iron is tightly bound to proteins. Bacteria can counteract the scarcity of iron by siderophore-dependent and siderophore-independent mechanisms (32). Siderophore production may affect the biofilm formation process in ESKAPE pathogens (31). Siderophores may also interfere with antibiotic activity by modulating oxidative stress mechanisms (33).
With low-cost and time-efficient sequencing technologies, it is becoming easier to sequence the whole genome of an organism of interest. Whole-genome sequencing has become especially useful for bacteria, allowing their genomes to be mined to reveal more of their sophisticated metabolic machinery. The therapeutic and industrial potential of BGCs and SMs has motivated developers in the microbial bioinformatics field to implement new tools that predict the presence and structure of potential BGCs within bacterial genome sequences. Bioinformatics and chemoinformatics analysis tools are reviewed in detail in reference (34).
We aimed to assess the potential of the genomes of selected clinically relevant Gram-negative species (K. pneumoniae, P. aeruginosa, and A. baumannii) to produce SMs by detecting their potential BGCs. We focused on the clusters that were most abundant in each of the included taxa. The aim of this study was to detect the BGCs of the selected strains and align them with BGCs of known strains. Future work is needed to decipher the NPs they produce and their possible roles in the virulence of the strains.

Sample collection, DNA extraction, and sequencing
An overview of the workflow is detailed in Fig. 1. A total of 66 isolates (17 P. aeruginosa, 28 K. pneumoniae, and 21 A. baumannii isolates) were recovered from different clinical specimens (blood, urine, sputum, and others). All samples were randomly collected from patients referred for bacteriological testing at Mabaret El Asafra Labs, Alexandria, Egypt, between August 2020 and March 2021. Bacterial identification at the species level was carried out using the VITEK 2 Compact GN ID card (bioMérieux, Marcy-l'Étoile, France). DNA extraction was performed using the QIAamp DNA Mini Kit (QIAGEN) according to the manufacturer's instructions. DNA quality and concentration were determined using a Qubit 3.0 fluorometer. The Illumina NextSeq 550 sequencing platform was used for whole-genome sequencing of the isolates. For library preparation, 1 μg of genomic DNA was used with the NEXTflex Rapid XP DNA-Seq Library Preparation Kit (PerkinElmer, https://perkinelmerappliedgenomics.com/) following the manufacturer's instructions. The libraries were sequenced on the NextSeq system using the NextSeq 500/550 Mid Output Kit v2.5 (300 cycles) in paired-end mode.

Detection of BGCs
We used four bioinformatics tools for BGC detection and analysis. The first was antiSMASH (v6.0.1), with detection strictness set to relaxed (14).
The algorithms used for BGC searching were KnownClusterBlast, ActiveSiteFinder, SubClusterBlast, and RREFinder. The selected BGCs were non-ribosomal peptide synthetase (NRPS), RiPP-like, and siderophore clusters, as they were the most abundant BGCs found in P. aeruginosa, K. pneumoniae, and A. baumannii, respectively. Using the GenBank files generated by antiSMASH, the selected BGCs in the samples and representative strains were visualized using the command-line Clinker software (41), with alignment included and the identity set to 90%; clusters were labeled according to gene functions, and similar clusters were linked. A spectrum of colors was set for each gene cluster, but only clusters of interest were annotated in the figure legends of Fig. 2 to 4; Fig. S1 to S8 at https://github.com/lailaziko/Gram-Negative-BGCs. As P. aeruginosa showed multiple NRPS regions in all the samples, the Clinker presentation of NRPS clusters was divided into multiple panels (Fig. 4; Fig. S1 to S8 at https://github.com/lailaziko/Gram-Negative-BGCs). Clinker automatically assigns colors to homologous genes as provided by the GenBank files output by antiSMASH. As the samples for each species were analyzed independently, different colors might be assigned; therefore, a detailed color legend was provided for each figure. Some samples encoded BGCs that were similar but not homologous (i.e., not aligning), so Clinker could not assign them all the same color. To further check the Clinker output, we manually curated the BGC GenBank files generated by antiSMASH to match them with Clinker, and gray-colored clusters were re-colored if they represented BGCs of interest.
To infer the chemical structures encoded by the selected BGCs and to detect other putative BGCs, PRISM (42) was used with default parameters. GECCO (43) was used to analyze the assembled contigs for de novo BGC detection, and BAGEL (44) was employed for RiPP and bacteriocin detection.
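The selection of the most abundant BGC type per species described above can be illustrated with a minimal Python sketch. It assumes the detected regions have already been exported as (species, BGC type) pairs; the function name `most_abundant_type` and the toy records are hypothetical and are not the study's data or pipeline.

```python
from collections import Counter, defaultdict


def most_abundant_type(records):
    """Return {species: (bgc_type, count)} for the most frequent BGC type.

    `records` is an iterable of (species, bgc_type) pairs, one pair per
    detected BGC region (an assumed export format, not a tool's native output).
    """
    tallies = defaultdict(Counter)
    for species, bgc_type in records:
        tallies[species][bgc_type] += 1
    # most_common(1) returns a one-element list holding (type, count)
    return {sp: counts.most_common(1)[0] for sp, counts in tallies.items()}


# Made-up toy records for illustration only (NOT the study's results)
toy_records = [
    ("P. aeruginosa", "NRPS"),
    ("P. aeruginosa", "NRPS"),
    ("P. aeruginosa", "RiPP-like"),
    ("K. pneumoniae", "RiPP-like"),
    ("K. pneumoniae", "RiPP-like"),
    ("K. pneumoniae", "NRPS"),
    ("A. baumannii", "siderophore"),
    ("A. baumannii", "siderophore"),
    ("A. baumannii", "arylpolyene"),
]

top = most_abundant_type(toy_records)
for species, (bgc_type, count) in top.items():
    print(f"{species}: {bgc_type} ({count} regions)")
```

In practice the pairs would come from parsing each tool's output; that parsing step is deliberately left out here because it depends on the tool and version.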

Data analysis
R was used for the hierarchical clustering of the detected BGCs. Normalization was done by dividing each BGC count by the genome size and multiplying it by 10^6. The heatmap3 package was used to generate the heatmaps (45).
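As a worked illustration of the normalization step, the following Python sketch scales raw BGC counts to counts per 10^6 bp and computes a Euclidean distance between two normalized profiles. The study itself used R and heatmap3; the function names, the genome sizes, and the counts below are hypothetical, and the choice of Euclidean distance is an assumption, since the distance and linkage settings are not stated.

```python
import math


def normalize_per_mb(counts, genome_size_bp):
    """Scale raw BGC counts to counts per 10^6 bp of assembled genome."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return {bgc_type: n / genome_size_bp * 1e6 for bgc_type, n in counts.items()}


def euclidean(profile_a, profile_b):
    """Euclidean distance between two profiles; missing types count as 0."""
    keys = set(profile_a) | set(profile_b)
    return math.sqrt(
        sum((profile_a.get(k, 0.0) - profile_b.get(k, 0.0)) ** 2 for k in keys)
    )


# Hypothetical isolates with made-up counts and genome sizes
isolate_a = normalize_per_mb({"NRPS": 12, "siderophore": 2}, 6_000_000)
isolate_b = normalize_per_mb({"NRPS": 6, "siderophore": 2}, 6_000_000)

print(isolate_a)
print(euclidean(isolate_a, isolate_b))
```

Normalizing by genome size keeps larger assemblies from appearing richer in BGCs simply because they contain more sequence; a pairwise distance matrix of this kind is the usual input to a hierarchical clustering routine.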

NRPS BGCs in P. aeruginosa draft genomes
Three NRPS BGCs are well characterized in the P. aeruginosa standard strain PAO1, and they encode the formation of pyoverdine (the key siderophore associated with pathogenesis), pyochelin (a siderophore produced by P. aeruginosa), L-2-amino-4-methoxy-trans-3-butenoic acid (AMB) (involved in quorum sensing), and pyoluteorin (an antibacterial compound), in addition to three uncharacterized NRPS BGCs (46). Recently, an NRPS pathway was mimicked by a ribosomal pathway to produce mimics of the antibiotic brevicidine, which highlights the importance of studying NRPSs and their products as antibacterial agents (47). Further analysis of the NRPS clusters in the strains included in this study is required, including comparison with the well-characterized Pseudomonas NRPS BGCs and unraveling of their function, whether in contributing to the strains' virulence or in their possible application as antibacterial agents. The GECCO results agreed with the antiSMASH results for the P. aeruginosa genomes: the largest detected class was NRP, and its count was higher than in the K. pneumoniae and A. baumannii genomes. Interestingly, unknown clusters were the second largest class and require further study to determine what they encode and whether they are cryptic or active BGCs. In general, for all the included strains, antiSMASH detected more BGCs than GECCO. The difference was most prominent for the P. aeruginosa genomes (590 vs 136 BGCs) and smallest for the K. pneumoniae genomes (146 vs 100 BGCs); it could possibly be explained by the larger number of BGC types covered by the antiSMASH database.
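The scale of these differences can be made concrete with a short Python sketch that uses only the totals and isolate numbers reported in this study (590 vs 136 BGCs across 17 P. aeruginosa genomes; 146 vs 100 BGCs across 28 K. pneumoniae genomes). The dictionary layout and the function name `summarize` are illustrative assumptions.

```python
# Totals as reported in the text; the structure of this dict is an assumption.
reported = {
    "P. aeruginosa": {"isolates": 17, "antismash": 590, "gecco": 136},
    "K. pneumoniae": {"isolates": 28, "antismash": 146, "gecco": 100},
}


def summarize(entry):
    """Return (mean antiSMASH BGCs per isolate, antiSMASH/GECCO ratio)."""
    mean_per_isolate = entry["antismash"] / entry["isolates"]
    ratio = entry["antismash"] / entry["gecco"]
    return round(mean_per_isolate, 1), round(ratio, 2)


summary = {species: summarize(entry) for species, entry in reported.items()}
for species, (mean_per_isolate, ratio) in summary.items():
    print(f"{species}: {mean_per_isolate} BGCs per isolate, "
          f"antiSMASH/GECCO ratio {ratio}")
```

On these figures, antiSMASH reports roughly 34.7 BGCs per P. aeruginosa isolate against about 5.2 per K. pneumoniae isolate, and about 4.3 times as many BGCs as GECCO for P. aeruginosa against about 1.5 times as many for K. pneumoniae.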
The hits obtained by BAGEL are worth pursuing experimentally. Sactipeptides and bottromycin warrant further experimental study, as do the clusters that were detected uniquely in isolates of this species. Sactipeptides were recently found to be produced by a Streptococcus thermophilus strain, and one, streptosactin, was recently characterized and exhibited antibacterial activity (48). Bottromycin is known for its antibacterial effect (49); a cluster similar to it might therefore encode an antagonistic SM that, in this context, contributes to the virulence of the strains. Perhaps because P. aeruginosa strains are predicted to encode primarily peptides, they also had the most hits in BAGEL, as it is a database specific for the detection of bacteriocins and RiPPs (50). PaeM is a bacteriocin produced by P. aeruginosa strains (51), and it is worth investigating this hit in the included strains and testing it for antibacterial effects. The detected pyocins are particularly interesting hits, as they are reported bacteriocin types that assist the formation of biofilms and hence are virulence factors (52). Zoocin is also a bacteriocin (53), and it was detected in the included samples. Putidacin is a lectin-like bacteriocin (54). Colicins are bacteriocins that were studied in Escherichia coli (52), and their detection in this species is also worth pursuing.
The NRPS clusters abundant in P. aeruginosa isolates are depicted in Fig. 4; Fig. S1 to S8 at https://github.com/lailaziko/Gram-Negative-BGCs. They are distributed among nine panels according to their alignment, reflecting the different NRPS BGCs present, and they are important for further comparisons. We attempted to group the P. aeruginosa isolates by similar BGC signature profiles (Fig. 5C), and the closer strains cluster together in the heatmap into five main clusters. Table S3 at https://github.com/lailaziko/Gram-Negative-BGCs includes all the details about the hits obtained by PRISM; 178 structures were predicted, some of which were basic structures while others were more detailed chemical structures. We report them here as leads, together with their annotation, to be used later for experimental analysis. They include mainly non-ribosomal peptides and acyl homoserine lactones, among other classes.

RiPP-like BGCs in K. pneumoniae draft genomes
RiPP-like clusters are detected by antiSMASH and comprise BGCs that are not detected as RiPPs but are nevertheless regularly found with RiPPs, including bacteriocins and other unspecified RiPPs (14). Bacteriocins are ribosomally synthesized peptides with antibacterial activity (55). Bacteriocins were recently detected in the genus Klebsiella and identified as klebicins, which are colicin-like bacteriocins (56). These klebicins were found to be effective against Klebsiella clinical isolates, supporting the idea that bacteriocins are effective against members of the ESKAPE pathogens (56). The GECCO results were not in concordance with the antiSMASH results in this regard, as the majority of the detected BGCs comprised NRPs, followed by unknown clusters. This discrepancy could be explained by the naming each platform uses, in addition to the inherent difference in workflow, which renders GECCO capable of predicting novel BGCs rather than aligning with characterized BGCs. Interestingly, K. pneumoniae isolates harbored one putative polyketide and nine NRP_polyketide clusters, which were particularly unique and warrant further experimental validation. Future work on our data would include further analysis of the bacteriocin sequences and testing of their potential antimicrobial effect.
The hits retrieved from BAGEL were close to the antiSMASH results, as sactipeptides were detected. Their role in pathogenicity, as well as the possibility of targeting them with genus-specific drugs in the future, is worth investigating. Bottromycin hits also need further study, as do the uniquely detected hits. ComX4 is a RiPP that is involved in surfactin synthesis as a quorum-sensing player (57) and should be investigated for its role in the included K. pneumoniae genomes. Sactipeptide, bottromycin, and colicin BGCs were also recently reported in clinical isolates of Klebsiella in Thailand (58).
The RiPP-like BGCs that were most common among the K. pneumoniae genomes were aligned and are depicted in Fig. 3, with similar BGCs aligned together. Among the detected clusters was RiPP-like cloacin, a bacteriocin whose class coincides with the BAGEL results (59). RiPP-like TIGR03651 was also detected, which belongs to the circular bacteriocin, circularin A/uberolysin family (60). Biosynthetic aminoglycoside phosphotransferase (APH) genes were found; such genes were earlier reported in the biosynthesis of thiostreptamide S4, which belongs to the thioamitide class of anticancer compounds (61), and hence warrant further investigation. We attempted to group the K. pneumoniae isolates by similar BGC signature profiles (Fig. 5B), and the closer strains cluster together in the heatmap into four main clusters. Table S2 at https://github.com/lailaziko/Gram-Negative-BGCs includes all the details about the hits obtained by PRISM; 51 structures were predicted, some of which were basic structures while others were more detailed chemical structures. We report them here as leads, together with their annotation, to be used later for experimental analysis. They include mainly non-ribosomal peptides, polyketides, NRPS-independent siderophore synthases, and polyketide-non-ribosomal peptides, among other classes.

Siderophore and aryl polyene BGCs in A. baumannii draft genomes
Siderophore BGCs were recently detected in Nocardia species, where they possibly contribute to pathogenicity (62). It was recently found that A. baumannii uses several siderophores, mainly to bind iron, and hence harbors multiple siderophore BGCs; one particular siderophore, acinetobactin, is linked to its virulence (63). There are up to 10 siderophore BGCs within A. baumannii genomes (63). The included strains harbored siderophore BGCs in common, and their alignment with known A. baumannii BGCs, as well as their roles in strain virulence, remain to be investigated. Aryl polyene BGCs were detected within the genome of the virulent Acinetobacter strain global clone 2 (GC2) (64). Aryl polyene BGCs code for the production of 4-hydroxybenzoyl polyene compounds, which are hypothesized to help the bacterium escape the host immune system (64). The aryl polyene BGCs detected in this study require further analysis of their similarity to characterized aryl polyene BGCs of other strains and of their role. The GECCO results were not in concordance with the BGCs detected by antiSMASH, as most BGCs were detected as NRPs; this could possibly be because each tool detects different classes and because the BGC types that GECCO can detect are more limited than those of antiSMASH. It is noteworthy that 29 unknown BGCs were detected, as well as four polyketides, which merit further study and might correspond to some of those detected by antiSMASH as well as additional ones.
It is noteworthy that lasso peptides are prominent in this genus, as are the unique clusters detected. The functions of the detected clusters need to be deciphered with regard to their role in virulence, and they could be possible drug targets for these specific strains. Lasso peptides were previously reported from other Acinetobacter species, such as A. gyllenbergii, which produces acinetodin (65). Sactipeptides and colicins are also bacteriocins, as mentioned earlier, and their role in pathogenicity as well as their possible antibacterial effects need to be further investigated.
Siderophore BGCs were the most commonly detected type in the A. baumannii genomes, and their alignment is depicted in Fig. 2. IucA/IucC gene family hits were detected in the siderophore BGCs; this family carries an iron uptake chelate domain involved in siderophore biosynthesis. IucA and IucC are ligases and NRPS-independent siderophore synthetases that were previously studied in hypervirulent K. pneumoniae (66). We attempted to group the A. baumannii genomes by similar BGC signature profiles (Fig. 5A), and the closer strains cluster together in the heatmap into two main clusters. Table S3 at https://github.com/lailaziko/Gram-Negative-BGCs includes all the details about the hits obtained by PRISM; 45 structures were predicted, some of which were basic structures while others were more detailed chemical structures. We report them here as leads, together with their annotation, to be used later for experimental analysis. They include mainly non-ribosomal peptides, NRPS-independent siderophore synthases, acyl homoserine lactone-non-ribosomal peptides, and acyl homoserine lactones, among other classes.
In conclusion, our study investigated the BGCs present in three members of the ESKAPE pathogen panel that are relevant in hospitals and highlighted the most common BGC type in each of the A. baumannii, K. pneumoniae, and P. aeruginosa strains. We predict that those BGCs may play a specific role in the virulence of each strain, which warrants further experimental validation. Several isolates of the same species were analyzed to study the similarities and differences between their encoded BGCs. The BGCs pertaining to the same species indeed show differences, as visualized in Fig. 2 to 7; Fig. S1 to S8 at https://github.com/lailaziko/Gram-Negative-BGCs, with the common and different BGCs indicated. There is a common signature BGC profile for each species; however, there were inter-species differences with regard to their BGCs.

FIG 2 Siderophore BGCs in Acinetobacter baumannii. The colors in the legend indicate the biosynthetic gene clusters of interest. Other colors are either of unknown function or not the main aim of the current study. The letter "S" stands for the sample, followed by the sample number, then the contig number, then the base pair position of the BGC on this contig.

FIG 3 RiPP-like BGCs in Klebsiella pneumoniae. Colors that are not present in the legend represent gene clusters of functions other than BGCs.

FIG 4 NRPS BGC subgroup 1 in Pseudomonas aeruginosa. Other NRPS subgroups in Pseudomonas aeruginosa isolates are grouped according to alignment and similarity and provided in the figures at https://github.com/lailaziko/Gram-Negative-BGCs.

FIG 5 Hierarchical clustering of the bacteria based on their BGC profiles: (A) Acinetobacter baumannii, (B) Klebsiella pneumoniae, and (C) Pseudomonas aeruginosa.

FIG 6 GECCO results for all isolates of each species.

FIG 7 BAGEL results for all isolates of each species.