Model-driven characterization of functional diversity of Pseudomonas aeruginosa clinical isolates with broadly representative phenotypes

Abstract Pseudomonas aeruginosa is a leading cause of infections in immunocompromised individuals and in healthcare settings. This study aims to understand the relationships between phenotypic diversity and the functional metabolic landscape of P. aeruginosa clinical isolates. To better understand the metabolic repertoire of P. aeruginosa in infection, we deeply profiled a representative set from a library of 971 clinical P. aeruginosa isolates with corresponding patient metadata and bacterial phenotypes. The genotypic clustering based on whole-genome sequencing of the isolates, multilocus sequence types, and the phenotypic clustering generated from a multi-parametric analysis were compared to each other to assess the genotype–phenotype correlation. Genome-scale metabolic network reconstructions were developed for each isolate through amendments to an existing PA14 network reconstruction. These network reconstructions show diverse metabolic functionalities and enhance the collective P. aeruginosa pangenome metabolic repertoire. Characterizing this rich set of clinical P. aeruginosa isolates allows for a deeper understanding of the genotypic and metabolic diversity of the pathogen in a clinical setting and lays a foundation for further investigation of the metabolic landscape of this pathogen and host-associated metabolic differences during infection.


InTRoDuCTIon
Pseudomonas aeruginosa is a major contributor to a broad range of nosocomial infections and is commonly associated with many clinical cases including cystic fibrosis, pneumonia, urinary tract infections, sepsis and skin infections [1][2][3][4][5][6][7].The ability of P. aeruginosa to survive and adapt to diverse and challenging habitats and growth conditions makes this opportunistic pathogen successful in colonizing and infecting many physiological niches within the human host [8].Clinically isolated P. aeruginosa strains are often multiclonal and demonstrate diverse metabolic and phenotypic traits.Many recent studies investigated strain-specific and condition-specific gene essentiality [9], antibiotic susceptibility [10,11], horizontal gene transfer, virulence and emergence of antimicrobial resistance [12][13][14][15][16], and phylogenetic [17] and phenotypic diversity [18] of clinically isolated strains.Additional work explored the functional repertoire of the core and accessory genomes in P. aeruginosa [9,13,[19][20][21].These studies broadened our understanding of the genomic landscape across the P. aeruginosa pangenome and antimicrobial resistance in strains from a wide range of infection types and environments.
However, most of the abovementioned studies only focused on either the genetic diversity among clinical isolate populations or selected isolates from a single infection site or environment, and sometimes only interrogated specific clonal types of interest.Studies that included a large population of isolates often selected a smaller subset based on SNP diversity within the isolate sequences.Therefore, comprehensive investigations of the diverse phenotypes and functional metabolic repertoire of clinically isolated strains of P. aeruginosa from multiple body sites and associated with multiple patient comorbidities are still rare.Moreover, genetic variability, mutations and recombination events play an important role in the phenotypic diversification and metabolic heterogeneity of P. aeruginosa [22,23], which is also evident in the phenotypes observed in our previous study with these isolates [10].These limitations have made it difficult to gain a deeper understanding of P. aeruginosa metabolism and infections that can span many isolation sites in the human body, multiple patient comorbidities, and other host factors, which can affect the observed microbial phenotypes.
We hypothesize that the metabolic differences across P. aeruginosa isolates are dependent on a complex combination of host influences and pathogen-specific factors, which can be delineated using a combination of 'omic' analyses and the experimental characterization of unique metabolic traits in different clinical isolates.The advancements in efficient, high-throughput genome sequencing have opened the possibility to better understand the diverse metabolism of P. aeruginosa and the relationships between multi-omics data.To this end, genome-scale metabolic modelling has been used in previous studies to explore the functional diversity among strains of a species or within a pangenome [24,25].
In this work, we performed experimental and computational analyses including sequencing and analysing genomes of a set of clinical isolates representative of the diversity observed in a clinical setting.We explored the genetic, sequence type and phenotypic diversity, and their correlation within the representative set of isolates.We performed functional annotation of these genomes to enable the reconstruction of genome-scale metabolic networks, and we performed flux sampling analyses with these network models to delineate differences and similarities of the P. aeruginosa clinical isolates.We evaluated metabolism shared across isolates, as well as metabolism unique to each.By providing an integrated view of the correlation between clinical metadata information (patient demographics, isolation source and isolate morphology), and high-throughput multi-omics data, we can better characterize the functional diversity of clinical isolates, and hopefully better understand how to treat infections caused by this versatile human pathogen.

Isolate collection
As described previously, a total of 971 clinical isolates of P. aeruginosa were collected from the University of Virginia (UVA) Health System Clinical Microbiology Laboratory between February 2019 and February 2020 (IRB-HSR numbers 21191 and 21949) [10].For each isolate, associated patient metadata (age, sex, isolation sites and whether the patient had cystic fibrosis or diabetes mellitus), bacterial morphological phenotypes and antimicrobial-susceptibility profiles were recorded [10].Isolate sources that could not be classified as from the lung/trachea, urine/catheter, ENT (Ear-Nose-Throat)/sinus, skin/wound or blood were grouped as 'other.' Table S4 (available with the online version of this article) lists all the isolates used in this study and their patient metadata and phenotypes.

Stratified random sampling
The entire population of 971 isolates was divided into 10 homogeneous strata or subgroups according to the patient metadata (e.g.age, sex, isolation sites, presence of cystic fibrosis or diabetes mellitus, season) and isolate phenotypic properties (e.g.mucoidy, colour, haemolytic property and metallic sheen).Stratified random sampling [26,27] from each of the non-overlapping strata was performed.A mixed-integer linear programming (MILP) formulation was used to draw a random sample from the different strata separately.

Maximize
Where y i, p is the binary variable denoting the inclusion or exclusion of the isolate into the sampled set.S is the set of the strata for each phenotypic trait P. K i, p is the population proportion of the stratum i in the phenotype p, n is the total number of sample units available for allocation, N i is the number of sample units to allocate to stratum i, and C is a dummy constant.This framework allows one to obtain an effect size from each stratum/phenotype separately and independently of other phenotypic properties.It also ensures that isolates are selected from every stratum instead of leaving out the minority sets, which can happen in random sampling.

Bacterial culture and genome sequencing
Selected P. aeruginosa isolates were transferred from frozen stocks into LB broth with shaking (37 °C, 150 r.p.m., 16-24 h).Cultures were plated on blood agar and cetrimide agar for phenotype verification.Cells were pelleted from cultures and DNA isolated using the DNeasy UltraClean microbial kit (Qiagen) according to the manufacturer's instructions.Purified DNA was quantified using the broad range dsDNA kit (DeNovix), libraries generated and samples sequenced (2×151 bp; 400 Mbp per sample) on the Illumina NextSeq 2000 platform (SeqCenter).Sequences are available at http://www.ncbi.nlm.nih.gov/bioproject/937715.

Sequence analyses
FastQC v0.11.9 ( bioinformatics.babraham.ac.uk/ projects/ fastqc/) was used to examine the quality of both the forward and reverse reads.Trim Galore v0.6.5 ( bioinformatics.babraham.ac.uk/ projects/ trim_ galore/) was used for automatic quality and adapter trimming of the sequences.Across all the paired isolate sequences, the median per base sequence quality score was above 32.For assembling the paired-end reads to contigs, Velvet [28] was used with a max k-mer length of 101 bp.The resulting contigs were ordered against the reference sequence of P. aeruginosa PA14 genome (NCBI ID: NC_008463.1).Pairwise genome comparison was performed using ACT [29] after converting the multi-FASTA files generated by Mauve [30] to single-FASTA format files.blast (version 2.10.0) was used to create comparison files between the reference PA14 genome and the isolate genome sequences.blast Ring Image Generator (brig v0.95) [31] was used to generate the ring diagram showing the isolate whole-genome sequences in comparison with each other as well as the reference PA14 genome.

Clustering and correlation
Phylogenetic clustering was performed with mega11 [32,33] on a local workstation.The whole-genome nucleotide sequence was translated to protein sequences before a blastp search (blast version 2.10.0) was performed across the isolates.The individual isolate protein sequence '.fasta' files were inputted to the mega alignment explorer and the muscle [34] algorithm was used to align the sequences.Multiple Alignment Gap Opening penalty was set to 3 and the Multiple Alignment Gap Extension penalty was set to 1.8, according to recommended values [35].UPGMA maximum parsimony was used to evaluate the phylogenetic tree and an unrooted tree was generated [36].For clustering of non-numeric categorical data like the phenotype and patient metadata corresponding to each of the isolates, a metric called 'category utility' (CU) was used.CU is a measure of 'category goodness' that attempts to maximize both the probability that two objects in the same category have similar attribute values, and the probability that objects from different categories have different attribute values [37].The CU value of a given clustering of a data set is a numeric value that reflects how well the clustering is performed.Larger values of CU indicate better clustering.
Since data clustering is an NP-hard problem, there is no way to find an optimal clustering without examining every possible clustering.One way to increase the likelihood of an optimal clustering is to cluster the data set multiple times with different initial cluster assignments.Then, the algorithm iteratively tries to find the best clustering of the data based on the initial cluster assignments.In this study, 125 different initial cluster assignments were attempted and the clustering with the highest CU value of 0.3999 was chosen as the optimal phenotypic clustering of the isolates.
Multilocus sequence typing (MLST) was evaluated for the isolates on the PubMLST database (https://pubmlst.org;accessed August 9 2023) with P. aeruginosa as the genome assembly for the species [38,39].A total of 19 of the 25 isolates showed a 100 % identity match of the sequences for all the seven housekeeping genes (acsA, aroE, guaA, mutL, nuoD, ppsA and trpE) on the PubMLST database.For the six other isolates, the nearest match of sequence type was chosen (see Table S1 for detailed results).
Based on the MLST results, a minimum spanning tree (MST) was created with PHYLOViZ (online version as of August 9 2023) [40].Hierarchical clustering and entanglement scores were enumerated using the dendextend package [41] version 1.17.1 in R version 4.1.2.
Non-metric multidimensional scaling (NMDS) [42] was used to reduce the dimensionality of the flux sampling results from each of the isolates.A Bray-Curtis distance metric [43] was used in the evaluation of the NMDS.A maximum iteration limit of 500 was set.The optimal stress value of the resulting NMDS was 0.1480.

Functional annotation
The '.fasta' files containing the post-processed genome sequences were annotated based on the KEGG biochemical database (as of December 2021) [36,44].First, each of the sequence files was processed with Prodigal v2.6.3 [45] to translate to protein sequences and the coordinates for the coding sequences.A blastp search (blast version 2.10.0)across the entire KEGG prokaryotic database was performed for each of the protein-encoding sequences using diamond v2.0.14 [46].A per cent identity >98 %, E value <1e−05 and a bit score >50 were used as thresholds for selecting best sequence similarity matches.
Protein IDs from all the KEGG matches were then translated to KEGG orthologue (KO) numbers after removing duplicates.KEGG reaction IDs were obtained by parsing the KOs to reaction mapping.

Genome-scale metabolic network reconstructions and analyses
A genome-scale metabolic network reconstruction was generated from the annotated genome sequence for each of the 25 isolates.The recently published genome-scale network reconstruction of P. aeruginosa PA14 (iPau21) by Payne et al. [47] was used as the foundation on which the isolate models were reconstructed.Since the base model contained reaction and metabolite IDs in the ModelSEED [48] namespace (as of March 2021), the annotated KEGG reactions and associated metabolite IDs were translated to ModelSEED IDs.The reactions that were annotated in each of the isolate genomes but were not present in the iPau21 model were added to the corresponding reconstruction.The models were checked for mass and charge imbalances.For generating flux samples, each isolate model was sampled with optGpSampler [49] algorithm in CobraPy v0.21.0 [50] for 500 times.The Jaccard distance [51] of phenotypic categories of the isolates was calculated in a pairwise fashion.The NMDS distance was calculated as the distance between the median NMDS coordinates of isolate pairs.Spearman's correlation [52] was used to calculate a P value for the relationship between NMDS distance between isolate pairs and their phenotypic distance (Jaccard index).

Genome sequencing of the representative isolate group
Between February 2019 and February 2020, a total of 971 clinical P. aeruginosa isolates from 590 patients of the UVA Health System were collected as reported previously [10].For each of the 971 clinical P. aeruginosa isolates, patient demographic profiles (age, sex), comorbidities (cystic fibrosis or diabetes) and isolate phenotypic traits (mucoid phenotype, metallic sheen, pigment production and haemolytic activity) were collected and tabulated.To understand their genotypic, phenotypic and metabolic variance and to assess the shared and unique traits that they can manifest, we extensively studied a sample population of 25 of these P. aeruginosa isolates.We performed stratified random sampling (see Methods for additional details) to identify a sub-population of 25 isolates that represents the phenotypic diversity of the entire population of 971 isolates.Fig. 1 shows the schematic of the stratified random sampling procedure (Fig. 1a), a visual representation of their sequence similarity analyses (Fig. 1b), and the phenotypic properties and associated patient metadata of the 25 selected isolates (Fig. 1c).
Isolates were grouped into five clusters based on their phenotypic similarities using the measure of CU (see Fig. 1c).The CU hypothesis proposed by Corter and Gluck [37] states that the categories that become preferred in a population are those that best describe the diversity of the population (see Methods for additional details).We used CU to maximize both the probability that isolates in the same cluster have phenotypic attributes and patient metadata in common, and the probability that isolates from different clusters have different phenotypic attributes and patient metadata.This is a measure of the probability that all the isolates in each cluster will assume the same categorical value for each of the 10 different phenotypic categories in this study.The maximum calculated CU value achieved for this clustering (in Fig. 1b and c) was 0.3999 out of 125 different random combinations of clusters.With this clustering, then, there is approximately a 40 % probability that all the isolates grouped together have the exact same values in every phenotypic property, and also that isolates in different groups do not have the same values in every phenotypic property.
Fig. 1(a) illustrates the stratified random sampling procedure for selecting the subset of clinical isolates representative of the phenotypic diversity of the entire population of 971 isolates.First, the isolate collection is separated into different strata based on each of the 10 phenotypic properties (season, patient's age, patient's sex, isolation site, presence of cystic fibrosis and diabetes, mucoidy, haemolytic capability, metallic sheen, and pigmentation).In each stratum, the number of isolates maintains the same proportion as the original large population.Then, isolates are chosen at random from each stratum and a sample set of 25 isolates is chosen.
To get the complete genomic and functional profile of the 25 selected isolates, we sequenced their genomes.The comparison of the whole-genome sequences of the 25 isolates was performed using mega11 software [32] (see Methods for additional details).We aligned the genome sequences to the reference P. aeruginosa PA14 genome (NCBI ID: NC_008463.1)and the alignment is displayed in a ring diagram (Fig. 1).The purple ring in the middle of the ring diagram is the PA14 reference genome.The rings are ordered according to their phenotypic clustering shown in Fig. 1(c).The gaps in the rings correspond to the gaps in the sequence alignments.

Correlation between whole-genome sequences, sequence types and the phenotypic properties of the isolates
Since the isolates demonstrate some degree of genetic variability, their functional uniqueness might play an important role in the phenotypic diversification and metabolic heterogeneity.The wide range of phenotypic diversity observed in our previous study with these isolates [10] led us to investigate further the correlation between the genome sequence of the isolates and their phenotypes.In addition to the diversity based on whole-genome sequences, MLST [38] was used to evaluate the allelic variation of seven housekeeping genes in the isolate genome sequences, generating the sequence types to characterize isolates.The sequence types of the isolates are listed in Tables S1-S4.Since we had all the isolates sequenced, we also performed a clustering of 25 clinical P. aeruginosa isolates based on core-genome MLST (cgMLST) profiles, which is shown in Fig. S1.
Fig. 2(a) shows the hierarchical clustering of the P. aeruginosa clinical isolates based on their MLST types and similarity scores on the seven housekeeping genes (allelic profiles).The colour scale shows that the Euclidean distances between most of the isolates are low, except isolate UVA88299 setting itself apart in the distance matrix, indicating that its allelic profile is the most distinct from the other isolates.The two other most distinct isolates are UVA51075 and UVA61605, which are also very distant from each other.
To investigate whether the allelic profiles of the isolates are correlated to the genomic differences based on whole-genome sequences or their phenotypic clustering, we compared the hierarchical clustering of the isolates based on the MLST (allelic profile) to the phenotypic clustering (Fig. 2b) and whole-genome-sequence-based clustering (Fig. 2c).In addition, we compared the genome-sequence-based clustering to the phenotypic clustering to understand their correlation (Fig. 2d).These comparisons were performed by superimposing the two respective hierarchical clustering trees and enumerating the entanglement scores between them.A lower (closer to zero) entanglement score means that the trees are very similar to each other and the correlation between the two respective hierarchical clustering is good.
It is noticeable from the superimposed trees and the entanglement scores that the phenotypic, genome-sequence-based and MLST-based clustering do not correlate well with each other.The observed poor correlation may indicate significant differences in functional annotation differences, regulatory mechanisms, post-translational modifications and other factors that define the functional behaviour of an organism in a given environment.We also note that the entanglement scores are highly similar in each of the comparisons, which suggests that neither the genome sequence nor the MLST type is a strong predictor of the observed phenotypes.To further characterize the diversity among the clinical P. aeruginosa isolates, we performed functional annotation of the isolates' genome sequences.

Functional annotation of the isolates and P. aeruginosa pangenome analysis
Each of the isolate genome sequences was annotated for metabolic functions based on the KEGG biochemical database [36,44] (see Methods for details).For each isolate, a mean of 50 000 blast hits were found in organisms across 1462 genera, 506 families and 61 phyla from the KEGG database.Interestingly, most of the metabolic functions in the P. aeruginosa isolate genomes are shared with >1000 organisms in the KEGG database, while only a few metabolic functions in the isolates are shared with <10 organisms.Fig. 3(a) shows a distribution of KOs annotated in the isolate genomes and the number of non-P.aeruginosa species the specific KO is shared with.Out of the approximately 2000 KOs in the P. aeruginosa isolates that were found to be shared with species in the KEGG database, only 27 were shared solely with other Pseudomonas species.These KOs mainly belong to the biosynthesis of macrolides, phosphonate and phosphinate metabolism, and biosynthesis of polyketide sugar units.As indicated with the axis on the right of Fig. 3(a), all KOs in the profiled P. aeruginosa isolates are found in other organisms.
We calculated the fraction of KOs associated with a KEGG pathway that is present in the P. aeruginosa isolates as well as the number of species in a given phylum that contains orthologues to a given KO present in the isolates (Fig. 3b).We calculated the fraction of KOs in a given pathway that is shared between the P. aeruginosa isolates and other organisms as a function of the number of organisms with shared functions.The number and diversity of the shared pathways increase with increasing number of organisms, but they uniformly belong to all the pathways (see Fig. 3c).When we binned the KOs shared with increasing numbers of species (10-100, 100-1000 and more than 1000), we observed that the diversity of the non-pseudomonads to which the isolates share their metabolic functions increased, but distribution across different phylogenetic lineages also becomes more uniform (see Fig. 3d).This analysis helps to identify metabolic pathways that are unique to the P. aeruginosa isolates we evaluated, as well as which metabolic pathways are shared across multiple phyla.
For example, some of the major metabolic pathways that the clinical isolates share with less than 100 other species are novobiocin biosynthesis and phenazine biosynthesis.Novobiocin is a dibasic acid, which, like other aminocoumarin antibiotics, inhibits bacterial DNA synthesis by targeting the bacteria DNA gyrase and the related enzyme DNA topoisomerase IV.Although this antibiotic was effectively used to treat infections by Gram-positive bacteria like many staphylococci [53,54], novobiocin has limited activity against Gram-negative organisms (like P. aeruginosa).This difference is due to the presence of the lipopolysaccharide-containing outer membrane in Gram-negative species that acts as an impermeable barrier.P. aeruginosa has previously been observed to be resistant to novobiocin, mainly attributed to its MexAB-OprM multidrug efflux system that acts on this antibiotic [55].Other studies showed that novobiocin is responsible for binding to and inactivating the nalD gene that can repress the efflux pump and, therefore, contributes to intrinsic multidrug resistance in P. aeruginosa [56].
P. aeruginosa uses many secondary metabolites that act as virulence factors and negatively affect prokaryotic competitors and eukaryotic hosts through growth inhibition or cell death [57].Phenazines, such as pyocyanin, are redox-active, coloured heterocyclic compounds and are responsible for the green fluorescence of P. aeruginosa.They play important roles in electron cycling, oxidative stress and iron acquisition [58,59].Phenazine biosynthesis is responsible for promoting antibiotic tolerance and toxin production in P. aeruginosa.It also enhances the fitness of P. aeruginosa in a biofilm environment and controls the production of many virulence factors [60,61].
While the isolates shared most of their metabolic functions with many other non-Pseudomonas organisms, we observed that the annotated isolate genomes introduce several unique metabolic functions (KOs) into the currently annotated P. aeruginosa pangenomic landscape.Fig. 4(a) shows the increasing addition of metabolic functions to the P. aeruginosa pangenome with the increasing number of isolates sampled at random.These data have a large distribution because the number of combinations from which one can choose n number of isolates out of 25 is large, and each isolate contributes varying numbers of unique reactions toward the sample combinations.In total, the 25 clinical isolates introduce 66 metabolic functions, each of which is unique to a single clinical isolate.These KOs belong to aminobenzoate degradation; C5-branched dibasic acid metabolism; caffeine metabolism; chlorocyclohexane and chlorobenzene degradation; fructose and mannose metabolism; glycine, serine and threonine metabolism; inositol phosphate metabolism; lysine biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; purine metabolism; pyruvate metabolism; tryptophan metabolism;  and tyrosine metabolism, prominently.These KOs are not currently present in the annotated genome of P. aeruginosa in the KEGG database and were matched through sequence similarity against other organisms, mostly from the phyla Actinobacteria, Cyanobacteria, Euryarchaeota, Firmicutes and Proteobacteria (see Table S2 for a complete list).While most previously characterized Pseudomonas strains do have some genes associated with many of these biosynthetic pathways, the newly identified genes in the clinical isolates perform additional or complimentary functions in these pathways.Since P. aeruginosa has extensive accessory genomic content within its pangenome, the pangenome size will likely keep increasing with each additional genome sequenced and annotated over time.This analysis gives additional insight into the possibilities of finding novel metabolic functionalities in new strains isolated from previously unexplored sources.
Fig. 4(b) shows the distribution of the core, accessory and unique metabolic functions across the 25 clinical P. aeruginosa isolates.The majority (77.4 %) of the annotated metabolic functions in the isolate genomes are shared between all of the isolates (further shown in Fig. 4c), while 18.3 % of them are shared by different sub-populations of the isolate collection (further shown in Fig. 4e).There are 13 metabolic pathways for which more than 20 % of the reactions in that pathway are shared across all 25 isolates (Fig. 4c).These include a total of 114 reactions in xylene degradation, fatty acid elongation, and galactose metabolism and tryptophan metabolism, among others.A list of the unique and shared reactions and the pathways these reactions belong to are presented in Table S3.
The unique and shared metabolism between clinical P. aeruginosa isolates has significant importance in guiding any future therapeutic development.While new drug design or repositioning strategies should be efficient enough to target unique metabolic traits of the clinically relevant isolates, their mechanism of action should also target a diverse set of strains to be effective as a treatment.The presence of unique functional traits in the metabolism of individual isolates creates a challenge for any therapeutic development effort.Therefore, it is important to characterize the metabolic functions unique to each of the isolates.Fig. 4(d) shows the pathways in which unique reactions appear in the isolate genome annotations.Some of the isolates do not have any unique reactions, but eight of the isolates contain unique reactions in several pathways that are not shared by any other isolate.Interesting observations include the biosynthesis of type-II polyketide backbone in UVA32920, and caffeine metabolism in UVA37832, among others.Bacterial aromatic polyketides, with their diverse structure, are involved in diverse biological activities, including producing antimicrobial components and deterrent molecules to outcompete other organisms, host immunosuppression, and virulence [62][63][64][65][66]. Thiamine metabolism has gained a lot of attention as a potential therapeutic target because of its effect on the sensitivity of P. aeruginosa to many antibacterial agents.l-Glutamine:2-deoxy-scyllo-inosose aminotransferase and l-glutamine:3-amino-2,3-dideoxy-scyllo-inosose aminotransferase enzymes, unique to the isolate UVA22511, are among the first few steps of the linear pathway in neomycin, kanamycin and gentamicin biosynthesis.While many of the other isolates, as well as reference strain PA14, have other genes in the neomycin, kanamycin and gentamicin biosynthesis pathway, these two genes are uniquely present in isolate UVA22511, which completes the conversion from d-glucose to 2-deoxystreptamine, an important part of antibiotic biosynthesis.Caffeine is known to inhibit the capability of P. aeruginosa to synthesize virulence factors and form biofilm by affecting the swarming motility and quorum sensing [67][68][69].Degradation of caffeine is known to be present in several Pseudomonas species [67,70], but the completeness of the caffeine metabolism pathway in only one of the isolates (UVA37832) potentially indicates its robustness against caffeine's inhibitory action fitness and virulence in a clinical setting.Many of the unique pathways in other isolates are involved in extracellular signalling, regulation, as well as quorum sensing.The extensive accessory genomic content between the P. aeruginosa clinical isolates and their diversity may be indicative of evolutionary adaption driven by horizontal gene transfer.The investigation of the trade-offs between the different physiological processes associated with horizontal gene transfer and the cost of the mobile genetic elements is certainly a continuously evolving research area itself.

Genome-scale metabolic network reconstructions of the isolates
To quantify the functional impact of these differences in metabolic gene content, we generated genome-scale metabolic network reconstructions for each of the clinical isolates.Metabolic network reconstructions combined with constraint-based analyses allow for a quantitative exploration of the functional repertoire and diversity of biological systems [71][72][73].We used the previously published genome-scale metabolic network reconstruction of P. aeruginosa PA14, iPau21 [47], as the backbone on which the additionally annotated metabolic reactions were added to generate the draft reconstructions of the isolates.The P. aeruginosa PA14 metabolic model was also amended with additional annotated reactions that were absent in iPau21.Each of the models was checked for reaction mass balance.The total number of reactions, unique reactions, genes and metabolites are shown in Table 1.The Systems Biology Markup Language (sbml) versions of all the reconstructions are available at https://anonymous.4open.science/r/PA_clinical_isolate_reconstructions/.

Diversity of metabolic flux states of the isolate metabolic reconstructions
The isolate metabolic network reconstructions were simulated in in silico synthetic cystic fibrosis media (SCFM) as described previously [47] to generate 500 flux sample predictions for each model.SCFM is a physiologically relevant medium that elicits more realistic metabolic behaviour of the isolates in an infection setting.To compare the flux sampling results across isolates, a NMDS method was used.Fig. S2 shows the flux sampling results in a 2D plane.Here, each data point represents a functional metabolic snapshot of the flux sampling data.The data points are coloured for each of the isolates, and we observe that while most of the isolates as well as the reference PA14 model overlap in certain functional metabolic states, each of them has a distinct set of flux samples, as shown by the outward rays from the centre.To ensure convergence of the flux sampling algorithm, we also enumerated 10 000 flux samples for each of the isolates and performed NMDS (shown in Fig. S3).The NMDS distances and the correlation with phenotypic data do not change significantly when a higher number of flux samples are performed.
To evaluate whether the flux sampling results correlate with the isolate phenotypes and patient metadata, the NMDS plot was colour-coded with each of the phenotypic categories in Fig. 5.Some phenotypic traits correlate with the flux samples better than others.For example, the correlation values of the isolate collection site (0.46) and patients' sex (0.44) are noticeably higher than the correlation to other traits like comorbidities and isolate morphology.While these calculated correlation values are possibly confounded by many unknown factors, the higher correlation values between isolates' functional landscape and patient demography and isolation site indicate that specific host environment may be a critical factor in shaping P. aeruginosa metabolism.Different environmental niches within the host have been observed to rewire the metabolism and pathogenicity of Pseudomonas strains in recent studies [74][75][76][77].The kind of modulation of metabolic functions in different host sites may have significant implications in the evolution of the pathogen within an extended timeframe of infection, which has also been observed in laboratory studies [78].

Conclusions
We performed a multi-faceted analysis of a representative group of P. aeruginosa clinical isolates by employing whole-genome sequencing, phenotypic and genotypic clustering, functional annotation and analyses on core, accessory and unique traits in the P. aeruginosa pangenome, and genome-scale metabolic network modelling.This study demonstrates the importance of an in-depth study of isolate sources, patient metadata and morphological phenotypes and their connections to the diverse metabolic landscape of clinical P. aeruginosa isolates.Through our multi-dimensional approach, we have characterized the diverse genotypes and metabolic functions across the clinical isolates with a wide range of phenotypes.We utilized a publicly available high-quality clinical data set and developed several analysis pipelines that can be employed for other human pathogens.
The extensive work by Dunphy et al. [10] to assemble the collection of 971 clinical isolates from the UVA Health System laid the groundwork for the current study.To select a representative group of isolates from the huge collection, the use of stratified random sampling allowed us to obtain a sample population of isolates that best represents the phenotypic diversity of the entire isolate collection.Whole-genome sequencing allowed us to compare the genomic content and, therefore, differentiate the selected isolates based on sequence similarity (Fig. 1).We found that the genomic differences among the set of 25 isolates were lower than those observed in other P. aeruginosa pangenomic studies [13,19,20,79].The allelic profiles of the sequenced isolates were not too diverse from each other, as observed in Fig. 2.This result is not unexpected considering this study involved isolates from one single geographical location.Also, a strong correlation between the genomic content, multilocus sequence type and phenotypic characteristics was not observed in our study.For a more complete picture of their functional diversity and uniqueness, annotation of the genomic sequences was important.It revealed not only how the isolates are similar or unique compared to each other as well as the reference P. aeruginosa PA14 strain, but also how the clinical isolates expand the entire metabolic landscape of the P. aeruginosa pangenome, sharing metabolic functions from a diverse range of species (Figs 3a, b, c, d and 4).With future sequencing and further annotation of the many Pseudomonas strains, these metabolic functions will certainly be identified in other isolates.
Genome-scale metabolic models are powerful platforms for analysing the active metabolism of a species by using computational tools based on constraint-based modelling methods [71][72][73][80][81][82].In this work, we reconstructed the metabolic networks of each of the isolates from the genome annotation and built on top of a highly curated, genome-scale metabolic model of the PA14 strain, iPau21 [47].While we recognize that there are many non-metabolic functional differences that might be present among the different clinical isolates, we primarily focus on metabolic reactions that distinguish their behaviour.We enumerated 500 flux samples for each of the isolate models to evaluate their relative distance based on a NMDS method.We observed that while most of the strains share much of their metabolic profiles, all of them show some unique metabolic functions (Fig. 5).Correlation of the flux sampling results with phenotypic traits identifies patient sex and strain isolation site as the most important factors in shaping the pathogen metabolism.We also explored the distinguishing feature of the annotated KEGG pathways and performed the same non-metric multidimensional analysis based on the annotated functional content in each of the isolates (shown in Fig. S4).We observed that none of the isolate phenotypes or patient metadata correlate with the functional content-based dimensionalization significantly better than flux sampling data.
There are several areas where this study can be complemented by other in vitro analyses and high-throughput analyses.For example, while the whole-genome sequencing and annotation provide us with a more complete picture of the metabolic capabilities of the clinical isolates, a high-throughput transcriptomic study could reveal their metabolic adaptations in an infection setting when they are subject to a host immune system among other factors.Functional transcriptomics of the core and unique metabolism of the different isolates can also enable the evaluation of the strain-specific variations in virulence mechanism and adaptability.In addition, the semi-automatic curation of the isolate metabolic models can be further refined with high-quality gene essentiality data, as well as substrate consumption and fitness profiles.Overall, this work paves the foundation for further integrative studies to understand the mechanisms and diversity of metabolic modulations in P. aeruginosa associated with the host environment.

Fig. 1 .
Fig. 1.Process of selecting a representative set of 25 clinical isolates from the large population, their genome similarities and phenotypic clustering.(a) Schematic of the stratified random sampling procedure to select a representative set of 25 isolates from the large isolate collection of Dunphy et al. [10, 32, 35].(b) Circular comparison of the isolate genomic sequences with PA14 genome sequence as reference (innermost circle) generated by blast Ring Image Generator [31].blast rings are arranged in five groups of alternating light and dark grey based on the phenotypic categories from (c).(c) Morphology and patient metadata associated with the selected 25 isolates.Isolates are grouped into five clusters based on their phenotypic categories (CU value=0.3999).Phenotypes are colour-coded according to the category values.nocf, patient does not have cystic fibrosis; cf, patient has cystic fibrosis; nodb, patient is non-diabetic; db, patient is diabetic.

Fig. 2 .
Fig. 2. Clustering of the P. aeruginosa clinical isolates based on multilocus sequence types, genome sequence and phenotypes, and their correlation with each other.(a) Hierarchical clustering of the isolates, along with four commonly studied reference P. aeruginosa isolates, based on their MLST sequence type based on seven housekeeping genes.Comparison of clustering between (b) MLST type and phenotype, (c) MLST type and whole-genome sequence (WGS), and (d) phenotype and whole-genome sequence.The scale bas show the branch distances.

Fig. 3 .
Fig. 3. Functional annotations shared between the clinical P. aeruginosa isolates and other organisms.(a) Distribution of shared KOs based on the number of non-Pseudomonas organisms that contain the same KO.The area plot shows the cumulative number of functions that increase with increasing number of organisms.(b) A schematic of the search process for shared KOs with other organisms in the KEGG database.(c) Distribution of shared metabolic functions across different pathways.Only the pathways that shared >50 % of the total KOs in that pathway with other organisms are displayed.With the increasing generality (no. of organisms), increasing numbers of pathways are shared between the isolates and other organisms.(d) Distribution of shared metabolic functions across different phyla.With the increasing generality (no. of organisms shared), the distribution of shared KOs tends to converge to a more uniform distribution across different phyla.

Fig. 4 .
Fig. 4. Analysis of the functional content of the P. aeruginosa clinical isolates.(a) Box-plot illustrating the distribution of unique metabolic functions (KOs) contributed by a specific number of isolates that are selected at random.(b−e) Comparison of consensus and unique metabolic pathways across P. aeruginosa isolate models.The numbers along horizontal axes correspond to the number of reactions in the pathway.(b) Distribution of the core, accessory and unique metabolic functions across the 25 clinical P. aeruginosa isolates.(c) Major pathways that are completely shared by all the 25 clinical P. aeruginosa isolates.(d) Unique metabolic functions in each of the clinical P. aeruginosa isolates.Pathways coloured in grey are present in only one isolate.Pathways shared with at least one other isolate are styled with hatched lines.(e) Distribution of accessory metabolic functions across the major pathways.

Fig. 5 .
Fig. 5. NMDS plots of the flux sampling results from the isolates' metabolic network reconstructions colour-coded according to isolate collection sites and season, patient demographic profile and isolate morphology.

Table 1 .
Properties of the metabolic reconstructions of the P. aeruginosa clinical isolates