Comparative analysis of the volatile metabolomes of Pseudomonas aeruginosa clinical isolates

Pseudomonas aeruginosa is a nearly ubiquitous Gram-negative organism, well known to occupy a multitude of environmental niches and cause human infections at a variety of bodily sites, due to its metabolic flexibility, secondary to extensive genetic heterogeneity at the species level. Because of its dynamic metabolism and clinical importance, we sought to perform a comparative analysis on the volatile metabolome (the ‘volatilome’) produced by P. aeruginosa clinical isolates. In this study, we analyzed the headspace volatile molecules of 24 P. aeruginosa clinical isolates grown in vitro, using 2D gas chromatography time-of-flight mass spectrometry (GC × GC-TOFMS). We identified 391 non-redundant compounds that we associate with the growth and metabolism of P. aeruginosa (the ‘pan-volatilome’). Of these, 70 were produced by all 24 isolates (the ‘core volatilome’), 52 by only a single isolate, and the remaining 269 volatile molecules by a subset. Sixty-five of the detected compounds could be assigned putative compound identifications, of which 43 had not previously been associated with P. aeruginosa. Using the accessory volatile molecules, we determined the inter-strain variation in the metabolomes of these isolates, clustering strains by their metabotypes. Assessing the extent of metabolomic diversity in P. aeruginosa through an analysis of the volatile molecules that it produces is a critical next step in the identification of novel diagnostic or prognostic biomarkers.


Introduction
Pseudomonas aeruginosa is a bacterium that is well known for its metabolic flexibility [1,2], enabling it to thrive in numerous environmental niches. Additionally, P. aeruginosa is a broad-host-range opportunistic pathogen, infecting plants, invertebrates, and animals (including humans), and is implicated in a wide range of clinical infections including otitis media, wound infections, and acute and chronic pneumonias [3,4]. Seminal genomics analyses over the past decade attribute P. aeruginosa's capabilities to a relatively large genome (mean size of 6.6 Mbp [5]), containing an average of 6175 genes per isolate [6], as well as its disproportionately large number of regulatory elements, comprising approximately 10% of an isolate's genes [5]. More recently, several medium- and large-scale P. aeruginosa genomics studies (<20 and >100 genomes, respectively) [6-11] have provided a more nuanced view of the genetic capacity of the species. In one of the largest studies to date, Mosquera-Rendon et al analyzed 181 genomes and identified 16 820 non-redundant genes, constituting the predicted P. aeruginosa pan-genome. Only 15% of the genes (n = 2305) were identified as core genes, defined as being present in >95% of sequenced isolates [6]. More than half were accessory genes, defined as being present in 5-95% of P. aeruginosa genomes, and the remainder (31%) consisted of rare genes, harbored by fewer than 5% of the strains. Therefore, genetic diversity across strains likely contributes to P. aeruginosa's metabolic diversity and niche ubiquity [6,7].
Complicating the mission to understand P. aeruginosa's flexibility is our limited ability to translate genomic data into confident predictions on niche occupancy, phenotypes, and metabolism. Several sticking points obstruct progress. First, the vast majority of P. aeruginosa genes that have been sequenced lack any known or predicted function. For example, the majority of rare genes are without functional annotations in the KEGG database [6], and Ozer and colleagues reported that nearly 40% of P. aeruginosa core genes are 'poorly characterized', without any known or predicted function. Second, as little as 17% of an individual isolate's genome consists of core genes [7], and the remainder is highly variable from one isolate to another [6,8,9]. Transcriptional analyses of the accessory genes from 150 P. aeruginosa clinical isolates revealed that sub-groups of strains share only small sets of accessory genes, and no large common gene clusters were identifiable [8]. Therefore, bottom-up approaches to P. aeruginosa metabolic network discovery (i.e., using genomes to predict metabolomes) generate limited information. More directly, in spite of previous studies, we know only a fraction of the P. aeruginosa metabolome [2,12-14], particularly secondary metabolites, hindering top-down approaches (using metabolites to predict genetic pathways) to identify unknown metabolism-associated genes and their interactions.
A more thorough understanding of the P. aeruginosa metabolome could have significant positive impact on the clinical management of this pathogen [15,16]. In particular, identifying the complete metabolic repertoire of clinical strains (the clinical pan-metabolome) could shed light on mechanisms by which this organism establishes disease or evades therapy. Characterizing the core metabolome could lead to new therapeutic targets as well as the identification of biomarkers for highly sensitive diagnostics of any P. aeruginosa infection, regardless of strain or infection site. Conversely, the accessory metabolome could yield biomarkers that are specific to clinically-relevant phenotypes, such as virulence, antibiotic resistance, or mucoidy. Rare or unique metabolites could be used as sentinels or tracers of epidemic strains.
We and others have been studying the P. aeruginosa volatile metabolome, or 'volatilome', yielding approximately 125 unique volatile metabolites for this species via analysis of roughly 100 unique strains. We hypothesize that, akin to the characteristics of the pan-genome, there is a core volatilome for P. aeruginosa clinical strains that constitutes a small fraction of the full volatilome of this species. We analyzed the volatile molecules from 24 isolates, cultured in vitro, using comprehensive 2D gas chromatography-time-of-flight mass spectrometry (GC × GC-TOFMS). We detected nearly 400 volatile compounds that contribute to the P. aeruginosa pan-volatilome, and identified 70 metabolites that are core to these clinical strains under the conditions tested here, 43 of which have not been previously identified as part of the P. aeruginosa volatile metabolome. In addition, we identified a total of 321 accessory and rare volatile molecules and used these data to assess the inter-strain variability in their volatile profiles, and to estimate the size of the pan-volatilome and core volatilome under these conditions.

Bacterial isolates and sample preparation
Twenty-four P. aeruginosa clinical isolates from infections of the eye (n = 6), ear (n = 1), throat (n = 1), respiratory tract (n = 7), abdomen (n = 3), urinary tract (n = 1), blood (n = 1), and skin (n = 4) were cultured aerobically at 37 °C in LB-Lennox (10 g tryptone, 5 g yeast extract, 5 g NaCl per liter). After 24 h of growth, the bacterial cultures were cooled on ice, cells pelleted via centrifugation, and the supernatants filtered through 0.22 µm PES membranes to remove all remaining cells. Five milliliters of supernatant and a stir bar were sealed into 20 ml GC headspace vials with PTFE/silicone caps. Three biological replicates were prepared for each isolate. Six LB-Lennox sterile media controls were prepared in parallel through all steps of sample processing. All samples were stored at −20 °C prior to analysis.

Volatile molecule collection and analysis
Culture supernatants were stirred and heated to 50 °C and the headspace was sampled for 30 min via solid phase microextraction (SPME) using a 2 cm triphase fiber (divinylbenzene/carboxen/polydimethylsiloxane, 50/30 µm; Supelco/Sigma-Aldrich, St. Louis, MO). The volatile molecules were desorbed for 180 s at 270 °C, and injected splitlessly onto a comprehensive 2D gas chromatograph coupled with a time-of-flight mass spectrometer (GC × GC-TOFMS; Pegasus 4D, LECO Corp., St. Joseph, MI), equipped with an autosampler (Multipurpose Sampler, Gerstel, Inc., Linthicum Heights, MD). Separation of volatile molecules was achieved using a 2D column set consisting of an Rxi-624Sil (60 m × 250 µm × 1.4 µm; length × internal diameter × film thickness; Restek, Bellefonte, PA) first column and a Stabilwax (1 m × 250 µm × 0.5 µm; Restek) second column, joined by a press-fit connection. The primary oven (containing the Rxi-624Sil column) was initiated at 35 °C (0.5 min hold) and ramped to 230 °C (5 min hold) at 5 °C min⁻¹. The secondary oven (containing the Stabilwax column) and the quad-jet modulator were heated in step with the primary oven, with a +5 °C and +30 °C offset, respectively, relative to the primary oven temperature. A 2 s modulation period was used, with 0.5 s alternating hot and cold pulses. The helium carrier gas flow rate was 2 ml min⁻¹. The mass spectrometer transfer line was heated to 250 °C. Mass spectra were acquired over the range of 30-500 m/z at a rate of 200 Hz. Retention indices (RI) for the sample volatile molecules were calculated using external alkane standards (C6-C15). Headspace volatile molecules of a pure retention index mixture (Sigma-Aldrich, St. Louis, MO) were sampled using the 2 cm triphase SPME fiber for 10 min at 50 °C and desorbed at a 30:1 split. RIs for compounds eluting prior to hexane (C6) or after pentadecane (C15) were extrapolated.
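The retention index calculation can be illustrated with a short sketch. It assumes the linear (temperature-programmed) retention index form, RI = 100 × [n + (t − t_n)/(t_(n+1) − t_n)], which the text does not explicitly name, and it uses hypothetical alkane retention times; compounds eluting outside the alkane range are extrapolated from the nearest pair of standards.

```python
from bisect import bisect_right


def retention_index(t, alkanes):
    """Linear retention index for a retention time t (in seconds).

    alkanes maps carbon number -> retention time of that n-alkane standard.
    Times outside the bracketed range are extrapolated linearly from the
    nearest pair of standards (an assumption about how extrapolation is done).
    """
    carbons = sorted(alkanes)
    times = [alkanes[c] for c in carbons]
    # Index of the lower bracketing alkane, clamped so a pair always exists.
    i = bisect_right(times, t) - 1
    i = max(0, min(i, len(carbons) - 2))
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    return 100.0 * (n + (n_next - n) * (t - t_n) / (t_next - t_n))


# Hypothetical first-dimension retention times (s) for the C6-C15 standards.
ALKANES = {c: 600.0 + 200.0 * (c - 6) for c in range(6, 16)}
```

With these illustrative standards, a peak eluting midway between heptane and octane is assigned an index midway between 700 and 800, and a peak eluting before hexane receives an index below 600.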

Data processing and analysis
Data collection, processing, and alignment were performed using ChromaTOF and the Statistical Compare software package, version 4.50 (Leco Corp.). The baseline was drawn through the middle of the noise and the signal-to-noise (S/N) cutoff for initial peak finding was set to 100 for a minimum of two apexing masses. Subpeaks were combined when the second dimension retention time shift was ⩽0.1 s early for subsequent modulation periods and the mass spectral match score was ⩾600. For aligning peaks across chromatograms, maximum retention time shifts of 2 s (one modulation period) in the first dimension and 0.2 s in the second dimension were allowed, and minimum mass spectral similarity matches of 600 were required. A second round of peak discovery was performed on the aligned chromatograms, adding peaks with S/N ⩾ 10 if that same peak was present in at least one chromatogram at S/N ⩾ 100. All peaks that were observed in at least three chromatograms were included in the peak table for further analysis.
Peaks were assigned putative identifications based on mass spectral and retention time data, and the quality of the identifications was classified (levels 1-4) following published guidelines, with level 2 being the highest classification in this study [38]. Level 2 compounds were identified based on ⩾85% mass spectral match using forward searches of the NIST 2011 mass spectral library and also possessed corroborating retention time data in one of two forms: (1) a retention time that fits with the homologous chemical series, quantified by a linear fit (R² > 0.998) of retention time versus carbon number (used to confirm identities of the 2-, 3-, and 4-ketones), or (2) experimentally-determined retention indices that are consistent with the midpolar Rxi-624Sil stationary phase, quantified using published median RIs for nonpolar and polar columns, and the following equation:
Level 3 compounds were identified on ⩾85% mass spectral match to the NIST 2011 library. Level 4 compounds have mass spectral matches <85%, but can still be differentiated based upon mass spectral data. Chemical classifications assigned to level 4 compounds are based upon characteristic mass fragments or patterns of neutral loss.
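The homologous-series check, form (1) of the level 2 criteria, can be sketched as an ordinary least-squares fit of retention time against carbon number, accepting the series when R² exceeds 0.998. The function names and the retention times in the usage note are hypothetical; only the R² threshold comes from the text.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the coefficient of determination.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def fits_homologous_series(carbon_numbers, retention_times, r2_min=0.998):
    """True when retention time is linear in carbon number (R^2 > r2_min)."""
    _, _, r2 = linear_fit_r2(carbon_numbers, retention_times)
    return r2 > r2_min
```

For example, a hypothetical series with carbon numbers 4 to 10 and retention times of 500, 700, 905, 1098, 1301, 1500, and 1702 s passes the check, whereas replacing the third value with 1200 s, as a misassigned peak might, causes it to fail.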

Statistical analyses
All statistical analyses were performed using R version 3.2.2. Peaks eluting before 358 s were removed from the peak tables, and the relative abundances of the remaining compounds were normalized across chromatograms using Probabilistic Quotient Normalization [39]. Intraclass correlations (ICC) were calculated for the 24 P. aeruginosa strains, and peaks with: (1) ICC > 0.4, and (2) greater mean intensities in samples versus sterile media controls (including zeros for blanks, but excluding zeros for samples) were retained for further analysis. For peaks that were detected in both the samples and the sterile media controls, the statistical significance of the differences between sample and control peak intensities was determined using the one-tailed non-parametric Mann-Whitney U-test [40] with Benjamini-Hochberg correction [41], with a significance threshold of p < 0.05 selected. For peaks that had high strain-to-strain variability within the P. aeruginosa samples (yielding large U-test p-values), an ICC > 0.8 was used as an alternative measure of statistical significance. Only peaks that were either (1) significantly more abundant in samples relative to sterile media, or (2) not detected in any sterile media samples are reported, and these peaks constitute the pan-volatilome described in this study. Peaks were defined as absent, detected, or present for each clinical isolate of P. aeruginosa if they were observed in 0, 1-2, or all 3 biological replicates, respectively. Using these definitions, volatile molecules were classified as core, rare, or accessory metabolites according to the following criteria:
• core: present in all 24 clinical isolates;
• rare: present in only one clinical isolate and detected in ⩽6 of the remaining 23 isolates;
• accessory: present in 2-23 clinical isolates, or present in one isolate and detected in >6 of the remaining 23 isolates.
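The classification criteria above can be expressed as a short sketch. The function names are hypothetical, and the input is assumed to be, for one compound, the number of biological replicates (0-3) in which it was observed for each isolate. A compound that is present in no isolate falls outside the stated criteria, so the sketch returns None for it.

```python
def replicate_status(n_observed, n_replicates=3):
    """'absent' (0 replicates), 'present' (all replicates), else 'detected'."""
    if n_observed == 0:
        return "absent"
    if n_observed == n_replicates:
        return "present"
    return "detected"


def classify_volatile(counts, n_replicates=3, rare_detect_max=6):
    """Classify one compound as 'core', 'rare', or 'accessory'.

    counts holds, per isolate, the number of replicates in which the
    compound was observed. Returns None when the compound is present in
    no isolate, a case the criteria do not cover.
    """
    statuses = [replicate_status(c, n_replicates) for c in counts]
    n_present = statuses.count("present")
    n_detected = statuses.count("detected")
    if n_present == len(counts):
        return "core"
    if n_present == 0:
        return None
    if n_present == 1:
        # A single 'present' isolate is rare only if few others detect it.
        return "rare" if n_detected <= rare_detect_max else "accessory"
    return "accessory"
```

Under these rules, a compound observed in all three replicates of one isolate and in one or two replicates of seven others is classified as accessory rather than rare.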
For inter-strain metabolomics comparisons, the average peak intensity of each compound in a given isolate was calculated as the arithmetic mean of the three biological replicates, and an arbitrarily small value of 1 was assigned for compounds that were not detected. Average peak intensities were (1) log-transformed, (2) mean-centered, and (3) unit-scaled. Mean-centering and unit-scaling were performed for each compound individually. Hierarchical clustering analysis (HCA) was employed to assess the metabolic relatedness of isolates using all volatile molecules belonging to the accessory suite, with a Euclidean distance metric.
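The preprocessing steps that precede clustering can be sketched as follows. Several details are assumptions that the text does not specify: the logarithm base (base 10 here), the use of the sample standard deviation for unit-scaling, and the handling of constant columns. The sketch stops at the pairwise Euclidean distance matrix, which would then be passed to a hierarchical clustering routine.

```python
import math
from statistics import mean, stdev


def autoscale_log(intensities):
    """Log10-transform, then mean-center and unit-scale each compound.

    intensities is a list of rows (isolates) by columns (compounds), with
    undetected compounds already filled with the value 1 (log10 gives 0).
    """
    logged = [[math.log10(v) for v in row] for row in intensities]
    scaled_cols = []
    for col in zip(*logged):
        m, s = mean(col), stdev(col)  # sample SD (n - 1): an assumption
        # Constant columns carry no information; map them to zero.
        scaled_cols.append([(v - m) / s if s > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def euclidean_distances(rows):
    """Symmetric matrix of pairwise Euclidean distances between isolates."""
    return [[math.dist(a, b) for b in rows] for a in rows]
```

Scaling each compound individually prevents the most abundant volatiles from dominating the distances between isolates.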
Accumulation and rarefaction curves were created using a methodology previously described by Humbert and colleagues for the analysis of genomic data [42]. A binary scale was generated to describe the occurrence of volatile metabolites in each isolate (0 if absent or detected, 1 if present). A total of 500 iterations were performed to generate both the accumulation and rarefaction curves, using a subset of data selected at random and without replacement.
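A minimal sketch of this resampling procedure is given below. It assumes that each iteration draws the isolates in a random order without replacement and that the curves report the mean union size (accumulation) and mean intersection size (rarefaction) at each step; the cited methodology may differ in detail, and the function name and seed are hypothetical.

```python
import random


def pan_core_curves(presence, iterations=500, seed=0):
    """Mean pan (union) and core (intersection) sizes versus isolates sampled.

    presence maps isolate name -> set of compound ids scored 1 (present).
    Returns two lists whose k-th entries are averages over all iterations
    after k + 1 isolates have been added.
    """
    rng = random.Random(seed)
    names = list(presence)
    n = len(names)
    pan_sum = [0] * n
    core_sum = [0] * n
    for _ in range(iterations):
        order = rng.sample(names, n)  # random order, without replacement
        union, shared = set(), None
        for k, name in enumerate(order):
            union |= presence[name]
            if shared is None:
                shared = set(presence[name])
            else:
                shared &= presence[name]
            pan_sum[k] += len(union)
            core_sum[k] += len(shared)
    pan = [s / iterations for s in pan_sum]
    core = [s / iterations for s in core_sum]
    return pan, core
```

The final points of the two curves do not depend on the sampling order: they equal the total number of compounds seen in any isolate and the number shared by all isolates, respectively.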

Results and discussion
The volatile metabolome of P. aeruginosa clinical isolates grown aerobically in rich medium
From the collection, processing, and alignment of 78 GC × GC chromatograms (72 P. aeruginosa samples and 6 sterile medium controls), we identified 1752 non-redundant peaks that were observed in at least 3 chromatographic analyses. Through statistical comparisons between the samples and controls (see data in supplementary table S1, available at: stacks.iop.org/JBR/10/047102/mmedia), we conservatively attribute 391 peaks to the P. aeruginosa pan-volatilome (table S1) and identified 70 volatile molecules that were present in all three biological replicates of all 24 isolates, which we defined as the core volatilome (figure 1 and table S1). Two hundred and sixty-nine volatile molecules were present in a subset of isolates, deemed the accessory volatilome, and 52 were produced by only single clinical isolates in the study, which we call the rare volatilome (table S1). Based on mass spectral fragmentation patterns and 2nd dimension retention times (²tR) [43], chemical class assignments could be determined for a subset of the volatile molecules within the core (n = 36), accessory (n = 54), and rare (n = 5) volatilomes (figure 2).
Ketones were the most highly-represented class (38 volatile molecules of 95, 40% of the total), which recapitulates several previous studies that identified ketones as key constituents of the P. aeruginosa volatilome [18,26,32]. Hydrocarbons were next highest, constituting 28% of all compounds for which we could determine chemical class, while the remaining five classes (alcohols, aldehydes, esters, aromatics, and others) accounted for only 29% combined. Identifying chemical classes by electron impact mass spectrometry and GC × GC retention times is easier for some classes than others, which introduced bias into our analysis. For example, ketones and hydrocarbons are readily identifiable by the combination of their base ions of m/z = 43, 57, and 71 and their second dimension retention times (²tR = 0.80 s ± 0.12 s and 0.65 s ± 0.03 s, respectively; table S1), making these compound classes easy to detect via extracted ion chromatograms (XICs) and GC × GC retention time patterns. Alcohols, in contrast, are characterized by the neutral loss of water (i.e. ions in the mass spectra that differ by m/z = 18), which cannot be visualized using XICs, and these compounds do not have narrowly-defined ²tR, like the hydrocarbons (table S1). This bias, in combination with the incompleteness of mass spectral and retention index libraries for many metabolites, makes it impossible to draw conclusions about differences in the variety of volatile compounds observed between the core, accessory, and rare metabolomes. However, it is interesting to note that ketones are well represented across the three volatilome categories (figure 2), suggesting a variety of roles for these compounds in P. aeruginosa diagnostics, with core ketones contributing to a suite of species-level biomarkers, as proposed by Shestivska et al [32], and combinations of accessory or rare ketones serving as unique strain identifiers.

Volatile molecules associated with the in vitro growth of P. aeruginosa clinical isolates
To provide putative identifications for the P. aeruginosa volatile molecules that we observed, we compared both mass spectral and retention time data to published or experimental values, when available (see materials and methods for details). Amongst the 391 peaks that we associate with the growth and metabolism of P. aeruginosa clinical isolates, we were able to assign putative identifications to 65 (17%) using a stringent threshold for mass spectral matching (⩾850/1000 relative to the NIST 2011 library), with corroboration via retention time data for a subset of compounds (table 1). Of these 65 compounds, 25 belong to the core volatilome, and are included in figure 1, 36 are in the accessory volatilome, and 4 in the rare volatilome. We identified several oft-reported P. aeruginosa-associated volatile compounds in the core volatilome, such as the dimethyl sulfurous compounds (dimethyl sulfide, disulfide, and trisulfide; table 1) [17,18,25-34,36]. P. aeruginosa's highly characteristic odor compound, 2-aminoacetophenone [18,20,26,29,32], was present in 23 of the isolates in this study and absent from one (Urinary-1), making it part of the accessory metabolome. We identified eight 2-ketones in the core and accessory volatilomes, four each of the 3-ketones and 4-ketones, and several additional branched ketones in these chemical families (table 1). Most of the 2-ketones have been previously reported [17,18,21,22,25,26,31-35,37], but we identified 2-decanone and several of the 3- and 4-ketones for the first time.

Characterizing P. aeruginosa by its volatilome
Employing definitions analogous to those used for comparative genomics, we designate the pan-volatilome as including any volatile molecule produced by this set of clinical isolates, whereas the core volatilome includes the set that is shared across all isolates. Using the individual metabolome data for each strain, we estimated the sizes of the pan-volatilome and core volatilome for these 24 P. aeruginosa clinical isolates via accumulation and rarefaction curves, respectively (figure 3). Both curves approach asymptotes (figure 3), indicating that the rate of discovery of new volatiles and the rate of restraining the core volatilome are not unbounded in this set of experiments [44]. That is, a suitable number of isolates have been included in this study to capture the variation in the P. aeruginosa volatilome under these growth conditions and analytical methodology, and the addition of more strains is unlikely to significantly change the estimated core and pan-volatilome sizes, which stand at 70 and 391 compounds, respectively. If either curve were unbounded, that would indicate that the data we have collected are not sufficient for size estimations, and that more strains would be required to characterize the volatilome in this study [44]. However, we note that rare volatile molecules are underrepresented in this data set. Similar to the impact of rare genes on pan-genome estimates [6,44], these compounds are expected to drive the upper-bounds of the accumulation curve, which plateau early in our analysis, while not necessarily strongly affecting the mean. We took a very conservative approach in the identification of rare volatile molecules, including only the rare peaks that were (1) present in all three biological replicates and (2) had low intrastrain versus inter-strain variability (ICC > 0.8). This approach disadvantaged the identification of compounds that were detected near the signal-to-noise threshold, which are susceptible to being reported as undetected in some replicates.
Table 1 notes: ¹tR, ²tR = 1st and 2nd dimension retention times, respectively; RI = retention index.
The size of the pan-volatilome in this study indicates that the ~125 volatile molecules that have been previously recorded for P. aeruginosa are but a fraction of the species' volatile metabolome. Moreover, our data on 24 isolates will necessarily underestimate the clinical pan-volatilome. The accumulation and rarefaction curves used to quantify a 'pan-genome' and 'core genome' assume a genetic diversity that is representative of the species as a whole, and the same assumptions are made when evaluating the core and pan-volatilome. Because the underlying genetic characteristics of these isolates are presently unknown, we cannot verify that the isolates used here are representative of P. aeruginosa clinical isolates more broadly. Additionally, the pan-genome studies of P. aeruginosa indicate that many dozens of strains should be included for completeness [6,7], and that these numbers can prove insufficient if there are distinctive subsets of clinical isolates that are underrepresented in the dataset [7,44].
Figure 3. Estimates of the pan-volatilome and core volatilome for 24 P. aeruginosa clinical isolates grown in vitro in LB-Lennox under aerobic conditions. The accumulation curve (green) and rarefaction curve (blue) indicate the number of total volatile metabolites and shared volatile metabolites contributed by all clinical isolates of P. aeruginosa used in this study, respectively. A binary scale was generated to describe the occurrence of volatile metabolites in each isolate (0 if absent or detected, 1 if present), and both curves were generated using this scale. A total of 500 iterations were performed to generate both the accumulation and rarefaction curves, using a subset of data selected at random and without replacement. Plot labels: pan-volatilome size; core volatilome size.
Figure 4. Heat map depicting the relative abundance of all P. aeruginosa accessory volatile metabolites as a function of strain. Each column represents a volatile molecule from the accessory metabolome, and each row represents a single P. aeruginosa clinical isolate, with colors depicting the relative metabolite abundance, calculated after log-transformation, mean-centering, and unit-scaling of peak intensities. Blue = low relative abundance, red = high relative abundance. Dendrogram (left) depicts the relatedness amongst strains as a function of their volatile metabolic profiles, using Euclidean distance. UTI = urinary tract infection; RESP = respiratory infection; ABD = abdominal; THR = throat; LB = sterile LB medium control.

The inter-strain variability of the clinical P. aeruginosa volatilome
The accumulation curve is largely shaped by the accessory volatiles in this study (i.e., the volatile molecules shared by subsets of the isolates), illustrating that these compounds capture the variation in the volatilomes of the 24 clinical isolates. Using only the accessory volatile molecules for the analysis, we quantified the inter-strain variability and relatedness of the isolates. We hypothesized that isolates from related types of infection would be more metabolically-similar and would cluster together. The peak intensities for each accessory volatile were mean-centered and unit-scaled across all 24 clinical isolates, and visualized with a heat map (figure 4). Using these data, we calculated the metabolic relatedness amongst strains, depicted via a dendrogram with branch length inversely proportional to metabolic similarity (i.e., shorter branch lengths indicate more metabolically-similar isolates). Interestingly, four of the six eye infection isolates (Eye-1, -4, -5, -6) cluster together based on a few highly similar blocks of volatile metabolites. This hints at the ability to use the accessory volatilome as a source of biomarkers for characterizing P. aeruginosa clinical strains linked to this specific site of infection, but the rest of the cluster analysis data indicate that type of information may not be generalizable to all infection sites. We do not, however, have sufficient genomic data on these isolates to determine the underlying source of the metabolic relatedness of the eye infection isolates under these growth conditions, to identify the reasons why most isolates from the same infection sites do not cluster, or to explore why the metabolome of respiratory-1 (RESP-1) exhibits greater similarity to sterile LB medium than to any of the other isolates studied. Additionally, we hypothesize that the growth conditions will dictate the infection site-specific biomarkers we could identify, and studies are underway to test this hypothesis.
A previous study by Shestivska et al explored the relationship between the volatile metabolomes and genomes of P. aeruginosa clinical isolates, and found that none existed [32]. However, that study was limited in both genomics and metabolomics data, utilizing the sequences of seven housekeeping genes and the relative abundances of six conserved metabolites to characterize the relatedness of 36 isolates. Based upon the sizes and variation of the pan-genome and pan-volatilome of P. aeruginosa, we hypothesize that correlations between genomes and metabolomes will be uncovered, but that much larger datasets and sample sizes will be required. We posit that metabolomic clustering of the eye infection isolates could also reflect shared phenotypes that are common amongst these types of infections, and the relationships between P. aeruginosa genotypes, phenotypes, and metabotypes will be topics of future studies.

Conclusions
To the best of our knowledge, we report here the most comprehensive study to date to identify volatile metabolites produced by P. aeruginosa clinical isolates. Utilizing concepts from the field of comparative genomics, we performed a comparative analysis of the volatilomes of 24 P. aeruginosa clinical isolates grown aerobically in rich media, identifying 391 unique compounds in the pan-volatilome, of which 70 compounds are core to all of the clinical isolates in this study. As reported previously, P. aeruginosa produces a wide array of ketone compounds, and the use of GC × GC-TOFMS in this study allowed us to putatively identify 17 novel ketones in the P. aeruginosa volatile repertoire, as well as 26 volatiles in other chemical classes. Two-thirds of the volatiles we identified were categorized as accessory volatiles, present in a subset of the P. aeruginosa isolates we characterized, and these drive the variation in the volatilomes of these strains. We observed very little similarity in the volatile metabolomes of strains from related infection sites, but future investigations of P. aeruginosa clinical isolates will explore how genotype and phenotype relate to the metabotype, as well as the influence of in vitro growth conditions on the identification of volatile biomarkers for P. aeruginosa infections.
In the characterization of the P. aeruginosa volatilome, there is much more to be done. Metabolomes (as with transcriptomes and proteomes) are conditional, and therefore complete characterization of the pan-metabolome requires the exploration of many different strains and environmental conditions, including interactions with other species, which can modulate the P. aeruginosa metabolome [34,45-49]. In light of these challenges, the pursuit of identifying and characterizing the P. aeruginosa pan-volatilome is a tall order, yet the data collected towards its completion will be valuable in the development of new diagnostic tools, therapeutic interventions, and basic insight into the metabolism of this pervasive opportunistic pathogen. This study represents a step in that direction.
in this publication was supported by The Dartmouth Clinical and Translational Science Institute, under award number UL1TR001086 from the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH). The content is solely the responsibility of the author(s) and does not necessarily represent the official views of the NIH.