Do Metabolomics and Taxonomic Barcode Markers Tell the Same Story about the Evolution of Saccharomyces sensu stricto Complex in Fermentative Environments?

Yeast taxonomy was introduced based on the idea that physiological properties would help discriminate species, thus assuming a strong link between physiology and taxonomy. However, the instability of physiological characteristics within species configured them as not ideal markers for species delimitation, shading the importance of physiology and paving the way to the DNA-based taxonomy. The hypothesis of reconnecting taxonomy with specific traits from phylogenies has been successfully explored for Bacteria and Archaea, suggesting that a similar route can be traveled for yeasts. In this framework, thirteen single copy loci were used to investigate the predictability of complex Fourier Transform InfaRed spectroscopy (FTIR) and High-performance Liquid Chromatography–Mass Spectrometry (LC-MS) profiles of the four historical species of the Saccharomyces sensu stricto group, both on resting cells and under short-term ethanol stress. Our data show a significant connection between the taxonomy and physiology of these strains. Eight markers out of the thirteen tested displayed high correlation values with LC-MS profiles of cells in resting condition, confirming the low efficacy of FTIR in the identification of strains of closely related species. Conversely, most genetic markers displayed increasing trends of correlation with FTIR profiles as the ethanol concentration increased, according to their role in the cellular response to different type of stress.


Introduction
Yeast taxonomy debuted at the beginning of the 20th century with a monography based on morphology and the analysis of a few physiological traits that were eventually increased by the Dutch School to the over 70 traits suggested in the various editions of "The Yeasts-a taxonomic study" [1][2][3]. The idea at that time was simply that the assimilation and fermentation properties would help discriminated species, thus assuming a strong link between physiology and taxonomy. Some milestones papers show that these characteristics can be instable and vary within the species, thus suggesting that they were not ideal markers for species delimitation and that the species is not a boundary of unique traits combinations. These observations helped molecular taxonomy to take

Cultures and Growth Conditions
The yeast strains used in this work are: S. bayanus CBS 380, S. paradoxus CBS 432, S. cerevisiae CBS 1171 and S. pastorianus CBS 1538. Cultures were inoculated at an optical density at 600 nm (OD 600 ) of 0.2 in 500-mL bottles with 50 mL of fresh Yeast Nitrogen Base (YNB) medium supplemented with 5% glucose (Difco Laboratories, Franklin Lakes, NJ, USA) and grown at 25 ± 0.5 • C under shaking at 150 ± 1 rpm. Cell growth was monitored by determining OD 600 and stopped after 18 h. Each culture was prepared for an FTIR-based bioassay, as detailed in the following paragraph.

FTIR-Based Bioassay
An FTIR-based assay for stress response analysis was carried out according to the procedure proposed by Corte and colleagues [18]. Each suspension was centrifuged (5 min at 5300 ± 10× g), washed twice with distilled sterile water and re-suspended in High Performance Liquid Chromatography (HPLC) grade water to obtain an optical density of OD 600 = 50. Each cell suspension was distributed in 15-mL polypropylene tubes, one for each tested concentration of the chemicals. In each tube, 5 mL of cell suspension and 5 mL of double-concentrate ethanol solution were pipetted to obtain the final concentrations of 8%, 12% and 16% (v/v) and a uniform cell density at OD 600 = 25. Control (0% ethanol concentration) was obtained by re-suspending cells in distilled sterile water. All tests were carried out in triplicate. Polypropylene tubes were incubated for 1 h at 25 • C in a shaking incubator set at 50 ± 1 rpm. After the incubation, 1.5 mL of each cell suspension were centrifuged (5 min at 5300 ± 10× g), washed three times with distilled sterile water and resuspended in HPLC grade water to reach the final concentration of 2.5 × 10 8 cells mL −1 . For each condition, 105 µL volume was sampled for three independent FTIR readings (35 µL each, according to the technique suggested by Essendoubi and colleagues [24] while 100 µL were serially diluted for viability assessment. Viable cell count was carried out on YPDA (Yeast extract 1%, Peptone 1%, Dextrose 2%, Agar 1.8%) supplemented with chloramphenicol (0.5 g L −1 ) plates. Cell mortality (M) was calculated as M = (1 − Cv/Ct) × 100, where Cv is the number of viable cells in the tested sample and Ct the number of viable cells in the control suspension.

Spectra Pre-Processing
FTIR measurements were performed in transmission mode. All spectra were recorded in the range between 4000 and 400 cm −1 . Spectral resolution was set at 4 cm −1 , sampling 256 scans per sample to obtain high quality spectra (signal to noise ratio values greater than 4000 within the 2100-1900 cm −1 interval). The software OPUS v. 6.5 (BRUKER Optics GmbH, Ettlingen, Germany) was used to assess the quality test, subtract the interference of atmospheric CO 2 and water vapor, correct baseline (rubberband method with 64 points) and apply vector normalization to the whole spectra.

Untargeted Metabolomics Profile Determination of Yeast Cells by LC-MS Analysis
The metabolomic analysis of CBS 380, CBS 432, CBS 1171 and CBS 1538 samples at increasing ethanol concentrations (0%, 8%, 12% and 16%, v/v) was performed using a LC-MS untargeted workflow. Each thesis was tested in five replicates (n = 5), with a total of 80 samples processed. Cell suspensions, calibrated at 10 8 cells, was centrifuged (5 min at 5300 ± 10× g) and the resulting pellet was mixed with glass beads and lysed using FastPrep ® -24 Tissue and Cell Homogenizer (MP Biomedicals, Irvine, CA, USA), at a speed setting of 6.0 for 120 s. The degree of cell breakage was checked microscopically. One milliliter of methanol was added to each lysate, vortexed and centrifuged at 3000 rpm for 5 min. Supernatants were transferred to the HPLC vials and 0.5 µL was injected into the LC-MS system. LC-MS analyses were performed using an Agilent 1260 Infinity UHPLC system coupled to an Agilent 6530 Q-TOF with Agilent JetStream source (Agilent Technologies, Santa Clara, CA, USA). The LC consists of a quaternary pump and an autosampler with a thermostated column compartment. The whole LC-MS system was governed by Agilent MassHunter software (v. B.09.00). The Ion Pairing Chromatography (IPC) method was used to achieve a wide separation of polar metabolite classes with ACME TM Amide C18 column (150 × 2.1 mm, 3 µm, Phase Analytical Technology LLC, State College, PA, USA) thermostated at 50 • C. The separation of metabolites was achieved using a flow of 0.35 mL min −1 of a binary gradient of 0.2% heptafluorobutyric acid (HFBA) in water (Solvent A) and 0.1% formic acid in methanol (Solvent B). The initial condition was 2% of B for 2 min, followed by a transition to gradient from 2% to 80% of B in 5 min and an isocratic step of 8 min. After that, the run was stopped, and the column was reconditioned for 4 min at initial conditions. An autosampler injected each sample using a needle wash program of 10 sec with methanol. Each run cycle was completed in 20 min. The ion source operated in positive ion mode using nitrogen as drying gas at 35 psi and 250 • C. The capillary was set at 2000 V with fragmentor, skimmer and octopole Radio Frequency (RF) set at 110, 65 and 750, respectively. Dynamic mass axis calibration with accuracy < 5 ppm was achieved by continuous infusion in the source of a reference mass solution (Agilent G1969-85001). The spectrometer acquired data in full-scan mode in the 50-1700 mass range at 1.5 spectra/s. LC-MS raw files were aligned and processed using Batch Recursive Feature Extraction algorithm of MassHunter Profinder (Agilent B.08.00). The data of features with score > 90% were imported in Mass Profiler Software (Agilent B.08.01) to perform features annotation using the Search Database algorithm. For this purpose, the Yeast Metabolome Database [25] was adapted to work in Agilent Mass Profiler. Only annotated metabolites with a quality identification score > 90% were retained.

Phylogenetic Analysis
Alignment of the concatenated ITS and D1/D2 domain of the 26S rDNA (LSU) sequences of the four strains was carried out with ClustalW2 built-in tool in MEGA X [26]. Distances were inferred with the Maximum Composite Likelihood method and expressed as number of base substitutions per site. This procedure has been chosen because it assumes equal substitution patterns and rates among lineages and sites, conditions considered appropriate for ongoing separation phenomena. Both transitions and transversions were considered. The Neighbor-Joining method [27] was used to reconstruct the tree with 1000 bootstrap reiterations.

Correlation between Genetic Markers and Metabolomic Data
Correlation analysis between genetic and metabolomic markers was performed using a series of thirteen different loci which included the traditional ITS and LSU rRNA-based markers and other single copy genes suggested as new generation markers (Table 1).
For each strain, marker sequences were retrieved from whole genome assemblies using an ad-hoc pipeline, set up using freeware bioinformatic tools, as detailed in the following lines.
For each marker, reference FASTA sequences from S. cerevisiae S288C reference genome (obtained from Genbank) were aligned to each genome assembly and the coordinates of regions with positive matchings were retrieved using NUCmer function of MUMmer software [28,29]. Markers sequences were then extracted as FASTA files from these positive matchings using SAMtools software [30].
Markers alignments were carried out with ClustalW2 built-in tool in MEGA X [26]. Distances were calculated in R environment (http://www.R-project.org) using dist.dna function from APE R-package (http://ape.mpl.ird.fr/). Correlation analysis between genotypic and metabolomic distance matrices from cells in resting condition was carried out by using the function cor.test included in the Vegan package (https://CRAN.R-project.org/package=vegan). Data were then exported and analyzed in MS Excel TM . Correlation analysis between genetic vs. FTIR and LC-MS data from cells under stress was carried out by using the dis.maca function from the R script DADI [31]. Briefly, distance matrices of strains from each genetic marker were correlated with distance matrices from the first descriptor of each spectral data matrix, applying the default parameters (Euclidean distances and Pearson correlation calculation). The procedure was repeated for each descriptor of all spectral matrices. Correlation matrices were then exported and filtered in MS Excel TM . Correlations with FTIR data were analyzed considering both the whole IR spectrum and each of the characteristic spectral region from W1 to W4 [32,33].

HCA and Pathway Analyses of LC-MS Data
The sample weight and non-normalized peak areas are available in Table S1. Metabolomic data analysis was performed with MetaboAnalyst 4.0 [34]. Data were filtered based on Interquartile Range, normalized to sample median and scaled by Pareto scaling.
HCA was carried out to highlight the variation trend of differential metabolites between strains challenged by increasing ethanol concentrations setting the Euclidean correlation method and Ward clustering algorithm. Metabolites were selected according to the VIP scores from Partial Least Square-Discriminant Analysis (PLS-DA) (Tables S2-S5). Since the average of squared VIP score is equal to 1, the "greater than one" rule was used as a criterion for variable selection.
Pathway analysis was performed to discover the most relative pathways involved in the response of these strains to short-term ethanol stress (Tables S6-S9). The global test algorithm and relative-betweenness centrality algorithm were specified for pathway enrichment analysis and pathway topology analysis, respectively.

Phylogenetic Predictability: Correlation between DNA-Based and Phenotypic Markers
To assess the relation between metabolomics and DNA taxonomic markers, FTIR and LC-MS were employed for the former analyses and a set of thirteen loci of taxonomic use for the latter (Table 1, M&M section). The choice of these markers was guided by the following criteria: the ITS, LSU and the concatenated ITS_LSU sequences were included as benchmark since they were used over the last 22 years as identification tool [2,7]; DAL2, FAS1 and ICL1 single copy genes were selected as genes involved in crucial metabolic pathways [35][36][37][38]; and ACT1, mtCOXII, mtSSU, RPB1, RPB2, SSU and TEF1-α sequences were selected because they are currently proposed as new generation markers for yeast taxonomy [7,[39][40][41][42].
Out of these 13 loci, the concatenated sequence ITS_LSU was chosen for the preliminary investigation on the phylogenetic predictability of metabolomics profiles, whereas the whole set of markers was used for a second step analysis described below. The phenotypic analysis suffers traditionally of repeatability problems and is obviously sensitive to the growth phase at which cells are analyzed. In this respect, late exponential phase was chosen, because it can be easily detected from the growth curve.
Interestingly, trees from cluster analysis of metabolomic data substantially reproduced the same clusterization of the strains obtained through their phylogenetic description with ITS_LSU barcode ( Figure 1A). In fact, with both FTIR fingerprint ( Figure 1B) and LC-MS metabolomics ( Figure 1C) S. bayanus CBS 380 and S. pastorianus CBS 1538 clustered separately from S. paradoxus CBS 432 and S. cerevisiae CBS 1171. However, the distances among the strains provided by the ITS-LSU clustering were significantly different from those indicated by the phenotypic markers.
were significantly different from those indicated by the phenotypic markers.
The substantial identity among S. paradoxus CBS 1538 and S. bayanus CBS 380, as for the ITS_LSU analysis, was not confirmed by means of FTIR and LC-MS, showing an increased distance among these strains, particularly marked in the LC-MS data cluster. Conversely, the distance between S. paradoxus CBS 432 and S. cerevisiae CBS 1171 was roughly the same when comparing ITS_LSU and FTIR clusters while it was reduced by the LC-MS description.  The substantial identity among S. paradoxus CBS 1538 and S. bayanus CBS 380, as for the ITS_LSU analysis, was not confirmed by means of FTIR and LC-MS, showing an increased distance among these strains, particularly marked in the LC-MS data cluster. Conversely, the distance between S. paradoxus CBS 432 and S. cerevisiae CBS 1171 was roughly the same when comparing ITS_LSU and FTIR clusters while it was reduced by the LC-MS description.
The discrepancies among the qualitative and quantitative representation of strains depicted in Figure 1 suggested to deepen the study by using the whole set of taxonomic markers (Table 1) to better evaluate if and to what extent the DNA-based markers can be effective in predicting the phenotypic profiles of these strains.
Distance matrices from FTIR and LC-MS spectra of each strain were correlated with those obtained for each locus included in the study ( Figure 2).
Overall, the correlation between DNA-based and FTIR data (Figure 2A) was lower than that obtained for LC-MS ( Figure 2B).
No significant correlations (correlation coefficients > 0.75) was detected among FTIR and taxonomic markers ( Figure 2A). Conversely, eight markers out of the thirteen tested showed correlation coefficients higher than 0.75 with distance matrices from LC-MS spectra ( Figure 2B). The mitochondrial subunit 2 of cytochrome oxidase (mtCOXII) gene was the marker that correlated best with the LC-MS data (0.97; p value < 0.01). High correlation values (≥0.9) were also registered for the 18S nuclear ribosomal small subunit (SSU) and the Translation Elongation Factor 1 alpha (TEF1α) sequences, with 0.91 and 0.90 coefficients, respectively (p values < 0.01). In addition, the FAS1 (Beta subunit of fatty acid synthetase) and the small subunit mitochondrial rRNA (mtSSU) genes have proved to be good markers towards LC-MS profiles of the four strains (0.88 and 0.76 correlation coefficients, respectively; p values < 0.01). Interestingly, results obtained with traditional markers revealed that the ITS barcode showed a better correlation with these phenotypic data than the concatenated sequence ITS_LSU, which in turn displayed a better correlation than the LSU alone (0.89, 0.88, 0.85 correlation coefficients; p values < 0.01).
Finally, the analysis did not reveal any significant correlation between RPB1, RPB2, ICL1 and ACT1 coding sequences and these metabolomic phenotypic descriptors. mixed region). (C) Clustering obtained analyzing LC-MS spectra using Spearman's distance measure and Ward's algorithm. Hierarchical clustering of phenotypes was performed.
The discrepancies among the qualitative and quantitative representation of strains depicted in Figure 1 suggested to deepen the study by using the whole set of taxonomic markers (Table 1) to better evaluate if and to what extent the DNA-based markers can be effective in predicting the phenotypic profiles of these strains.
Distance matrices from FTIR and LC-MS spectra of each strain were correlated with those obtained for each locus included in the study ( Figure 2).
Overall, the correlation between DNA-based and FTIR data (Figure 2A) was lower than that obtained for LC-MS ( Figure 2B).

Stress Predictivity
The potential applicability of some of the mentioned above markers for the prediction of strains phenotypes in resting condition paved the way for another question on whether this set of taxonomic loci could be used to predict the phenotypes under stress. This hypothesis springs from the observation that these species evolved with a stringent selection due to the highly challenging environments with high alcohol content. At this aim, the genetic distances between the four strains, calculated for each of the thirteen markers employed in the study (Table 1), were compared with distances obtained from FTIR spectra of cells in response to short-term ethanol stress. More in detail, a FTIR-based assay, specifically designed for stress response assessment [18], was carried out on the whole cells of each culture at increasing ethanol concentrations (0%, 8%, 12% and 16% v/v), paralleling with viable analysis.

Mortality Analysis
The analysis of cell mortality showed a strain-specific pattern in response to ethanol ( Table 2). S. bayanus CBS 380 was the most sensitive strain with around 40% mortality already at the lowest ethanol concentration tested (8%), almost saturated at 12% ethanol. Conversely, ethanol did not significantly affect the viability of S. pastorianus CBS 1538 until 12%, which in any case remained over 90% at higher ethanol concentrations, suggesting a strong resistance for this strain. The mortality induced by ethanol on the other two strains, S. paradoxus CBS 432 and S. cerevisiae 1171, is close to that of S. pastorianus CBS 1538. In fact, the first displayed mortality values ranging from 11% to 13% for all the tested concentrations while the latter showed mortality values around 10% and 20% at 12% and 16% ethanol, respectively.

Correlation between Molecular Markers and FTIR Metabolomic Fingerprints of Cells under Short-Term Ethanol Stress
The comparison between genetic data and FTIR profiles of ethanol stressed cells was carried out calculating Pearson's correlations between the distances obtained considering each of the thirteen genetic markers and those obtained comparing each of wavelength of each spectral matrix. The resulting correlation matrices were filtered, in order to retain only the wavelength with a correlation higher than 0.75 (Table S10), and then expressed as percentage on the total values obtained (hereinafter reported as percentage of significative wavelengths, PSW).
The low variability showed by LC-MS data did not make it possible to outline any defined relation with phylogenetic markers (data not shown). We therefore decided to concentrate the analysis on the only FTIR dataset by plotting the percentages of significative correlations with each marker in each tested condition, obtaining the trends reported in Figure 3.
higher than 0.75 (Table S10), and then expressed as percentage on the total values obtained (hereinafter reported as percentage of significative wavelengths, PSW).
The low variability showed by LC-MS data did not make it possible to outline any defined relation with phylogenetic markers (data not shown). We therefore decided to concentrate the analysis on the only FTIR dataset by plotting the percentages of significative correlations with each marker in each tested condition, obtaining the trends reported in Figure 3.  Figure 3 defined three different types of progression: (i) low intensity response ( Figure 3A), characterized by monotonous trends and maximum PSW value between 2% and 4%; (ii) high intensity response ( Figure 3B), characterized by monotonous trends and a maximum PSW value between 6% and 32%; and (iii) non-monotonous response ( Figure 3C).

Data presented in
Low intensity responses were shown by SSU, TEF-1α, ITS, Beta subunit of fatty acid synthetase (FAS1) and mitochondrial COX (mtCOXII) genes ( Figure 3A). The first four markers were characterized by an absence of correlations with 0% and 8% ethanol conditions; PSW values for SSU and TEF-1α showed then an increase to 1.67% and 1.8% at 12% ethanol and reached their maximum value (2.44%) in the last condition. A similar correlation rise was shown by ITS and FAS1 markers after 12% ethanol exposure, reaching their maxima of 1.93% and 2.05%, respectively, at 16% ethanol.
The last marker with a low response is mtCOXII, characterized by a PSW value of 1.5% at 0% ethanol, followed by a decrease to zero and then a subsequent increase, similar to those of SSU and  Figure 3 defined three different types of progression: (i) low intensity response ( Figure 3A), characterized by monotonous trends and maximum PSW value between 2% and 4%; (ii) high intensity response ( Figure 3B), characterized by monotonous trends and a maximum PSW value between 6% and 32%; and (iii) non-monotonous response ( Figure 3C).

Data presented in
Low intensity responses were shown by SSU, TEF-1α, ITS, Beta subunit of fatty acid synthetase (FAS1) and mitochondrial COX (mtCOXII) genes ( Figure 3A). The first four markers were characterized by an absence of correlations with 0% and 8% ethanol conditions; PSW values for SSU and TEF-1α showed then an increase to 1.67% and 1.8% at 12% ethanol and reached their maximum value (2.44%) in the last condition. A similar correlation rise was shown by ITS and FAS1 markers after 12% ethanol exposure, reaching their maxima of 1.93% and 2.05%, respectively, at 16% ethanol.
The last marker with a low response is mtCOXII, characterized by a PSW value of 1.5% at 0% ethanol, followed by a decrease to zero and then a subsequent increase, similar to those of SSU and TEF-1α, to 1.80% at 12% ethanol. A maximum PSW value of 3.47% was finally reached in the last stressing condition.
The last type of response was that displayed by mitochondrial SSU and RPB2. Both trends showed a three-fold increase of PSW between 0% and 8% ethanol, followed by a gradual decrease to values of 2.05% and 2.18%, respectively, in correspondence of the last considered condition.
Noteworthy, ITS_LSU concatenation did not show in any case a correlation above the 0.75 threshold therefore its trend cannot be reported.
The analysis of these progressions has shown the existence of a certain correlation between some markers and the phenotypic response represented by the whole FTIR spectra. The subsequent step of the analysis focused then on the search for a more specific signal of the predictability of these markers towards the FTIR profiles.
Therefore, the correlation between the distances of the strains on the basis of the thirteen markers and the distances between the same strains calculated on the four specific spectral region related to stress response (Fatty acids W1, Amides W2, Mixed region W3 and Carbohydrates W4) considered separately was investigated. The spectra were divided into two groups: control cells spectra and ethanol stressed cells spectra, grouping together the data from the three stressing conditions (Figure 4 and Tables S11 and S12). at 0%, 8% and 12% ethanol stressing condition, reaching then their maxima of 7.70%, 7.45% and 6.29%, respectively, for 16% ethanol. ACT1 and DAL2 genes conversely showed, respectively, a PSW value of 7.19% and 5.01% in the resting condition and then slowly decreased, reaching zero at 12% ethanol; after that, PSW increased again, reaching values of 22.34% and 31.84% respectively.
The last type of response was that displayed by mitochondrial SSU and RPB2. Both trends showed a three-fold increase of PSW between 0% and 8% ethanol, followed by a gradual decrease to values of 2.05% and 2.18%, respectively, in correspondence of the last considered condition.
Noteworthy, ITS_LSU concatenation did not show in any case a correlation above the 0.75 threshold therefore its trend cannot be reported.
The analysis of these progressions has shown the existence of a certain correlation between some markers and the phenotypic response represented by the whole FTIR spectra. The subsequent step of the analysis focused then on the search for a more specific signal of the predictability of these markers towards the FTIR profiles.
Therefore, the correlation between the distances of the strains on the basis of the thirteen markers and the distances between the same strains calculated on the four specific spectral region related to stress response (Fatty acids W1, Amides W2, Mixed region W3 and Carbohydrates W4) considered separately was investigated. The spectra were divided into two groups: control cells spectra and ethanol stressed cells spectra, grouping together the data from the three stressing conditions ( Figure  4 and Tables S11 and S12).  Maximum correlation values of each marker with the four IR regions are reported and quartiles thresholds are indicated as horizontal dashed lines. ICL1, RPB1 and ACT1 showed relatively high correlation values (0.76 for the first two and 0.91 for the third) with fatty acid (W1) region of control samples, followed by RPB2, placed just below 0.70. All the other markers showed lower correlation values, always below 0.40 ( Figure 4A). The same four marker, together with DAL2, were highly correlated also with amides (W2) region of control samples.
Only RPB2 and DAL2 maintained a high correlation value with mixed (W3) and carbohydrates (W4) region, despite being the second marker slightly below 0.70 in W3 region; all the other markers showed correlation values around 0.5 or lower.
Noteworthy, the current official species markers ITS, LSU and their concatenation did not display significative correlations with the four IR regions, being always in the range between 0.25 and 0.5 for control samples.
Analyzing the correlation between the same markers and the 4 regions in the spectra of ethanol stressed cells ( Figure 4B), a quite different picture is obtained. W1 region showed a correlation higher than 0.75 only with DAL2 and RPB2, while the correlation with all the other markers was always lower.
The same two markers, together with ACT1 and mtCOXII, were highly correlated also with W2 region. Interestingly, in this region, the correlation value for all the other markers was around 0.6, with the only exception of ITS_LSU concatenation, which showed a correlation around 0.4. The same decrease in correlation with W3 and W4 region observed for control cells spectra is present also in this second situation, in which all markers showed values below 0.75.

Phenotype Analysis in Response to Short-Term Ethanol Stress
LC-MS metabolomics was employed to assess the phenotypes of S. bayanus, S. pastorianus, S. paradoxus and S. cerevisiae type strains in response to short-term stress induced by ethanol. LC-MS was carried out on cell extracts from controls (0% ethanol) and treated cells at 8%, 12% and 16% ethanol (v/v), identifying a total of 89 metabolites with known structures (Table S1). To focus on the differentially altered metabolites between strains, Hierarchical Clustering Analysis (HCA) was performed according to the VIP scores from PLS-DA ( Figure 5A-D). The most important altered pathways in the response of the four strains to ethanol (v/v) at: 0% (E); 8% (F); 12% (G); and 16% (H). y-axis, -log(p) represents metabolic pathways containing metabolites that are significantly different between the four strains; x-axis, pathway impact represents the impact of these significantly changed metabolites on the overall pathway, based on their position/number within the pathway.

Discussion
In the present study, we investigated the predictability of thirteen markers towards the complex FTIR and LC-MS metabolomic profiles of S. cerevisiae, S. paradoxus, S. bayanus and S. pastorianus reference strains. These loci were selected to check whether they could be used as predictors of specific physiological traits in addition to their taxonomic interest. The search of genes more Differentially altered metabolites were mainly amino acids and clustered separately into two main groups, defining a strain-specific pattern in each of the four conditions tested. Despite this evidence, the strains response at 8% and 12% ethanol (v/v) was very similar to that of controls, diverging significantly only at 16% ethanol. Interestingly, metabolites of the upper cluster were always upregulated in S. bayanus type strain, suggesting that the metabolome of this strain was significantly different from the others both in resting that in stressing conditions.
According to the results of significance of the perturbed metabolic pathway and the centrality of metabolites in the metabolic pathway, the perturbation of few metabolic pathways was found to be significant ( Figure 5E-H).
Interestingly, two pathways distinguished the four strains regardless of the presence of ethanol. These differences were attributable to the upregulation of several metabolites in S. bayanus CBS 380 and S. paradoxus CBS 432, impacting mainly on the pathways of Aminoacyl-tRNA biosynthesis (d-Asparagine, d-Histidine, d-Arginine, d-Glutamine, d-Serine and d-Alanine) and Arginine biosynthesis (d-Arginine, d-Glutamine, N-Acetylornithine and d-Ornithine). Conversely, the only differences in cell metabolomes at increasing concentrations of ethanol concerned the differential regulation of metabolites such as Glutathione, gamma-l-Glutamyl-l-cysteine, l-Ornithine, Spermidine and (5-l-Glutamyl)-l-amino acid, all involved in the Gluthathione metabolism pathway.

Discussion
In the present study, we investigated the predictability of thirteen markers towards the complex FTIR and LC-MS metabolomic profiles of S. cerevisiae, S. paradoxus, S. bayanus and S. pastorianus reference strains. These loci were selected to check whether they could be used as predictors of specific physiological traits in addition to their taxonomic interest. The search of genes more correlated to ethanol stress, or other physiological features, is another type of work currently undergoing in our labs, in which all genes of the genomes are employed, independently of their taxonomic interest.
FTIR fingerprint has been proposed to classify and to identify species [32,33,43] and was able to detect sub-specific variations bound to the substrate of isolation (medical vs. food) more efficiently than DNA barcoding tools [44], whereas it was less efficient for the identification of strains of closely related species [45]. This evidence could explain the fact that no significant correlation was detected between molecular markers and FTIR profiles of strains under resting condition. Conversely, eight markers out of thirteen displayed correlation values over 0.75 with LC-MS data. DNA sequences accumulate mutations which then flow into proteome, whereas cell metabolism is the result of selective pressures occurred over a long time. The evolutionary changes in DNA, and consequently in proteins, describe relatively recent events compared to changes in metabolic pathways, which can be dated back in the past. It is therefore plausible that highly conserved sequences such as the mitochondrial genes mtCOXII and mtSSU; the traditional markers ITS and LSU and their concatenation ITS_LSU; and the coding sequences TEF1α and FAS1 correlated highly with the LC-MS profiles of these four strains. These loci can therefore be proposed as "double usage" markers for taxonomy and general metabolic evolution.
The patterns of mortality under short-term ethanol stress confirmed what was previously described about the alcohol-tolerance of these species. Most strains of S. cerevisiae, S. pastorianus and S. paradoxus resulted tolerant up to 16% ethanol while S. bayanus strains showed serious difficulties withstanding 12% ethanol, and, when the percentage of ethanol increased up to 15%, the majority of strains were not able to grow [46]. The species of the Saccharomyces sensu stricto complex have all evolved into fermentation processes in highly challenging environments due to the presence of alcohol [47,48] supporting the hypothesis that functional associated traits have evolved slowly and are more phylogenetically conserved. In fact, most of the molecular markers showed increasing trends of correlation with FTIR profiles at increasing ethanol concentrations, confirming the primary role of the FTIR analytical system in finely characterizing the physiological status of cells in various conditions, including stress [18,[49][50][51][52]. Several studies have shown that the transcription of TEF-1α can be induced by increasing stressing condition [53,54]. In addition, FAS-1, one of the main actors in yeast long-chain fatty acid metabolism, was interested in stress response to ethanol [55,56], antifoaming compounds [57] and high temperature [58], whereas the mitochondrial COX gene is a known to be part of the cellular response to different types of stressing agents, such as ethanol or high temperatures [59]. It has also been demonstrated that most of these genes play an important role in the response of yeasts to stressful fermentation conditions [56,60] and oxidative stress [61,62]. In light of these considerations, the coding sequences mtCOXII, mtSSU, TEF-1α, FAS1, DAL2 and ACT1 can be considered particularly interesting as putative markers for taxonomy and ethanol stress predictability. Further research in this area without restrictions regarding taxonomic markers is already underway to shed more light on this issue.
Finally, the results from pathways analysis suggest that the common ancestor of these species has evolved along the fermentation processes at high sugar concentrations and that the selective pressure that separated these species corresponded to a sugar concentration such as to produce around 12% (v/v) alcohol. These processes selected also the metabolisms of these organisms, leaving a specific trace in the two pathways of the Aminoacyl-tRNA and arginine biosynthesis. Many studies reported that, in S. cerevisiae, the amino acid arginine exerts a protective role against ethanol stress by maintaining the integrity of cell wall and plasma membrane [63] and by triggering the TCA cycle to provide more energy to contrast the stress [64]. On the other hand, the differential regulation of metabolites involved in the Glutathione metabolism looks like a strain-specific signal, linked to the participation of Glutathione of several cell functions, by searching for free radicals (ROS) present in the cytosol [65] and by playing a key role in inducible adaptive response mechanisms able to confer to cells the resistance to oxidative stress [65][66][67].

Conclusions
In this study, we aimed at establishing a new methodology in a well-known but restricted panel of strains and genes. The results presented prove the existence of a strong link between physiology and taxonomy suggesting that many loci could be particularly interesting as "double usage" markers for taxonomy and general metabolic evolution or ethanol stress. The approach is sound and can be replicated in other models, even with more components. On the other hand, we also showed that FTIR and LC-MS are complementary techniques that should be ideally deployed together. Reconnecting taxonomy with relevant phenotypic features represents an important challenge for microbiologists, the question on the real meaning of the species concept remaining open in light of the fact that most of the biotechnologically and industrially relevant characteristics of microbial cultures are encoded by quantitative trait loci (QTL). The application of artificial intelligence will surely improve the ability of omics tools in the species identification opening new scenarios to unravel this fascinating complexity.  Table S6: Detailed results from pathway analysis. Comparison of the four strains at 0% ethanol (v/v). The statistical p values from enrichment analysis are further adjusted for multiple testing. Total is the total number of compounds in the pathway; Hits is the actually matched number from the user uploaded data; Raw p is the original p value calculated from the enrichment analysis; Holm p is the p value adjusted by Holm-Bonferroni method; FDR p is the p value adjusted using false discovery rate; Impact is the pathway impact value calculated from pathway topology analysis. Table S7: Detailed results from pathway analysis. Comparison of the four strains at 8% ethanol (v/v). The statistical p values from enrichment analysis are further adjusted for multiple testing. Total is the total number of compounds in the pathway; Hits is the actually matched number from the user uploaded data; Raw p is the original p value calculated from the enrichment analysis; Holm p is the p value adjusted by Holm-Bonferroni method; FDR p is the p value adjusted using false discovery rate; Impact is the pathway impact value calculated from pathway topology analysis. Table S8: Detailed results from pathway analysis. Comparison of the four strains at 12% ethanol (v/v). The statistical p values from enrichment analysis are further adjusted for multiple testing. Total is the total number of compounds in the pathway; Hits is the actually matched number from the user uploaded data; Raw p is the original p value calculated from the enrichment analysis; Holm p is the p value adjusted by Holm-Bonferroni method; FDR p is the p value adjusted using false discovery rate; Impact is the pathway impact value calculated from pathway topology analysis. Table S9: Detailed results from pathway analysis. Comparison of the four strains at 16% ethanol (v/v). The statistical p values from enrichment analysis are further adjusted for multiple testing. Total is the total number of compounds in the pathway; Hits is the actually matched number from the user uploaded data; Raw p is the original p value calculated from the enrichment analysis; Holm p is the p value adjusted by Holm-Bonferroni method; FDR p is the p value adjusted using false discovery rate Impact is the pathway impact value calculated from pathway topology analysis. Table S10: Percentage of significative wavelengths in the correlations between markers and IR bioassay conditions. Table S11: Correlation values of genetic markers and four IR regions in control samples; Table S12: Correlation values of genetic markers and four IR regions in ethanol stressed samples.
Funding: This research received no external funding.