Identification of Food Spoilage Fungi Using MALDI-TOF MS: Spectral Database Development and Application to Species Complex

Fungi, including filamentous fungi and yeasts, are major contributors to global food losses and waste due to their ability to colonize a very large diversity of food raw materials and processed foods throughout the food chain. In addition, numerous fungal species are mycotoxin producers and can also be responsible for opportunistic infections. In recent years, MALDI-TOF MS has emerged as a valuable, rapid and reliable tool for fungal identification to ensure food safety and quality. In this context, this study aimed to expand the VITEK® MS database with food-relevant fungal species and to evaluate its performance, with a specific emphasis on species differentiation within species complexes. To this end, a total of 380 yeast and mold strains belonging to 51 genera and 133 species were added to the spectral database, including species from five species complexes: the Colletotrichum acutatum, Colletotrichum gloeosporioides, Fusarium dimerum and Mucor circinelloides complexes and Aspergillus series Nigri. Database performance was evaluated by cross-validation and by external validation using 78 fungal isolates, yielding 96.55% and 90.48% correct identification, respectively. This study also showed the capacity of MALDI-TOF MS to differentiate closely related species within species complexes and further demonstrated the potential of this technique for the routine identification of fungi in an industrial context.


Introduction
The fungal kingdom is estimated to encompass between 2.2 and 3.8 million species, making it one of the most diverse groups of organisms on Earth [1]. This large group includes diverse eukaryotic microorganisms such as yeasts and filamentous fungi [2]. These organisms can have either positive or negative impacts on human activities. Indeed, fungi can produce a wide range of pharmaceutical products, enzymes and organic acids [2]. They are also major actors in the food and beverage industries due to their ability to modify and improve the organoleptic and nutritional properties of food products of animal and plant origin, as well as their ability to extend food shelf-life through fermentation [3][4][5]. They are involved, for instance, in the manufacturing of soy sauce, miso, tempeh, mold-ripened cheeses, fermented sausages, bread, kombucha, beer, wine and various spirits.
Conversely, due to their ability to colonize a very large diversity of food raw materials and processed foods along the food chain, fungi are also major contributors to global food losses and waste, which amount to ~1.3 billion tons each year [6]. As an example, Davies et al. (2021) estimated that fungi were involved in up to 20% of global crop yield losses, with at least 125 million tons of the five most cultivated crops lost each year because of fungal growth [6]. Moreover, Pitt and Hocking [5] estimated that fungal spoilage was responsible for 5-10% of food losses and waste. Fungal spoilage also leads to substantial financial losses [7,8], to the waste of natural resources (land and water) and to avoidable greenhouse gas emissions, and it contributes to food insecurity worldwide.
Aspergillus, Penicillium and Fusarium are the main genera involved in food spoilage. Several species of these genera produce mycotoxins, which represent a major hazard for human and animal health [9,10]. Indeed, of the more than 300 mycotoxins identified so far, six, namely aflatoxins, fumonisins, ochratoxins, patulin, trichothecenes and zearalenone, are regularly found in food, causing unpredictable and ongoing food safety problems at a global scale [11]. Furthermore, some species within the genera Mucor, Aspergillus and Penicillium, among others, are also responsible for opportunistic infections in immunocompromised patients [12,13]. In this context, the rapid and accurate identification of fungi at the species level is crucial to ensure food safety and quality [14][15][16].
Traditionally, fungal identification was performed using phenotypic approaches, i.e., morphological and biochemical characteristics [17,18]. However, these approaches are tedious and time-consuming, may be prone to misidentification and require high expertise [7,8]. Over the last two decades, DNA barcoding, which relies on the sequencing of one or more standardized short DNA regions, has profoundly changed the ability to identify fungal species [19]. The internal transcribed spacer (ITS) region of the nuclear ribosomal DNA (rDNA), for instance, is a highly polymorphic region considered the universal barcode marker for fungi [20]. DNA barcoding is nowadays considered the gold standard because of its reliability and accuracy, but it remains expensive and tedious and requires special skills and knowledge, which limits its applicability for routine examination in an industrial setting [19,21].
In recent years, matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) has emerged as a valuable, rapid, cost-effective and reliable tool for microorganism identification [22,23]. This technique was initially applied to bacterial and yeast species before being extended in recent years to filamentous fungi [21,[24][25][26]. MALDI-TOF MS is now commonly used for routine microbial identification in clinical and industrial microbiology laboratories. It relies on the rapid and precise analysis of biomolecules such as proteins, peptides, nucleic acids and lipids, yielding a specific spectral signature that can then be identified by comparison with a reference spectral library [7,25,27]. Nevertheless, the lack of available spectra in commercial databases, particularly for fungi, is still a concern [18,21,28]. Expanding spectral databases with food-relevant species is therefore necessary to keep this technique up to date in the face of current challenges, including the addition of novel target species and updates to fungal taxonomy. Concerning the latter aspect, MALDI-TOF MS has been successfully applied to discriminate species within species complexes [26,29,30]. A species complex is defined as a cluster of closely related species [31], which may include cryptic species [21]. Species within a species complex are difficult to distinguish using traditional phenotypic methods and may require the analysis of several specific genes [32]. Despite their close phylogenetic relatedness, these species may exhibit significant differences in their physiological, metabolic or ecological traits [33] and could therefore have a positive or negative impact on human activities, as mentioned above. As an example, Quéro et al. [26] were able to correctly identify species of Aspergillus section Flavi using MALDI-TOF MS. Notably, this section contains species of both technological and toxigenic interest [34].
Aspergillus section Nigri is also particularly relevant due to its mycotoxin-producing species and their frequent occurrence in food matrices [33,35]. This species complex was recently re-examined by Bian et al. [33], who defined six species within it, i.e., Aspergillus brasiliensis, Aspergillus eucalypticola, Aspergillus luchuensis, Aspergillus niger, Aspergillus tubingensis and Aspergillus vadensis. To date, many species complexes remain to be studied using MALDI-TOF MS. Moreover, the rapid and accurate identification of species within species complexes using an updated spectral database could help distinguish species with different impacts and strengthen the prevention and control of fungal spoilage in food.
In this context, the goal of this study was to expand the VITEK® MS spectral database with food spoilage fungi and to evaluate its performance, with a specific emphasis on species differentiation within species complexes.

Fungal Strains
In the present study, the current VITEK® MS V3.4 Knowledge Base was expanded through the addition of 380 yeast and mold strains, corresponding to 51 genera and 133 species, as detailed in Table 1. Of these 133 species, 119 corresponded to new species entries, and 14 corresponded to species already present in the previous version of the database, for which additional spectra were acquired on new strains. Moreover, six mold species within species complexes were reworked, without spectra addition, to optimize their identification accuracy. Species were selected based on their agri-food relevance and prevalence, their ability to colonize various food types and their known mycotoxin production. Strains were obtained from several collections, i.e., the American Type Culture Collection (ATCC, Manassas, VA, USA), the bioMérieux strain collection (Marcy l'Etoile, France), the EQUASA industrial strain collection (Plouzané, France), the Université de Bretagne Occidentale Culture Collection (UBOCC, Plouzané, France) and the Westerdijk Fungal Biodiversity Institute (Utrecht, The Netherlands). Morphological analysis was performed on all strains to confirm the genus or species. Furthermore, for 330 out of 380 strains, identification was confirmed by DNA sequencing of one or more genes or regions (e.g., ITS region, D1/D2 domain of the 26S rRNA gene, partial β-tubulin gene, partial elongation factor-1 alpha gene, partial actin gene, glyceraldehyde-3-phosphate dehydrogenase gene). An external validation was performed to challenge the extended database using 78 strains (Table 2). These strains were obtained from the UBOCC, EQUASA (Plouzané, France) and LUBEM (Plouzané, France) collections. All strains were identified by DNA sequencing of one or more regions. The selected strains comprised a total of 61 species, including 47 mold species and 14 yeast species. Of these, 58 species were represented in the extended version of the database, among which 21 were newly added species and 3 were existing species extended with additional spectra. The remaining three species were absent from the spectral database.

Mold Sample Preparation
After cultivation for 2 and 8 d as mentioned above, mold isolates were subjected to an extraction protocol using the VITEK® MS mold kit (bioMérieux, Marcy l'Etoile, France). Briefly, the mycelium and/or conidia were sampled from the agar plate surface (approximately 1 cm²) using a sterile cotton swab moistened with API Suspension Medium (bioMérieux, Marcy l'Etoile, France) [18]. The sample was then immersed in a microcentrifuge tube containing 900 µL of 70% ethanol (bioMérieux, Marcy l'Etoile, France). After vortexing for 5 s and centrifugation for 2 min at 14,000× g, the supernatant was discarded, and the pellet was resuspended in 40 µL of 70% formic acid (bioMérieux, Marcy l'Etoile, France). After vortexing for 5 s, 40 µL of acetonitrile was added, and the mixture was vortexed again for 5 s. Finally, the tube was centrifuged for 2 min at 14,000× g, and the supernatant was kept for spectra acquisition.
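As a worked example of the dilution in the final extraction step, and assuming (neither point is stated above) that the acetonitrile is undiluted and that the volume contributed by the fungal pellet is negligible, the volume fractions in the 80 µL extract are:

```latex
\[
\varphi_{\mathrm{FA}} = \frac{0.70 \times 40\ \mu\mathrm{L}}{40\ \mu\mathrm{L} + 40\ \mu\mathrm{L}} = 0.35,
\qquad
\varphi_{\mathrm{ACN}} = \frac{1.00 \times 40\ \mu\mathrm{L}}{80\ \mu\mathrm{L}} = 0.50,
\qquad
\varphi_{\mathrm{H_2O}} = 1 - \varphi_{\mathrm{FA}} - \varphi_{\mathrm{ACN}} = 0.15
\]
```

i.e., roughly 35% formic acid and 50% acetonitrile (v/v), ignoring volume contraction on mixing.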

Spectra Acquisition
For spectra acquisition, two distinct protocols were applied to yeast and mold isolates. For yeast isolates, one colony was randomly collected using a loop or the VITEK® PICKME™ (bioMérieux, Marcy l'Etoile, France) and then smeared in duplicate onto a target slide (bioMérieux, Marcy l'Etoile, France). Then, 1 µL of 70% formic acid (bioMérieux, Marcy l'Etoile, France) was added directly to each spot and left to dry. For mold isolates, 1 µL of the previously obtained supernatant was transferred in duplicate onto the target slide and allowed to dry. Then, for both yeasts and molds, 1 µL of α-cyano-4-hydroxycinnamic acid matrix solution (CHCA, bioMérieux, Marcy l'Etoile, France) was applied, and the spots were left to dry before MALDI-TOF MS analysis.
Spectra acquisition was performed using the VITEK® MS system (bioMérieux, Marcy l'Etoile, France) equipped with the Launchpad version 2.9.5.6 acquisition software. As described by Girard et al. [28], spectra were acquired in linear positive extraction mode over a mass range of 2000 to 20,000 Da using the "Auto-Quality" option. Each spectrum was generated by the accumulation of 500 laser shots, with 100 profiles acquired from each spot at five shots per profile. External calibration was performed using fresh cells of Escherichia coli ATCC 8739. Two quality control strains, A. brasiliensis ATCC 16404 for molds and Candida glabrata MYA-3950 for yeasts, were also included for each reagent kit and on each day of spectra acquisition. The Launchpad acquisition software automatically processed raw spectra through smoothing and peak detection procedures [28].
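The smoothing and peak-detection step is performed internally by Launchpad, whose algorithms are not described here. As a hedged illustration only, the sketch below substitutes Savitzky-Golay smoothing (`scipy.signal.savgol_filter`) and a local-maximum search (`scipy.signal.find_peaks`) with a MAD-based noise estimate; the window length, polynomial order, signal-to-noise cut-off and minimum peak spacing are hypothetical values, and the spectrum is synthetic.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter

# Acquisition arithmetic from the text: 100 profiles per spot x 5 shots per profile
PROFILES_PER_SPOT, SHOTS_PER_PROFILE = 100, 5
TOTAL_SHOTS = PROFILES_PER_SPOT * SHOTS_PER_PROFILE  # 500 laser shots per spectrum


def detect_peaks(mz, intensity, window=21, polyorder=3, snr_min=10.0, min_gap=10):
    """Smooth a raw spectrum and return (m/z, intensity) of detected peaks.

    Hypothetical stand-in for the proprietary Launchpad processing:
    Savitzky-Golay smoothing, a robust (MAD-based) noise estimate and
    scipy's local-maximum search above snr_min x noise.
    """
    smoothed = savgol_filter(intensity, window_length=window, polyorder=polyorder)
    mad = np.median(np.abs(smoothed - np.median(smoothed)))
    noise_sigma = 1.4826 * mad if mad > 0 else 1.0  # convert MAD to a standard deviation
    idx, _ = find_peaks(smoothed, height=snr_min * noise_sigma, distance=min_gap)
    return mz[idx], smoothed[idx]


# Usage on a synthetic spectrum (not real data): three Gaussian peaks plus noise
rng = np.random.default_rng(0)
mz = np.arange(2000.0, 20001.0, 1.0)  # 2000-20,000 Da acquisition range, 1 Da steps
true_peaks = [4000.0, 7000.0, 11000.0]
intensity = sum(100.0 * np.exp(-0.5 * ((mz - p) / 5.0) ** 2) for p in true_peaks)
intensity = intensity + rng.normal(0.0, 1.0, mz.size)
peak_mz, peak_int = detect_peaks(mz, intensity)
print(peak_mz)
```

The `distance` argument keeps only the highest maximum within a short window, so a single broad peak is not reported several times.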

Spectra Quality Control Procedure
Raw spectra were individually checked for peak resolution, signal-to-noise ratio and absolute signal intensity. Spectra used to develop the spectral database typically contained between 80 and 200 peaks. Good-quality spectra were subsequently transformed into peak lists containing m/z values and corresponding intensities [28]. A single-linkage agglomerative clustering algorithm was used to generate dendrograms for each species, comparative dendrograms with closely related species and, when needed, dendrograms including spectra already present in the database. Dendrograms were then analyzed to detect any doubtful strains and confirm dataset consistency. The acceptance criteria between individual spectra of a given species were a minimum of 50% similarity and 50 peaks in common for molds, and a minimum of 65% similarity and 50 peaks in common for yeasts.
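The text specifies single-linkage clustering and the similarity and shared-peak thresholds, but not how the similarity between two peak lists is computed. The sketch below assumes a Dice coefficient on peaks matched within a hypothetical 800 ppm m/z tolerance, and flags a spectrum when it falls outside the main single-linkage cluster cut at the similarity threshold or shares fewer than 50 peaks with every other spectrum. The function names, the tolerance and the flagging rule are illustrative assumptions, not the procedure actually used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def shared_peaks(a, b, ppm=800.0):
    """Count peaks matched between two m/z lists within a relative tolerance (two-pointer scan)."""
    a, b = np.sort(a), np.sort(b)
    i = j = n = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= a[i] * ppm * 1e-6:
            n, i, j = n + 1, i + 1, j + 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return n


def qc_species(peak_lists, min_sim=0.50, min_common=50, ppm=800.0):
    """Flag doubtful spectra of one species (pass min_sim=0.65 for yeasts).

    Assumed similarity: Dice coefficient 2 * shared / (n_a + n_b) on matched peaks.
    """
    k = len(peak_lists)
    sim, common = np.eye(k), np.zeros((k, k), dtype=int)
    for i in range(k):
        for j in range(i + 1, k):
            n = shared_peaks(peak_lists[i], peak_lists[j], ppm)
            common[i, j] = common[j, i] = n
            sim[i, j] = sim[j, i] = 2 * n / (len(peak_lists[i]) + len(peak_lists[j]))
    # Single-linkage clustering on distance = 1 - similarity, cut at the similarity threshold
    Z = linkage(squareform(1.0 - sim, checks=False), method="single")
    clusters = fcluster(Z, t=1.0 - min_sim, criterion="distance")
    main = np.bincount(clusters).argmax()
    # Flag spectra outside the main cluster or sharing < min_common peaks with every other spectrum
    flagged = [i for i in range(k) if clusters[i] != main or common[i].max() < min_common]
    return flagged, Z


# Usage with synthetic peak lists (not real data): five replicates of one profile plus one unrelated spectrum
rng = np.random.default_rng(1)
profile = rng.uniform(3000, 17000, 120)


def replicate(base):
    kept = base[rng.random(base.size) > 0.1]  # drop ~10% of peaks
    return kept * (1 + rng.normal(0.0, 100e-6, kept.size))  # ~100 ppm m/z jitter


peak_lists = [replicate(profile) for _ in range(5)] + [rng.uniform(3000, 17000, 110)]
flagged, Z = qc_species(peak_lists)
print(flagged)
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the per-species dendrograms described above.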

Non-Supervised Analysis of Spectra from Species within Species Complexes
In the case of species complexes in the database, a non-supervised approach was employed as a first step to assess the discriminatory ability of MALDI-TOF MS. The t-distributed Stochastic Neighbor Embedding (t-SNE) method was used to visualize the distances between spectra in each species complex using Plotly.js v2.27.0 [37,38]. This non-linear projection technique enables the visualization of high-dimensional data in a lower dimension, typically a two- or three-dimensional map. The high-dimensional data are converted into a matrix of pairwise similarities, to which t-SNE is applied, and the result is visualized as a scatterplot [38]. This dimensionality reduction method aims to preserve as much of the significant structure of the initial data as possible while balancing attention between local and global aspects, thereby reducing the tendency of data points to crowd together in the center of the map [39].
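Plotly.js only renders the map; the text does not state which t-SNE implementation produced the coordinates. A minimal sketch, assuming scikit-learn's `sklearn.manifold.TSNE` applied to binned spectra (one row per spectrum), with synthetic data and a hypothetical perplexity cap for small complexes:

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_map(X, perplexity=30.0, seed=0):
    """Project binned spectra (one row per spectrum) onto a 2-D t-SNE map."""
    # scikit-learn requires perplexity < n_samples; cap it for small species complexes
    perplexity = min(perplexity, (X.shape[0] - 1) / 3)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(X)


# Usage with synthetic data (not real spectra): two "species" of 20 spectra each
rng = np.random.default_rng(0)
centers = rng.random((2, 1300))
X = np.vstack([c + rng.normal(0.0, 0.01, (20, 1300)) for c in centers])
labels = np.repeat(["species A", "species B"], 20)
emb = tsne_map(X)

# Fraction of points whose nearest neighbor on the map belongs to the same species
d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)
same_species_nn = (labels[d.argmin(axis=1)] == labels).mean()
print(same_species_nn)
```

t-SNE coordinates depend on the random seed and perplexity, so only the neighborhood structure of the map, as summarized by the nearest-neighbor check, should be interpreted.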

Development of Spectral Database
As previously described by Girard et al. [28], each peak from the peak list was assigned to one of 1300 bins within the mass range of 3000 to 17,000 Da [40]. Logarithmic scaling of the peak intensities was then applied, followed by L1 normalization. For each species, a predictive model was established using the Advanced Spectra Classifier (ASC) algorithm developed by bioMérieux to obtain a species-specific bin weight matrix. To provide an identification, a new spectrum was compared with the bin weight matrix, and the sum of matching bin weights was calculated and considered as an intermediate score [28]. The resulting species-specific scores were transformed into multiclass probability estimates using a Gaussian calibration procedure. A decision algorithm was then used to retain only significant matches. When only one species was retained, the result was considered a 'single choice'. A 'low discrimination' result was obtained when two to four species were proposed, while a 'no identification' result was obtained either when no significant match was found or when more than four different species were retained.
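The ASC algorithm, its bin weights and the Gaussian calibration are proprietary and not described here. The sketch below only illustrates the steps the text does state, under explicit assumptions: log-spaced bin edges, log(1 + x) intensity scaling, a hypothetical random weight vector in place of an ASC-derived one, and a hypothetical probability threshold for retaining a match. The decision rule itself (one species retained: single choice; two to four: low discrimination; none or more than four: no identification) follows the text.

```python
import numpy as np

MZ_MIN, MZ_MAX, N_BINS = 3000.0, 17000.0, 1300
# Assumption: logarithmically spaced bin edges; the text only gives the bin count and mass range
EDGES = np.geomspace(MZ_MIN, MZ_MAX, N_BINS + 1)


def bin_spectrum(peak_mz, peak_int):
    """Peak list -> 1300-bin vector with log-scaled, L1-normalized intensities."""
    mz, inten = np.asarray(peak_mz, float), np.asarray(peak_int, float)
    keep = (mz >= MZ_MIN) & (mz < MZ_MAX)  # peaks outside 3000-17,000 Da are ignored
    idx = np.clip(np.searchsorted(EDGES, mz[keep], side="right") - 1, 0, N_BINS - 1)
    vec = np.zeros(N_BINS)
    np.add.at(vec, idx, inten[keep])  # peaks falling in the same bin are summed
    vec = np.log1p(vec)  # log(1 + x) scaling is an assumption about the exact transform
    total = vec.sum()
    return vec / total if total > 0 else vec  # L1 normalization


def intermediate_score(vec, bin_weights):
    """Sum of one species' bin weights over the bins where the spectrum has a peak."""
    return float(bin_weights[vec > 0].sum())


def decide(probabilities, threshold=0.3):
    """Map per-species probability estimates to the three result types.

    `threshold` is a hypothetical significance cut-off; the Gaussian calibration
    that would produce `probabilities` is not reproduced here.
    """
    retained = sorted((sp for sp, p in probabilities.items() if p >= threshold),
                      key=probabilities.get, reverse=True)
    if not retained or len(retained) > 4:
        return "no identification", []
    if len(retained) == 1:
        return "single choice", retained
    return "low discrimination", retained


# Usage with hypothetical values
vec = bin_spectrum([4500.0, 6200.5, 9800.2, 18500.0], [120.0, 80.0, 300.0, 50.0])
rng = np.random.default_rng(0)
weights = rng.normal(size=N_BINS)  # hypothetical stand-in for an ASC-derived weight vector
score = intermediate_score(vec, weights)
print(decide({"A. niger": 0.92, "A. tubingensis": 0.05}))  # ('single choice', ['A. niger'])
print(decide({"A. niger": 0.55, "A. tubingensis": 0.40}))  # ('low discrimination', ['A. niger', 'A. tubingensis'])
```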

Evaluation of Identification Performance by Cross-Validation
A 5-fold cross-validation was used to optimize the VITEK® MS Knowledge Base and to assess how accurately it would perform on new, independent spectra. This process was based on partitioning the spectral data into five complementary subsets. As described by Girard et al. [28], one round of cross-validation involved a learning phase on four subsets and the validation of the predictive model on the remaining subset. Five rounds of cross-validation were performed by permutation of the subsets, and the estimated identification performance was obtained by combining the results of all rounds. A 'correct identification' was attributed when the cross-validation result matched the reference identification. A 'low discrimination' result was considered correct if the expected identification was included among the matches. A 'misidentification' was defined as a discordant identification between the cross-validation and reference identifications. A 'no identification' result could also occur, in which case the spectrum was considered not identified.
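The cross-validation loop and the result-scoring rules can be sketched as follows. Random (non-stratified) fold assignment is an assumption, and the nearest-centroid classifier is a hypothetical stand-in used only to make the example runnable; it is not the ASC algorithm. Low discrimination results containing the expected species count as correct, as stated above.

```python
import numpy as np


def kfold_splits(n, k=5, seed=0):
    """Randomly partition spectrum indices 0..n-1 into k complementary subsets."""
    return np.array_split(np.random.default_rng(seed).permutation(n), k)


def outcome(expected, matches):
    """Classify one result following the scoring rules in the text."""
    if not matches:
        return "no identification"
    if expected in matches:  # single choice, or low discrimination including the expected species
        return "correct"
    return "misidentification"


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 subsets, test on the held-out one, rotate k times and pool the results (%)."""
    folds = kfold_splits(len(y), k, seed)
    counts = {"correct": 0, "no identification": 0, "misidentification": 0}
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        for t in test_idx:
            counts[outcome(y[t], predict(model, X[t]))] += 1
    return {key: 100.0 * v / len(y) for key, v in counts.items()}


# Hypothetical stand-in classifier (NOT the ASC algorithm): nearest centroid by cosine similarity
def fit_centroids(X, y):
    return {sp: X[y == sp].mean(axis=0) for sp in np.unique(y)}


def predict_centroids(model, x, min_sim=0.8, margin=0.02):
    sims = {sp: float(x @ c / (np.linalg.norm(x) * np.linalg.norm(c))) for sp, c in model.items()}
    best = max(sims.values())
    if best < min_sim:
        return []
    return [sp for sp, s in sims.items() if s >= best - margin]  # >1 entry = low discrimination


# Usage with synthetic binned spectra (not real data): 3 species x 15 spectra
rng = np.random.default_rng(0)
centers = rng.random((3, 1300))
X = np.vstack([c + rng.normal(0.0, 0.05, (15, 1300)) for c in centers])
y = np.repeat(np.array(["sp1", "sp2", "sp3"]), 15)
result = cross_validate(X, y, fit_centroids, predict_centroids)
print(result)
```

Because the classifier is passed in as a pair of functions, the same loop can score any model that returns a list of candidate species.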

Evaluation of Identification Performance by External Validation
The spectral database was challenged using an external dataset of 78 strains. For cultivation, a medium among those cited above was randomly chosen for each strain. Yeasts were incubated for 2 d before spectra acquisition, while mold isolates were analyzed at two randomly selected incubation times ranging from 2 to 8 d. Positive and negative controls consisted of the quality control strains and of reagents only, respectively. Spectra acquisition was performed in duplicate as described above. The obtained spectra were compared with the constructed spectral database to evaluate the percentage of correct identification for the species claimed in the database and the absence of identification for those not included in the database.

Performance Estimation by Cross-Validation
The database performance for each species was estimated using cross-validation. Overall, 96.55% of the spectra from the VITEK MS fungal knowledge base were correctly identified to the species level, 3.1% were not identified and 0.35% were erroneously identified (discordant status). Among the 139 species added to the spectral database, 109 yielded an overall correct identification rate of 100% after cross-validation (Table 3). These species also yielded 100% of spectra assigned as a single choice, except for six of them, namely Aspergillus amoenus, Aspergillus tabacinus, Candida variabilis, Penicillium biforme, P. funiculosum and Penicillium rubens, which yielded between 2.38% (A. tabacinus) and 12.28% (P. biforme) of spectra with low discrimination. Among the remaining species, an overall correct identification percentage above 90% was achieved for 21 species, ranging from 90.91% (Aspergillus jensenii) to 98.57% (Penicillium macrosporum), while for 5 species (i.e., Aspergillus creber, A. restrictus, Cladosporium macrocarpum, Colletotrichum siamense and Hannaella luteola), the percentage of correctly identified spectra was between 80% and 90%. Finally, four species had correct identification levels below 80%, i.e., Aspergillus fischeri (76.92%), A. luchuensis (77.27%), Colletotrichum tropicale (73.68%) and Fusarium verticillioides (73.53%). For A. fischeri and A. luchuensis, these lower levels resulted mainly from discordant and low-discrimination spectra. For A. fischeri, 15.38% of spectra were identified as another species of the same genus, i.e., A. coreanus, while 7.69% of spectra yielded low discrimination results, also with A. coreanus. Concerning A. luchuensis, 18.18% of spectra were only identified as belonging to Aspergillus series Nigri. Furthermore, 26.32% and 25% of spectra from C. tropicale and F. verticillioides, respectively, were not identified.
Table 3 notes: * Species already present in the database for which additional strains were integrated. ** Species from species complexes already present in the database. (1) Single choice stands for spectra identified as the correct species; low discrimination corresponds to spectra that matched several species including the correct one; the overall correct percentage is the sum of the single choice and low discrimination percentages.
Cross-validation provides a first estimate of identification performance and highlights possible cross-identifications. To further evaluate identification performance, an external validation was conducted with strains not included in the database.

Database Validation
The database was challenged using an external dataset. Overall, the external validation performance for the species present in the database was as follows: 89.42% of spectra were correctly identified, 8.65% were not identified and 1.92% were misidentified. For 62 of the 75 strains whose species were represented in the database, all acquired spectra gave the expected result, i.e., a correct identification (Table 4). Species that did not yield satisfactory results were Arthrographis kalrae, A. creber, A. jensenii, Chrysosporium keratinophilum, Engyodontium album, the Fusarium solani complex, Hortaea werneckii, Mucor plumbeus, Mucor piriformis, Penicillium aurantiogriseum, P. biforme and Zygotorulaspora mrakii. Spectra from these species were either unidentified or misidentified. A. kalrae, A. creber, C. keratinophilum, C. gloeosporioides, the F. solani complex, M. piriformis and P. aurantiogriseum had less than 60% correctly identified spectra. Notably, misidentified spectra were still assigned to the correct genus; for instance, spectra from A. creber were misidentified as Aspergillus versicolor. Concerning the three strains belonging to species absent from the database, all acquired spectra for two of them, i.e., Mucor brunneogriseus and Rhodotorula babjevae, yielded a "no identification" result, while spectra from Cladosporium snafimbriatum were identified as C. allicinum/C. macrocarpum (low discrimination). It is worth mentioning that C. snafimbriatum, a newly described species, is a member of the Cladosporium herbarum complex and is closely related to C. allicinum and C. macrocarpum [41].

Performance Evaluation of MALDI-TOF MS for Species Complex Differentiation
The ability of MALDI-TOF MS to discriminate species within five species complexes, i.e., Aspergillus series Nigri, the Colletotrichum acutatum complex, the Mucor circinelloides complex, the Colletotrichum gloeosporioides complex and the Bisifusarium dimerum complex, was evaluated using non-supervised (t-SNE) and supervised (cross-validation) approaches. All recognized species within these species complexes were analyzed using MALDI-TOF MS, with the exception of Colletotrichum asianum and Bisifusarium tonghuanum, which could not be obtained from international culture collections. The spectra from the five species complexes are displayed on t-SNE maps (Figure 1). As shown in Figure 1A and Supplementary Figure S1, some species of Aspergillus series Nigri, such as A. brasiliensis, A. tubingensis, A. luchuensis (ex 'Aspergillus coreanus') and A. vadensis, were distinguishable, with spectra well grouped according to their respective species. Spectra from A. niger were grouped together with those from 'Aspergillus lacticoffeatus' and 'Aspergillus foetidus', which are now considered synonyms of A. niger, consistent with Bian et al. [33]. Notably, the effect of cultivation time was visible for two species, i.e., 'Aspergillus piperis' (synonym of A. luchuensis) and A. eucalypticola. For 'A. piperis', spectra obtained after 8 d were grouped in the upper quadrant in two groups, one on the left and one on the right, while the 2-day spectra were in the lower quadrant. Similarly, for A. eucalypticola, 2-day spectra were at the bottom of the lower quadrant, whereas 8-day spectra were at the top of the upper quadrant.

Figure 1. A two-dimensional t-SNE map displaying the spectra from Aspergillus series Nigri (A), the Bisifusarium dimerum complex (B), the Colletotrichum acutatum complex (C), the Colletotrichum gloeosporioides complex (D) and the Mucor circinelloides complex (E) obtained through MALDI-TOF MS. Spectra are colored according to the species to which they belong.
As shown in Figure 1B and Supplementary Figure S2, spectra from the different species of the Bisifusarium dimerum complex were also quite well separated. B. allantoides, B. domesticum and B. penicillioides spectra were grouped on the lower quadrant of the t-SNE map, whereas the remaining species were grouped on the upper quadrant (Figure 1B). Spectra from B. nectrioides and B. delphinoides appeared to be more closely related on the t-SNE map (Figure 1B) which was also confirmed on the spectral similarity dendrogram (Supplementary Figure S3, similarity = 65%).
As shown in Figure 1C,D, species within each of the C. acutatum and C. gloeosporioides complexes showed clear intra-complex separation, even though they shared a relatively high similarity of over 60% in both cases (Supplementary Figures S3 and S4). Concerning the C. acutatum complex, the spectra of all five species analyzed, i.e., Colletotrichum nymphaeae, Colletotrichum lupini, Colletotrichum fioriniae, Colletotrichum godetiae and C. acutatum, were well clustered and separated by species (Figure 1C). As for the C. gloeosporioides complex, two species could be easily distinguished on the t-SNE map, i.e., Colletotrichum fructicola and C. gloeosporioides (Figure 1D). Their clusters were distant from each other and from all the other clusters. The remaining species, namely Colletotrichum musae, C. siamense and C. tropicale, were mostly grouped in the lower left quadrant. The C. musae spectra were well clustered, while the C. siamense and C. tropicale spectra were interspersed, with C. siamense spectra mostly located between the two clusters of C. tropicale spectra.
As shown in Figure 1E and Supplementary Figure S5, species from the M. circinelloides complex were also well separated, namely Mucor variicolumellatus, Mucor lusitanicus, Mucor ramosissimus, Mucor janssenii, Mucor ctenidius, Mucor velutinosus and Mucor griseocyanus. Two spectra from the latter species were separated from the others. Both were obtained after cultivation for 2 d on MEA (Oxoid), suggesting that this separation was linked to this specific culture condition. The impact of incubation time was also noticeable for Mucor bainieri, M. circinelloides and Thamnidium anomalum. For instance, M. bainieri spectra obtained after 2 d of incubation were on the left of the right quadrant, while those obtained after 8 d were on the right of the left quadrant. Similar results, though to a much lower extent, were observed for M. circinelloides and T. anomalum: the spectra of each species were grouped together, but some of the spectra obtained after a 2-day incubation were typically separated from those obtained after an 8-day incubation.
Spectra from the different species of the tested species complexes were integrated into the bioMérieux spectral database, and identification performance was assessed by cross-validation (Table 3). Aspergillus series Nigri, currently comprising six species, yielded levels of correct identification ranging from 77.27% to 100%, with an overall correct identification of 100% for four species, i.e., A. brasiliensis, A. eucalypticola, A. niger (including isolates of 'A. foetidus' and 'A. lacticoffeatus') and A. vadensis. For the B. dimerum complex, which includes nine species, 100% correct identification was reached. Concerning the C. acutatum complex, correct identification rates ranged from 90.48% to 100%, and four of the five species yielded 100% correct identification. Good performance was also achieved for the C. gloeosporioides complex, with correct identification levels ranging from 73.68% to 100% and three of the five species yielding 100% correct identification. Finally, for the M. circinelloides complex, correct identification ranged from 95% to 100%, and nine of the ten species reached 100% correct identification.

Performance Estimation by Cross-Validation and Database Validation
In a previous study, Quéro et al. [42] complemented the VITEK® MS database with 136 species encountered in the food and feed industry, demonstrating the importance of an updated database for fungal identification. In the present study, the VITEK® MS Knowledge Base was further reinforced with 119 newly selected species and 20 species already present in the database for which improvements were made. The overall cross-validation performance was 96.55%, with 3.1% of spectra unidentified and 0.35% misidentified.
Overall, 97.12% of the species examined in this study (Table 3) had a correct identification rate ranging from 80% to 100%. Additionally, 109 of the 139 species (78.42%) yielded an overall correct identification rate of 100%. However, the correct identification rates of four species fell below 80%, i.e., A. fischeri (76.92%), A. luchuensis (77.27%), C. tropicale (73.68%) and F. verticillioides (73.53%). This lower identification performance could be linked to cross-identifications between closely related species in the database. For instance, 18.18% of A. luchuensis spectra were discordant, being identified as the "Niger complex" rather than at the species level. Aspergillus luchuensis is indeed one of the species of the Niger clade. Species within this clade share a high similarity with one another and are difficult to distinguish even with multigenic DNA barcoding [33,35]. For F. verticillioides, we have no clear explanation for this result, which was also previously observed by Quéro et al. [18]. The fact that 25% of spectra yielded no identification during cross-validation may be caused by the high intraspecific genetic diversity of F. verticillioides and/or the existence of yet-to-be-identified cryptic species [43]. Therefore, enriching the database with spectra from a larger diversity of strains at the population level could improve identification performance for this species.
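The species-level percentages above can be reproduced from the category counts given in the cross-validation results (109, 21, 5 and 4 species out of 139); a minimal arithmetic check:

```python
# Per-species cross-validation categories as reported in the results (139 species in total)
categories = {
    "100%": 109,
    "90-99%": 21,  # 90.91% to 98.57%
    "80-90%": 5,
    "below 80%": 4,
}
n_species = sum(categories.values())
share_80_or_more = 100 * (n_species - categories["below 80%"]) / n_species
share_100 = 100 * categories["100%"] / n_species
print(f"{n_species} species: {share_80_or_more:.2f}% at >=80%, {share_100:.2f}% at 100%")
# 139 species: 97.12% at >=80%, 78.42% at 100%
```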
The cross-validation results provided an estimate of the database performance, but an external validation using strains not included in the database was necessary to assess identification performance on independent isolates. Considering the results for all tested strains, it is promising that 90.48% of spectra were assigned as expected. The misidentified spectra were ascribed to closely related species from the same genus, from the same species complex or both. For instance, spectra from C. snafimbriatum were identified as C. allicinum and C. macrocarpum, two closely related species within the same species complex [41]. To address this, a future version of the database could encompass more species from the C. herbarum complex, including C. snafimbriatum. Notably, the external and cross-validation results were consistent. A. creber, for example, had an overall identification performance of 89.19% during cross-validation, with 8.11% of spectra showing low discrimination with Aspergillus versicolor, and the same result was found during external validation. This issue could be addressed by adding more strains from species of Aspergillus series Versicolores to optimize the identification accuracy of the spectral database. Interestingly, a simplified classification of series Versicolores with fewer cryptic species was recently proposed by Sklenář et al. [44], defining only four species instead of seventeen. As mentioned by Sklenář et al. [44], using this classification for spectral database construction may also improve identification accuracy for this series.

Performance Evaluation of MALDI-TOF MS for Species Complex Differentiation
In total, five species complexes were studied using unsupervised (t-SNE) and supervised approaches. The t-SNE method was used as an unsupervised technique to study closely related species and to assess the discriminatory ability of MALDI-TOF MS. As previously observed, this projection made it possible to differentiate species within the same complex despite high spectral similarity, which can be difficult to resolve with dendrograms alone. This non-linear projection technique maps high-dimensional data onto a low-dimensional space and can reveal structure that is not perceptible in other representations [37,38].
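To make the description of t-SNE concrete, the sketch below implements its core steps in plain Python: per-point Gaussian affinities calibrated to a chosen perplexity by binary search, symmetrized joint probabilities, a Student-t kernel in the two-dimensional map, and gradient descent with momentum on the Kullback-Leibler divergence. It is a toy illustration, not the pipeline used in this study: the four-dimensional vectors stand in for preprocessed spectra, early exaggeration is omitted, and `perplexity`, `lr`, `n_iter` and the data are hypothetical choices suited to eight points. Real spectral datasets would typically be handled with an established implementation such as scikit-learn's `TSNE`.

```python
import math
import random

def squared_distances(X):
    """Pairwise squared Euclidean distances between input vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in X] for u in X]

def conditional_probabilities(D, perplexity, tol=1e-5, max_iter=100):
    """For each point i, binary-search a Gaussian precision beta so that the
    entropy of p_{j|i} equals log(perplexity) (in nats)."""
    n = len(D)
    target = math.log(perplexity)
    P = []
    for i in range(n):
        dmin = min(D[i][j] for j in range(n) if j != i)  # shift for numerical stability
        beta, lo, hi = 1.0, 0.0, math.inf
        for _ in range(max_iter):
            w = [0.0 if j == i else math.exp(-beta * (D[i][j] - dmin)) for j in range(n)]
            s = sum(w)
            p = [x / s for x in w]
            entropy = -sum(x * math.log(x) for x in p if x > 0)
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # distribution too flat: sharpen it
                lo = beta
                beta = beta * 2 if hi == math.inf else (beta + hi) / 2
            else:                 # distribution too peaked: widen it
                hi = beta
                beta = (beta + lo) / 2
        P.append(p)
    return P

def tsne(X, perplexity=2.0, n_iter=500, lr=1.0, momentum=0.8, seed=0):
    """Minimal t-SNE into 2-D (no early exaggeration, fixed learning rate)."""
    n = len(X)
    cond = conditional_probabilities(squared_distances(X), perplexity)
    # Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n)
    P = [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    Y = [[rng.gauss(0.0, 1e-2), rng.gauss(0.0, 1e-2)] for _ in range(n)]
    V = [[0.0, 0.0] for _ in range(n)]
    for _ in range(n_iter):
        # Student-t kernel (1 + ||y_i - y_j||^2)^-1 in the low-dimensional map
        num = [[0.0 if i == j else 1.0 / (1.0 + math.dist(Y[i], Y[j]) ** 2)
                for j in range(n)] for i in range(n)]
        Z = sum(map(sum, num))
        for i in range(n):
            # KL gradient: 4 * sum_j (p_ij - q_ij) * (y_i - y_j) * num_ij
            grad = [0.0, 0.0]
            for j in range(n):
                if j != i:
                    coef = 4.0 * (P[i][j] - num[i][j] / Z) * num[i][j]
                    grad[0] += coef * (Y[i][0] - Y[j][0])
                    grad[1] += coef * (Y[i][1] - Y[j][1])
            V[i] = [momentum * V[i][d] - lr * grad[d] for d in range(2)]
        Y = [[Y[i][d] + V[i][d] for d in range(2)] for i in range(n)]
    return Y

# Hypothetical preprocessed spectra: two species, four spectra each
species_a = [[0.0, 0.0, 0.0, 0.0], [0.3, 0.1, 0.0, 0.2], [0.1, 0.4, 0.2, 0.0], [0.5, 0.2, 0.4, 0.3]]
species_b = [[5.0, 5.0, 5.0, 5.0], [5.2, 4.9, 5.1, 5.3], [4.8, 5.3, 5.0, 5.1], [5.4, 5.1, 4.7, 5.2]]
embedding = tsne(species_a + species_b)
labels = ["A"] * 4 + ["B"] * 4
pairs = [(i, j) for i in range(8) for j in range(i + 1, 8)]
within = [math.dist(embedding[i], embedding[j]) for i, j in pairs if labels[i] == labels[j]]
between = [math.dist(embedding[i], embedding[j]) for i, j in pairs if labels[i] != labels[j]]
mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)
print(f"mean within-species distance {mean_within:.2f}, between-species {mean_between:.2f}")
```

The Student-t kernel's heavy tail is what lets well-separated groups drift apart on the map while points of the same group stay close, which is why clusters can appear that a dendrogram of overall similarity does not make obvious.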
Despite being phylogenetically and genetically close, four of the six species of Aspergillus series Nigri had an overall correct identification rate of 100% in cross-validation, and two species were above 90%, i.e., A. luchuensis (ex 'A. coreanus') (96.67%) and A. tubingensis (91.15%). A. luchuensis (ex 'Aspergillus piperis') was the only one in the series with a correct identification rate below 80%. Bian et al. [33] considered species-level identification within Aspergillus section Nigri problematic, if not impossible, even with techniques such as DNA sequencing or MALDI-TOF MS. Yet, following the redefinition of Aspergillus series Nigri proposed by Bian et al. [33], a cross-validation performance above 90% was obtained for eight of the nine species. This demonstrates improved species-level differentiation within this section using MALDI-TOF MS, which could help overcome the current difficulty of identifying these hazardous mycotoxin producers.
Secondly, the Colletotrichum complexes, causative agents of anthracnose, are responsible for food waste with a significant economic impact. They mainly encompass phytopathogenic species that affect a wide variety of hosts, causing considerable crop losses. For instance, two prominent complexes, C. acutatum and C. gloeosporioides, are responsible for fruit crop infections worldwide, leading to extensive plant necrosis [45,46]. Because species within these complexes share similar characteristics, they can be difficult to differentiate. In the present study, MALDI-TOF MS proved to be a good alternative to molecular techniques for discriminating species within these complexes. The cross-validation results for the C. acutatum complex showed 90.48% to 100% of spectra correctly identified, with four out of five species reaching 100% correct identification. Correct identification rates for the species of the C. gloeosporioides complex ranged from 73.68% to 100%, with three species reaching 100%. This identification performance is promising and could support disease management strategies [45,46]. The t-SNE map also showed distinct clusters for each species of the C. acutatum complex. One strain of C. godetiae appeared close to the C. fioriniae cluster, which agreed with the cross-validation results. However, the phylogenetic data did not indicate a taxonomic misassignment for this outlying strain. The apparent proximity of C. acutatum, C. nymphaeae and C. lupini on the one hand, and of C. fioriniae and C. godetiae on the other, is also consistent with the scientific literature [47].
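One way to make a visual observation such as the C. godetiae strain lying near C. fioriniae reproducible is a nearest-neighbour check on the two-dimensional map. Because t-SNE preserves local neighbourhoods better than global distances, a local check fits the projection. The sketch below is hypothetical: the coordinates are invented, and `flag_outliers` and its parameter `k` are illustrative names, not part of the study's workflow. A point is flagged when most of its k nearest neighbours on the map belong to another species.

```python
import math
from collections import Counter

def flag_outliers(points, labels, k=3):
    """Return (index, own label, majority neighbour label) for points whose
    k nearest neighbours on the map mostly belong to another species."""
    flagged = []
    for i, p in enumerate(points):
        neighbours = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        votes = Counter(labels[j] for _, j in neighbours)
        majority, count = votes.most_common(1)[0]
        if majority != labels[i] and count > k / 2:
            flagged.append((i, labels[i], majority))
    return flagged

# Hypothetical t-SNE coordinates: one C. godetiae spectrum sits among C. fioriniae
points = [(0.0, 0.0), (0.6, 0.2), (0.2, 0.7), (0.8, 0.9),          # C. fioriniae
          (10.0, 10.0), (10.5, 9.8), (9.7, 10.4), (10.2, 10.6),    # C. godetiae
          (0.5, 0.4)]                                              # C. godetiae outlier
labels = ["C. fioriniae"] * 4 + ["C. godetiae"] * 5
flagged = flag_outliers(points, labels)
print(flagged)
```

Here only the last point is reported, as `[(8, 'C. godetiae', 'C. fioriniae')]`; a flagged spectrum would then be checked against cross-validation results and phylogenetic data, as done for the strain discussed above.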
Thirdly, the Bisifusarium dimerum and M. circinelloides complexes showed the best identification performances: 100% correct identification for all species of the B. dimerum complex, and 100% for every species of the M. circinelloides complex except M. velutinosus (95%). Interestingly, differentiating species within these complexes matters for different reasons. Indeed, the B. dimerum complex, which belongs to the family Nectriaceae, comprises plant pathogens and species responsible for opportunistic infections and food spoilage, but also a species (B. domesticum) deliberately used by cheesemakers to prevent an organoleptic defect of cheese (stickiness) [48][49][50][51]. Notably, the B. dimerum complex, which reached 100% correct species-level identification in cross-validation, shows clear clusters in the two-dimensional projection, with a two-group organization. These results agree with the phylogenetic analysis conducted by Savary et al. [50]. The present study also confirmed the proximity of B. nectrioides and B. delphinoides.
The M. circinelloides species complex includes saprophytic species responsible for food spoilage that are also known as opportunistic pathogens causing mucormycosis in immunocompromised patients [52,53]. Different studies have already demonstrated clear differences in virulence [54] and antifungal susceptibility [55] among species (or former forms) within the complex. Given these concerns and the identification performance achieved in this study, MALDI-TOF MS could be a powerful asset for discriminating these species, which may differ in ecology and virulence. Within this complex, some species were well separated on the t-SNE map whereas others were not, which appeared to be linked to growth medium and incubation time. Nevertheless, this did not affect identification performance during cross-validation. Similar results have been reported by Quéro et al. [18] for other species, i.e., A. flavus, Aureobasidium pullulans and P. expansum. Such cases are worth keeping in the VITEK® MS database because they add spectral diversity and thus contribute to a robust database.

Conclusions
In the present study, the existing VITEK® MS database was extended with food-relevant fungal species, including species belonging to species complexes. MALDI-TOF MS proved to be a powerful tool for accurately identifying these fungal species and for discriminating species within species complexes. These results emphasize the importance of continuously enhancing the database by incorporating relevant species and species complexes and by keeping pace with the ongoing evolution of fungal taxonomy.

Figure 1. A two-dimensional t-SNE map displaying the spectra from the Aspergillus series Nigri (A), the Bisifusarium dimerum complex (B), the Colletotrichum acutatum complex (C), the Colletotrichum gloeosporioides complex (D) and the Mucor circinelloides complex (E) obtained through MALDI-TOF MS. Spectra are colored according to the species to which they belong.

Table 1. Species and number of strains per species used to expand the database.

Table 2. Species and number of strains used for external validation of the database.
* Species added to the spectral database in the present study.** Species not included in the database.

Table 3. Performance evaluation of the database by cross-validation.

Table 4. Performance evaluation of the database by external validation.