Maize specialized metabolome networks reveal organ-preferential mixed glycosides

Graphical abstract


Introduction
Maize (Zea mays L. ssp. Mays) is widely known for its importance as a food and feed crop, but it is also a predominant feedstock for the production of renewable chemicals and fuels, such as bioethanol [1][2][3][4]. In addition, maize is of substantial scientific importance; the availability of a vast collection of mutants, the high degree of genomic collinearity with other cereal crops and related grasses, and the availability of the genome sequence together with the extensive nucleotide diversity, have made maize a model system for basic and applied research [5][6][7][8].
Current research focuses mainly on deciphering the maize genome by functionally annotating the numerous unknown genes [9][10][11]. This functional genomics approach has been advanced considerably by the integration of metabolomics [12]. The metabolome, i.e., the complete set of metabolites in an organism, provides a phenotypic read-out, which can yield insight into how genes, transcripts, proteins and metabolites drive and influence the phenotype of a system. This property makes metabolomics an essential player in the understanding of cellular systems and in decoding the function of genes [13]. Nonetheless, metabolome data and knowledge of metabolite identities remain scarce for maize, especially concerning its specialized (secondary) metabolism. This information gap not only prevents understanding the system-wide biology of maize, but also the continuous development of maize as a model system.
The lack of metabolite identities also highly contrasts with the importance of specialized metabolites in various biological processes; they are involved in the attraction of pollinators, in the interaction of the plant with its environment (e.g., microorganisms in the soil), in nutrient uptake, and in plant defense against biotic (e.g., herbivores and pathogens) and abiotic (e.g., UV-radiation) stresses. Furthermore, many specialized metabolites possess biological activity [14][15][16], which is exploited in the pharmaceutical industry. Indeed, many drugs or precursors for drug synthesis are derived from plant specialized metabolites [17].
The main reason for the low number of known metabolites is the tremendous effort needed to identify the structures of new metabolites, which is either done via purification followed by nuclear magnetic resonance (NMR) analysis, or via authentication based on chemically synthesized standards and subsequent analysis by mass spectrometry (MS). In MS-based metabolite profiling, structural information on an unknown metabolite is gained via its collision-induced dissociation (CID)-spectrum, which can be matched against spectral databases [18,19]. However, spectral matching typically yields very few annotations, because these databases cover only a small fraction of all metabolites in plants [20]. This shortcoming has led to the development of various de novo structural elucidation programs [21][22][23][24][25][26][27][28]. Spectral databases are quickly emerging [29], but tend to focus on certain types of CID spectra, consequently restricting spectral matching and structural prediction to those spectral databases and de novo elucidation programs that can handle the particular CID spectral type. There are two types of CID spectra that are primarily employed in liquid chromatography (LC)-MS [30]: tandem-in-space MS/MS spectra, generated in, e.g., quadrupole-time-of-flight (QTOF) MS instruments, and tandem-in-time MS n spectra, generated in, e.g., ion trap (IT) MS instruments. Both CID spectra provide complementary information on the structure of the unknown compound, which is advantageous for spectral database matching, spectral interpretation using chemical principles, and automated, often machinelearning-based, structural elucidation. Ideally, samples have to be analyzed several times, on different instruments and using different settings to collect, for each compound, an extensive set of mass spectral data, referred to as spectral metadata hereafter. However, prior to the use of spectral metadata for structural characterization, the multiple MS/MS and MS n data for each profiled compound have to be associated. This requires the development of a spectral database in combination with an alignment tool.
In addition to the progress made in CID spectral analysis, MS spectral interpretation has also been improved by considering biotransformations. Because a limited number of organic reactions represent most of the enzymatic reactions, mass differences corresponding to those organic reactions, for example, 14.015 Da in the case of a methylation, can be searched for between pairs of features [peaks that are defined by a retention time and a mass-tocharge (m/z) value]. Biotransformations were first taken into account when interpreting direct infusion Fourier-transform (FT) MS spectra [31,32]. This method has been further developed for use with LC-MS data [33,34], and extended by including the elution order between the candidate ''substrate" and candidate ''product" features ( Fig. 1) [35]. Concatenating these candidate ''substrate-p roduct" feature pairs into a candidate substrate-product pair (CSPP) network with nodes and edges representing features and biotransformations, respectively, significantly advanced structural characterization via a propagation approach starting from known network nodes [35,36]. Associating multiple CID spectra, e.g., MS/ MS and MS n spectra, with each CSPP network node, provides complementary structural information (Fig. 1) to further boost the CSPP-based structural elucidation pipeline. In addition, the CSPPbased structural elucidation pipeline would benefit from including as many pathways and their intermediates as possible. Because different classes of specialized metabolites often accumulate in specific plant organs [37,38], CSPP network propagation would be improved by the analysis of different organs in order to maximize the variety of profiled specialized metabolites.
In this research, we focused on the phenolic metabolism of maize. Five maize organs and four genotypes were profiled using different reversed-phase LC-MS methods. All CID spectra were archived in a spectral database called DynLib, and are publicly available via an online webtool (https://bioit3.irc.ugent.be/dynlib/). The CID spectra associated with the same compound were linked using a newly developed R package, called RDynLib. This package also allows the visualization of spectral metadata and local CSPP networks for each compound. Using the various tools implemented in RDynLib, 427 compounds were structurally elucidated, of which 200 were at least partially authenticated via profiling of purchased compounds. Remarkable in this compound set was the rich variety of auxin glycosides in the tassel and corn cob, most of which had not been described before. Using the set of characterized compounds, the most frequently occurring mass differences within the maize metabolite profiles were determined and characterized. Acylations and glycosylations were among the most frequently observed biotransformations in the CSPP network, yielding a wide variety of glycosylated molecules bearing moieties corresponding to different metabolic classes.
The combination of the characterized compounds and mass differences are an important step forward in metabolic pathway discovery in maize, and the study of the specialized metabolism in general.

Growth, harvest and metabolite extraction conditions
From a maize field plot planted in May 2017 at the ILVO fields in Wetteren (Belgium), ears of the genotype CML91, H99, W153R, and OH43 were harvested at the end of June. Late cobs, leaves, stems (internodium that bears the maize cob), and tassels from these four genotypes were harvested at the end of August. The 20 samples (four genotypes and five organ types, no biological replicates were included) were separately homogenized using a GRINDOMIX GM 200 (Retch GmbH, Germany). Approximately 200 mg fresh weight was extracted with 1 mL methanol. Following evaporation of the methanol supernatant, an extraction with 0.8 mL of Milli-Q water/cyclohexane (1/1, v/v) was performed as previously described [39]. Finally, 0.2 mL of the aqueous phase was stored at À80°C.

LC-MS profiling
Each metabolic extract originating from the maize field plot (10 lL injected) was profiled in negative and positive ionization mode using two mass spectrometers, a UHPLC-ESI-QTOF-MS (Acquity UPLC system coupled to a Synapt High Definition MS, Waters Corporation, Manchester, UK) and a UHPLC-IT-FT-ICR-MS (Accela UHPLC system coupled to an LTQ FT Ultra, Thermo Scientific, Bremen, Germany). On both instrument platforms, a reversed-phase separation was performed using an Acquity UPLC BEH C18 (2.1 Â 150 mm, 1.7 lm; Waters Corporation) column heated to 40°C. The mobile phase was gradually changed from 99% solvent A (99/1/0.1 Milli-Q water/acetonitrile/formic acid, v/ v/v) to 50% solvent B (99/1/0.1 acetonitrile/Milli-Q water/formic acid, v/v/v) in 30 min using a flow of 350 lL/min. On the FT platform, full FT-ICR-MS scans between m/z 100 and m/z 1000 were recorded in parallel with data-dependent IT-MS n scans (35% collision energy) consisting of one MS 2 scan and three MS 3 scans. For each ionization mode and each sample, two runs were performed, recording MS 3 spectra of the 1st, 2nd and 3rd and of the 3rd, 4th and 5th most abundant MS 2 product ions, respectively. The ESI source voltage, capillary voltage, tube lens, capillary temperature, sheath gas, and aux gas were set at À4.5 kV, À18 V, À150 V, 275°C, 20 (arb) and 5 (arb) and 4 kV, 1 V, 40 V, 275°C, 8 (arb) and 0 (arb) in negative and positive ionization mode. On the QTOF platform, two runs per ionization mode and per sample were performed, recording full MS data between m/z 100 and m/z 1000 in the first run, and recording data-dependent analysis-based MS/ MS spectra for a maximum of three ions for prominent masses selected from a single MS survey scan in the second run. The capillary voltage, sampling cone and extraction cone were set at À2.5 kV, À37 V and À3.5 V and 2.5 kV, 40 V and 3.5 V in negative and positive ionization mode. In both ionization modes, the source and desolvation temperatures were 120 and 400°C. The cone and desolvation gas flows were set at 50 and 550 L/h and 50 and 500 L/ h in the case of negative and positive ionization mode. The trap and transfer collision energies were 4 and 3 V, and 6 and 4 V for negative and positive ionization. For data-dependent analysis, a ramping between 10 and 20 eV and between 20 and 45 eV was applied for the low and high mass ions. A CSPP is defined whenever two features have a mass difference corresponding to a biotransformation (e.g., a difference of 15.995 Da is expected in the case of an oxygenation) and an elution order that agrees with the expected change in molecular structure (e.g., the compound representing the ''product" feature is expected to elute earlier than that representing the ''substrate" feature on a reverse-phase column in the case of an oxygenation). In Morreel et al. (2014) [35], further support that the CSPP reflects a biochemical conversion had been obtained by considering the similarity between the negative ion MS 2 spectra (black) of the CSPP ''substrate" and ''product" features. In this study, both positive and negative ionization MS n and MS/MS spectra (gray), were associated with the CSPP.

Data processing, database and RDynLib construction, and structural elucidation
The processing of the LC-MS data is described in the Supplemental Text (LC-MS data processing). Also the construction of the DynLib database (DynLib database construction), the development and application of the RDynLib package (RDynLib construction and application), and the structural elucidation of the CID spectra are described in the Supplemental Text (Structural elucidation of CID spectra; Supplemental Fig. 10 and 14 -23). The maize DynLib database csv files, the perl scripts to upload CID spectra into the DynLib database, the RDynLib package, a file explaining the different functions in RDynLib ('RDynLib tools'), and two pptx files explaining how to upload data into the DynLib database via RDynLib and how to elucidate CID spectra via RDynLib, are available at https://floppy.psb.ugent.be/index.php/s/O9z6mU8IiAlWGbT. The DynLib database webtool can be consulted at https://bioit3.irc. ugent.be/dynlib/.

CSPP network construction and data analyses
The construction of Manhattan plots displaying the number of feature pairs versus the mass difference, selection of the prevailing mass differences, and construction of CSPP networks, were performed as described previously [35]. All multivariate data analyses occurred in R version 3.4.2 [40]. PCA analysis [PCA(data, graph = F) function, FactoMineR package [41]] was performed using either feature abundances or mass difference frequencies. In the case of mass differences, proportional data were obtained by dividing each mass difference by its frequency threshold computed following the approach described in Morreel (2014) [35]. Both feature abundances and mass difference frequencies were centered and unit variance-scaled. PCA results were visualized using the fviz_p-ca_ind(PCA, col.ind=''cos2 00 ) and fviz_pca_biplot(PCA, repel = TRUE, select.var = list(contrib = 10)) functions of the FactoExtra package (https://CRAN.R-project.org/package = factoextra). The Venn diagrams were generated using the venn.diagram() function in the VennDiagram package [42] in R.

Adding LC-MS data to the DynLib database
In order to characterize the maize phenolic metabolism, methanol extracts from five different organs (stem internodium, leaf, tassel, ear and late cob) and four genotypes (CML91, H99, W153R and OH43) were profiled via reversed-phase LC-MS using two instrument platforms: (i) an ultra-high-performance liquid chromatography (UHPLC) hyphenated via an electrospray ionization (ESI) source to an ion-trap Fourier-transform ion-cyclotron-resonance mass spectrometer (IT-FT-ICR-MS; hereafter abbreviated simply as FT) and (ii) an UHPLC-ESI-QTOF-MS (hereafter abbreviated simply as QTOF). Negative and positive ionization data were recorded on each platform yielding the FTneg, FTpos, QTOFneg, and QTOFpos sets of raw data. The FT was used to generate MS n spectra (in which each IT-based MS n spectrum represents an MS 2 spectrum and optionally one or more MS 3 spectra, each displaying secondorder product ions resulting from the fragmentation of a particular, MS 2 -derived, first-order product ion), whereas the QTOF was used to generate MS/MS spectra.
To obtain a general impression of the variation between the profiles, the FT data were subjected to a principal component analysis (PCA) of the feature abundances following chromatogram processing. The PCA yielded three distinct clusters based on the first and second principal components (PC1 and PC2) for the FTneg ( Fig. 2A) as well as the FTpos data sets (Supplemental Fig. 1A). The profiles from stem, ear, and late cob clustered together, whereas those of leaf and tassel were present in two distinct clusters. Together, PC1 and PC2 captured 34% (FTneg) and 23% (FTpos) of the variation between metabolite profiles, reflecting differences between the plant organs rather than between the genotypes.
Because different CID spectra provide complementary information on the structure of the unknown compounds, structural characterization would optimally benefit from including all profiling data, i.e., negative and positive ionization spectra and both MS/ MS and MS n spectra. To initiate such a strategy, all CID spectra obtained by LC-MS profiling of the different organs of the four maize genotypes were collected in an in-house-generated database (Supplemental Table 1), referred to as the 'Dynamic Library' or, simply the DynLib database (Supplemental Fig. 2; an in-depth description is given in Supplemental Text, DynLib database construction and Adding MS spectral data to the DynLib database). The DynLib database consists of four sub-databases (hereafter referred to as subDBs), i.e., for each instrument platform (FT or QTOF) and for each ionization mode, and these subDBs are called 'FTMS_neg', 'FTMS_pos', 'QTOF_neg', and 'QTOF_pos' (names differ slightly from those used for the corresponding raw data sets to stress the distinction between a raw LC-MS data set and a CID spectra-specific subDB). Following CID spectral archiving, the FTMS_neg and FTMS_pos, and the QTOF_neg and QTOF_pos subDBs contained 4,208 and 1,785 MS n spectra, and 2,665 and 1,816 MS/MS spectra, respectively. Taking into account the many LC-MS features (and their associated CID spectra) representing a particular compound, and the presence of the CID spectral information across the four different DynLib subDBs, CID spectral data of approximately 5,420 compounds (see Supplemental Text, Adding MS spectral data to the DynLib database) were uploaded into the DynLib database.
In order to exploit the complementarity of the different CID spectra, the features representing the same ions in the different subDBs of the DynLib database have to be aligned. Therefore, an R package called RDynLib was created. Because the alignment occurs between subDBs rather than chromatograms, only features for which CID spectra are available were included in the alignment. The alignment via the RDynLib package is therefore based on a combination of CID spectral matching and retention time alignment (an in-depth description is given in Supplemental Text, Aligning SubDB experiments using RDynLib; Supplemental Figs. 3 and 4). This alignment procedure was executed between the FTMS_neg and QTOF_neg, the FTMS_pos and QTOF_pos, the FTMS_neg and the FTMS_pos, and the QTOF_neg and QTOF_pos subDBs and resulted in 843, 287, 542 and 477 aligned features, respectively. Thus, between any pair of subDBs, the number of features that could be aligned ranged between 13% and 32% of the total number of features in each of the two subDBs in the considered alignment (Table 1).

Added value of including both MS/MS and MS n spectra for structural elucidation
To gain insight into the added value of the aligned MS spectral metadata (MS n and MS/MS spectra) in the DynLib database for structural elucidation, we performed spectral matching of the unique CID spectra with publicly available CID spectral databases (http://mona.fiehnlab.ucdavis.edu/downloads) (see Supplemental Text for details, Structural elucidation of CID spectra). The external CID spectral databases that were consulted comprised MassBank [18], ReSpect [43], HMDB [44], GNPS [29], iTree [45] and Metabo-BASE (https://sumnerlab.missouri.edu/download/).
Using a spectral similarity threshold of 0.6, 39 of the 4,208 (0.93%), 44 of the 1,785 (2.46%), 58 of the 2,665 (2.18%), and 43 of the 1,816 (2.37%) CID spectra in the FTMS_neg, FTMS_pos, showing the importance of including both types of CID spectra for spectral matching. However, these results also illustrate the very low number of ions from specialized metabolism that can be annotated via spectral matching with publicly available CID spectral databases, highlighting the need for structural characterization tools that take advantage of the information present in LC-MS data and different types of CID spectra.

Characterizing the maize specialized metabolome via RDynLib
Various MS spectral analysis tools are included in RDynLib to facilitate structural characterization and to exploit the spectral metadata (see Supplemental Text, Structural characterization tools in RDynLib; Supplemental Figs. 5-10). In addition to these tools, RDynLib allows mass difference analysis via the construction of local CSPP networks (based on a fixed set of 34 well-known biotransformations) (see Supplemental Text, Structural characterization tools in RDynLib; Supplemental Fig. 11). Using the MS spectral metadata, the spectral and mass-difference analysis tools in RDynLib, and knowledge about the gas-phase fragmentations for particular compound classes (see Supplemental Text, Compound class-specific gas-phase fragmentations), we structurally characterized 427 compounds from the maize-derived CID spectra present in the DynLib database (Supplemental Data Set 1). The structures of 72 compounds were verified via identity matching of the CID spectra with those of purchased standards (see Supplemental Text, Compound class-specific gas-phase fragmentations). For another 128 compounds, structural moieties were identified via identity matching of the corresponding MS 3 spectra with the MS 2 spectra of purchased standards. When consulting the PubChem database (https://pubchem.ncbi.nlm.nih.gov/), a compound database con-taining the structures of over 100 million compounds, 168 of the 427 compounds could be found, whereas a structurally highly similar isomer (Tanimoto coefficient > 0.95) was present for another 104 compounds. Based on the FooDB (http://foodb.ca/) and the CornCyc (https://www.plantcyc.org/databases/corncyc/9.0) databases, only 75 of the 427 compounds were already found in maize.
The number of characterized compounds per chemical class is shown in Table 2. The metabolic classes with the largest number of characterized compounds were the phenylpropanoids and their glycosides, the O-and C-glycosylated flavonoids and the mixed glycosides with 82, 40, 36 and 77 characterized compounds, respectively. An overview of the shikimic acid-derived metabolic pathways is shown in Fig. 3. The 'mixed glycoside' class contained saccharides to which moieties of at least two different chemical classes were attached, hence, preventing them from being included in one of the other chemical classes. For example, many phenylpropanoid and flavonoid-bearing glycosides also contain auxinderived moieties. Phenylpropanoids were found in all organs, whereas O-and C-glycosylated flavonoids were mainly present in leaf and tassel, with the O-glycosylated flavonoids being abundant in the stem as well. Other specialized metabolic classes that frequently occurred were the flavonolignans (28; enriched in the leaf, stem, and tassel), benzenoids (21; tassel), indolics (22; tassel), and benzoxazinoids (15; ear and leaf). In addition, 14 oligolignols and a number of compounds belonging to other metabolic classes were characterized. The class of oligolignols contained only aglycones; the eight characterized oligolignol glycosides were classified as (neo)lignans. Oligolignols and (neo)lignans were enriched in the stem, and in the stem and leaf, respectively. A webtool (https:// bioit3.irc.ugent.be/dynlib/) that allows searching known and unknown CID spectra of the profiled maize compounds in the Dyn-Lib database is available (Fig. 4). Each column represents a DynLib subDB. The first row represents the number of CID spectra (MS n and MS/MS spectra in case of the FTMS-based and the QTOF-based subDB, respectively) in each of the subDB. The cells in the remaining rows display the number of aligned features between the considered subDBs, indicated by the row and column header of each cell. Between parentheses, the proportion is shown of the number of aligned features versus the total number of features in the subDB mentioned in the column header. Feature alignment was not performed between the QTOF_neg and the FTMS_pos, and between the QTOF_pos and the FTMS_neg subDBs.

Mass differences of feature pairs
CSPPs are pairs of features of which the mass difference and elution order correspond to those expected for known organic reactions [35]. In order to obtain insight into the extent in which such organic reactions are enriched in specific organs, we searched for mass differences between pairs of LC-MS features that frequently occur in a given organ. To this end, all mass differences between 0 and 250 Da in intervals of 0.001 Da were computed in the FTneg and Ftpos raw data sets of the five different maize organs from the four genotypes (Fig. 2C, Supplemental Fig. 1C and 12). Following an organ-and genotype-wide PCA performed on the number of feature pairs for each of these 250,000 mass differences, the two highest-order PCs explained 56% (Fig. 2B) and 35% (Supplemental Fig. 1B) of the variation within the FTneg and FTpos raw data sets, respectively. These data indicate that specific mass differences occur more frequently in a given organ as compared to the other organs.
Most of the 250,000 mass differences do not reflect any combination between the chemical elements that occur most commonly in living organisms, i.e., C, H, O, N, P, and S elements, indicating that these mass differences do not reflect true biochemical conversions. Furthermore, in some cases for which a chemical formula can be computed from a mass difference, a chemically valid structure cannot be drawn (for example, the mass difference of 17.039 Da corresponds to CH 5 ). To ease the interpretation of the data, we enriched for true biotransformations based on the assumption that their associated mass differences should occur more frequently than mass differences that do not reflect biotransformations. Therefore, a Manhattan plot was constructed that reflects the frequency of each of the 250,000 mass differences ( Fig. 2C and Supplemental Fig. 1C). From such a Manhattan plot, prevailing mass differences, hereafter called 'candidate biotransformations', were selected [35]. Manhattan plots were constructed for each organ and for the FTneg and FTpos raw data sets (Supplemental Fig. 12). From the raw data sets of the FTneg, 107 candidate biotransformations were selected (Table 3) and for the FTpos, 167 candidate biotransformations (Supplemental Data Set 2). Upon PCA, 45% (PC1 and PC2; Fig. 2D) and 38% (PC1 and PC2; Supplemental Fig. 1D) of the variation within the selected candidate biotransformation sets of the FTneg and FTpos raw data sets were explained, respectively. Regarding the FTneg raw data set, the tassel, leaf, and stem profiles were present in three distinct clusters, whereas the ear and late cob profiles clustered together (Fig. 2D). For the FTpos raw data set, the ear and stem samples, the leaf and late cob samples, and the tassel samples formed three distinct clusters (Supplemental Fig. 1D). Similar to the PCA results obtained on feature abundances, most of the variation in biotransformation frequencies was among organs rather than among genotypes.
Mass difference frequencies, displayed in Manhattan plots, were computed for all genotypes and all organs. Mass differences are given whenever their frequencies surpassed the local frequency threshold (the frequency threshold varied dependent on the considered mass difference; see Morreel et al. (2014) [35]) in an organ Overview of Shikimic Acid-Derived Metabolic Pathways. Thin arrows represent one or multiple, either well-known or presumed, biochemical conversion(s). Thick arrows indicate one or multiple compound(s) that serve as precursor(s) without specification of a particular biochemical route. (Neo)Lignans, flavonolignans and oligolignols are given descriptive shorthand names following a previously described convention [35]. G, S, T, Sox, and SpCA refer to moieties derived from coniferyl alcohol (yielding the guaiacyl unit), sinapyl alcohol (yielding the syringyl unit), tricin, 7-oxo-sinapyl alcohol, and 9-O-p-coumaroyl sinapyl alcohol, respectively. The  of at least one of the four genotypes; in these organs, the mass difference frequency is shown in bold and underlined. For each organ and each genotype, the frequencies of all mass differences were normalized and ranked. Normalization was based on the division of the frequency of the mass difference by its local frequency threshold. These normalized mass difference frequencies were then ranked in decreasing order and an order-of-importance (OOI) number was assigned in increasing order (mass differences that show a high frequency obtained a low OOI number). The OOI number given for the considered mass difference in the

Obtaining insight into organ-preferential biotransformations via CSPP networks
In an attempt to gain insight into the organ specificity or organ enrichment of the candidate biotransformations, CSPP networks were constructed based on the selected biotransformations (Table 3 and Supplemental Data Set 2) for the FTneg (Fig. 5) and FTpos (Supplemental Fig. 13) raw data sets. In both CSPP networks, nodes represent features for which CID spectra were recorded, i.e., features that were present in the DynLib database. Based on the annotated features in the DynLib database, which were characterized via RDynLib, compound information was added to the network nodes.
The CSPP networks constructed from the features present in the FTMS_neg and FTMS_pos subDBs contained 3,505 and 652 nodes, and 32,252 and 2,552 edges, respectively. Besides the main subnetwork, the FTMS_neg-based CSPP network (Fig. 5) contained two medium-sized sub-networks, which predominantly represented N-containing compounds and the natural 13 C isotope distribution of the profiled compounds. In the network, different clusters were associated with different compound classes. When considering the node color, which represents the organ in which the feature was the most abundant, different regions in the network became apparent. For example, nodes representing oligolignols appeared as a cluster and were mainly present in the LC-MS profiles of stem extracts (Fig. 5, blue), whereas a cluster of nodes associated with medium-chain-length hydroxyfatty acids was predominantly found in late cob and leaf extracts (Fig. 5, greenish brown and  green). Similarly, a distinction between benzoxazinoids and dihydroxyindole-3-acetic acid derivatives was observed in the sub-network of the N-containing compounds, with the latter compound class being almost solely present in the tassel (Fig. 5, purple). The FTMS_pos-based CSPP network consisted of only one large subnetwork, in which clusters of features that were more abundant in particular organs could be distinguished (Supplemental Fig. 13).
The CSPP networks were subsequently consulted to characterize the included candidate biotransformations derived from the FTneg (Table 3) and FTpos raw data sets (Supplemental Data Set 2). In addition to mass differences that were associated with a single conversion, some reflected a unique combination of conversions [e.g., oxygenation + methylation + reduction (32.026 Da)], whereas others remained unassigned because they corresponded to multiple possible combinations of conversions (multiple options or MO).
The ten candidate biotransformations from which the frequencies contributed the most to the PCA-based discrimination between the different organs, based on the FTneg raw data set, are visualized in Fig. 2D. These biotransformations were associated with mass differences of 28. A PCA-based selection of the ten most important organdiscriminating candidate biotransformations (Supplemental Fig. 1D) -3-acetic acid).
In a search for the organ distribution of the candidate biotransformations, we visualized all prevalent FTneg biotransformations (see Table 3) in a Venn diagram (Fig. 2E). Twenty-eight candidate biotransformations occurred frequently in all organs. These mass differences were general biotransformations, such as reduction, oxygenation, hexosylation, sinapic acid coupling or syringyl coupling. The Venn diagram also shows biotransformations that were prevailing in only one or a few organs. Thirty-nine candidate biotransformations were enriched in only one particular organ (ranging from two in the late cob to 19 in the tassel; Fig. 2E). Among them, four were assigned to a single conversion ( Compared to the FTneg raw data set, the FTpos raw data set revealed more biotransformations that were prevailing in one particular organ. To conclude, the combination of different reversedphase LC-MS methods with CSPP-networks, allowed the structural characterization of a collection of specialized metabolites and biotransformations, which is a critical necessity in the understanding of the maize phenolic metabolism.

Structural characterization is enhanced using spectral metadata
The structural annotation of the many unknown compounds in metabolome data has been limited by the low coverage and slow growth of CID spectral libraries [20]. Consequently, only a minority of the profiled specialized metabolites can be annotated through matching their CID spectra with CID spectral libraries. This limitation was confirmed by our observation that only 0.93% up to 2.46% (dependent on the DynLib subDB) of the unique CID spectra of the maize specialized metabolome could be annotated through matching with publicly available spectral databases. The inclusion of all DynLib subDBs, and thus different types of CID spectra, for spectral matching led to more annotations than could be obtained for any of the individual subDBs. For the ''library-matched and MS/MS-MS n -aligned" CID spectra in the FTMS_neg, QTOF_neg, FTMS_pos and QTOF_pos DynLib subDBs, 20% (3/15), 19% (3/16), 25% (3/12), and 43% (3/7), respectively, were annotated via both MS/MS and MS n spectral matching. Thus, the majority of these CID spectra were annotated via either its MS/MS or its MS n spectrum. This positive increase in spectral matches by considering different types of CID spectra resulted in part from the low compound overlap between the different DynLib subDBs, which is acknowledged by the rather low number of features for which both MS/ MS and MS n spectra were recorded (Table 1). Furthermore, the added value of including all DynLib subDBs for spectral matching is the consequence of two limitations of currently existing spectral databases. Firstly, spectral databases do not fully overlap; part of each spectral database represents spectral data from compounds for which CID spectra are unavailable in other spectral databases. Secondly, these spectral databases are often focused on a particular type of CID spectrum, i.e., MS/MS or MS n spectra. Whereas the former limitation implies an augmented annotation rate when matching with several spectral databases instead of just one, the latter limitation guarantees more success in structural annotation when different types of CID spectra for a given compound are available for spectral matching. With the RDynLib package, we enabled linking of the MS/MS and MS n spectral data belonging to a given feature, thereby increasing the chance of finding a spectral match for this feature with publicly available spectral databases, now also including the DynLib database.
Nonetheless, for the majority of the profiled features there is no CID spectrum available in the currently existing spectral databases, which prevents the annotation of the unknown compound by spectral matching. However, the structure of the corresponding compounds might be present in compound databases such as the PubChem database [46]. Upon computation of the chemical formula of the unknown feature, candidate structures can be retrieved from these compound databases. Subsequently, the CID spectra of these candidate structures can be predicted in silico and matched against the unknown CID spectrum. Consequently, de novo CID spectral elucidation software, such as MetFrag [28] and CSI:FingerID [22], have been developed. De novo CID spectral elucidation software takes full advantage of the large size of these compound databases, sometimes comprising tens of millions of compounds. In this study, 168 of the 427 characterized compounds (i.e., 39.3%; Supplemental Data Set 1) were present in the PubChem database. Installing a connection between the DynLib database and de novo CID spectral elucidation software would therefore further enhance the efficiency of compound characterization.
Structural characterization of metabolites by CSPP network propagation also benefits from including as much MS data as possible. For example, the complementary information of negative and positive CID spectra allows a distinction between charge-driven and charge-remote fragmentations [47]. In addition, the spectral information gained from QTOF-MS and IT-MS instruments is complementary. Whereas QTOF-based MS/MS spectra offer a higher mass accuracy of the product ions and also display the low-mass product ions, IT-based MS n spectra allow the relationships between the product ions to be delineated. Furthermore, CID in IT-MS is less energetic than in QTOF-MS, which partially prevents further fragmentation of the initially formed first-order product ions. The longer timescale of CID reactions in IT-MS as compared to those in QTOF-MS coincides with the higher importance of thermodynamic control during CID in IT-MS as compared to kinetic control in QTOF-MS. As an example, Supplemental Fig. 15 displays the complementarity between MS/MS and MS 2 upon CID of the diferuloyl glycerol anion. To date, the RDynLib package is equipped with a battery of tools to aid in structural elucidation (see Supplemental Text, Structural characterization tools in RDynLib; Supplemental Figs. [5][6][7][8][9][10][11]. The combination of complementary CID spectra and their simultaneous analysis using the RDynLib tools, and the interplay with CSPP network propagation, played an important role in the elucidation of the 427 characterized compounds, of which the majority had not been described in maize before.

Compound abundances as well as biotransformation frequencies reflect organ-preferential metabolism
The annotated biotransformations displayed in Table 3 can be grouped into four classes. The first class represents mass differ-ences corresponding to true (bio)chemical conversions, such as hexosylation (HEX), methylation (MET), and reduction (RED). Sometimes they reflect the subsequent action of two or more enzymes such as an oxygenation followed by a methylation (MOX). The CID spectra of the CSPP substrates and products of these biotransformations often show a high spectral similarity. Some of these biotransformations or combinations of biotransformations seem to be rather specific to particular taxa. For example, the combination of carboxylation and ethylation (CAR + ETH) was not enriched in the CSPP networks derived from leaf extracts of Arabidopsis [35], nor in those derived from the extracts of different poplar organs (data not shown). The biotransformations of this class seem to be involved in decorating core structures of specialized metabolism, e.g., a flavonoid or phenylpropanoid, and were suggested by Wang et al. (2019) [48] to occur following the biosynthesis of the core structure of the specialized metabolic class. However, some of these biotransformations seem to occur more frequently for particular classes of specialized metabolites. For instance, methoxylation (MOX, 30.011 Da; see Fig. 2D) was most frequently associated with oligolignol biosynthesis and, consequently, discriminated stem organs -in which oligolignols are enriched -from the other organs. This first class of biotransformations could be referred to as 'decoration'-type biotransformations. Morreel et al. (2014) [35] argued that these CSPP biotransformations were often observed between glycosylated derivatives, whereas the corresponding enzymatic reactions are often known to occur on the aglycone level. For example, the conversion between caffeoyl hexose and feruloyl hexose is annotated as a methylation, yet, biochemically, this methylation reaction is known to use caffeoyl CoA as substrate. Therefore, a 'decoration'type biotransformation often includes multiple enzymatic reactions besides the annotated reaction.
Frequently occurring mass differences from the second class do not represent one or more true enzymatic reactions but rather the structural difference between two core structures from specialized metabolism. For example, the mass difference of 108.021 Da reflects the structural difference between a phenylpropanoid and a flavonoid, e.g., between dihydroxyindole-3acetic acid (caffeoyl) hexoside and dihydroxyindole-3-acetic acid (eriodictyol-O-) hexoside. These 'structural'-type biotransformations can only occur between two metabolites belonging to different specialized metabolic classes if their core structures share the same type and number of chemical modifications/decorations. Therefore, their occurrence supports the hypothesis that largely the same 'decoration'-type biotransformations occur in the biosynthesis pathways of different specialized metabolic classes [48]. The CID spectra of the candidate substrates and products corresponding to different metabolic classes are dissimilar. Consequently, a lower average CID spectral similarity are expected for CSPPs representing this class of conversions as compared to those belonging to the first class. Although not reflecting enzymatic reactions, these 'structural'-type biotransformations might aid deriving the biochemical pathways from the CSPP network by revealing the absence of particular pathway intermediates (Supplemental Fig. 24).
A third class of biotransformations includes true (bio)chemical conversions that involve the coupling of the core structure of a specific metabolic class onto another molecule. Among these 'core transfer'-type biotransformations are phenylacetyl coupling (PHA), syringic acid coupling (SYR), and oligolignol unit extensions such as the (non-)condensed guaiacyl (GUN58 and GUN4) and syringyl (SUN58 and SUN4) coupling reactions. The large number of glycosides and sugar esters encountered among the characterized compounds (Supplemental Data Set 1) suggests that these 'core transfer'-type biotransformations mainly happen onto a sugar moiety of a glycosylated molecule.
The fourth class of biotransformations are those that can arise via multiple, putative biotransformation pathways, each pathway often including multiple reactions. For instance, a mass difference of 3.995 Da can be explained by the mass difference between a hydrated and a methylated molecule (HYD -MET) as well as resulting from a methoxylation followed by the loss of an ethyne group (MOX -ETY). Thus, for these 'multiple options' (MO)-type biotransformations, the reactions within a particular candidate substrate and candidate product pair differ from those of another feature pair with the same mass difference. From the variety of possible reaction sequences, it might be expected that these MO-type biotransformations frequently occur in all organs, yet most of them seem to be restricted to only one or a few organs (Table 3). This suggests that a particular biotransformation pathway is prevailing for at least some of these mass differences, implying that some of these mass differences would better fit in one of the former biotransformation classes. The selection of the relevant, i.e. frequently occurring, mass differences was based on all LC-MS features, whereas structural characterization of the candidate substrates and products of the biotransformations was restricted to the LC-MS features for which CID spectra were recorded. Therefore, the correct annotation of the MO-type biotransformations might have been prevented by the sometimes low number of characterized compounds on which the characterization of the biotransformations, and thus the biotransformation classification, was based. Vice versa, biotransformations that were classified as 'decoration'-, 'structural'or 'core transfer'-type biotransformations might include feature pairs of which the mass difference results from an alternative biotransformation pathway than the one proposed in Table 3.
PCA-based organ clustering of the maize metabolome data sets was similar when using either biotransformation frequencies or feature abundances. This indicates that the separation between the different organs on the PCA plot based on biotransformation frequencies reflects both the different metabolic classes and the differentially enriched biotransformations in these organs. Consequently, many of the biotransformations that explain most of the variation between the PCA clusters represent 'core transfer'-type biotransformations (180.078 Da, GUN58 + RED; 104.026 Da, BEN; 118.041 Da, PHA; 144.041 Da, HADI; Fig. 2D; Table 3). For example, oligolignols were enriched in stems, and the clustering of this organ upon PCA could be related to the frequent occurrence of the 'GUN58 + RED' biotransformation (180.078 Da; Fig. 2D; Table 3), reflecting the oxidative coupling of a lignin monomer. 'Structural'-type biotransformations may also contribute to the PCA-based organ clustering (28.031 Da, ETH; Fig. 2D; Table 3). For example, the earlier mentioned mass difference of 108.021 Da occurs more frequently in the tassel (Table 3) than in other organs, and is the consequence of the rich variety of O-and C-glycosylated flavonoids in the tassel ( Table 2) and their structural difference with the phenylpropanoid glycosides that are present in all organs. These 'structural'-type biotransformations were generally observed between glycosylated molecules from different specialized metabolic classes. Therefore, the effect of the 'structural'-type biotransformations on the PCA clustering results in part from the large number of glycosides present in the maize metabolome data set (Supplemental Data Set 1). The high number of glycosylated molecules among the characterized compounds highlights the large number of glycosyl-and acyltransferases operating in the maize specialized metabolism and/or the broad substrate specificities of these enzymes, allowing the anchoring of specialized metabolites from different classes onto the same sugar moieties. This results in the observation of many glycosylation and acylation reactions, and the great variety of high-molecular-weight (mixed) glycosides in the CSPP network. Many of these highmolecular-weight mixed glycosides represented concatenation products between phenylpropanoid glycosides and benzoic acids, flavonoids, phenylethanoids and indolics (Supplemental Data Set 1). Acylation is established by either cytosolic CoA-dependent BAHD acyltransferases or vacuolar/apoplastic acylsugardependent serine carboxypeptidase-like (SCPL) acyltransferases [49]. Many of the SCPL enzymes have a broad substrate specificity and are implemented in transesterification reactions, e.g., the formation of disinapoyl glucose from two sinapoyl glucoses [50], leading to, among others, the high-molecular-weight (mixed) glycosides in the vacuole [51,52]. In addition, the presence of vacuolar glycoside hydrolases (GHs) that hydrolyze or rearrange glycosidic bonds suggests that these (mixed) glycosides are actively metabolized [53] and that the released sugars and aglycones can be exported to the cytosol [54]. Glycosylation and sugar ester formation are generally accepted as a strategy of the plant to detoxify, to increase the solubility, and/or to alter the biological activity of specialized metabolites [53]. However, the abundant presence of highmolecular-weight (mixed) glycosides and the metabolic malleability of the mixed glycosides suggest the importance of mixed glycoside biosynthesis as a strategy of the plant to store metabolites from different classes of primary and specialized metabolism. In addition, the coupling of multiple core structures onto one sugar unit might be essential in controlling the osmotic pressure and/ or in reducing the sugar amount that would otherwise be trapped when storing low-molecular-weight glucosides.

Phenylpropanoids are prominent in all five investigated maize organs
The PCA plot based on feature abundances indicated more differences in the maize metabolome among different organ types than among different genotypes ( Fig. 2A, Table 2). The metabolic fingerprint of the organ types was also reflected in the CSPP network. In the CSPP networks, features that are candidate substrates and products are linked via the selected candidate biotransformations. Consequently, compounds that belong to the same metabolic class typically represent sub-networks within the CSPP network. Coloring the nodes in the CSPP network, according to the organ in which the abundance of the corresponding feature was the highest, revealed a pattern closely associated with the metabolite classbased sub-networks (Fig. 5). For instance, indolics were more abundant in the tassel as compared to other organs, and oligolignols were more abundant in stem internodes. Based on the constructed CSPP networks, most specialized compound classes prevailed in a restricted set of maize organs. A notable exception were phenylpropanoids that were common in all organs (Table 2).
Among the phenylpropanoids, many p-coumarate esters/ amides, and to a lesser extent caffeate and ferulate esters/amides, were observed. These phenylpropanoids were bearing moieties derived from glycolaldehyde (in its hydrate form, i.e., ethanetriol), 2-hydroxyglutaric acid, 2-hydroxyadipic acid, isocitric acid, putrescine, hexaric acid, threonic/erythronic acid, shikimic acid, hexose, quinic acid, glycerol, tyramine, and hydroxycitric acid. Many of these acids and amines are chiral and are therefore of interest for the chemical and pharmaceutical industries. For example, hydroxycitric acid has received a lot of attention owing to its anti-obesity effect [55,56] and as a promising agent for the treatment of kidney stones [57,58]. High concentrations of hydroxycitric acid have been found in a few tropical plant species such as Hibiscus sabdariffa [59] and Garcinia species [60], and extracts from the latter are already used as food ingredients or as dietary supplements [61,62].

A variety of auxin storage forms differentially accumulate in late cob and tassel
A remarkable CSPP sub-network represented a variety of auxinrelated compounds that were almost solely present in the late cob and the tassel (Fig. 5). Most of these compounds were not only never described in maize, but were also not present in the Pubchem database (Supplemental Data Set 1). Because the presence of many of these auxin-related compounds was often associated with specific maize organs, we used this class to illustrate the potential of our findings in the construction of putative biosynthetic pathways.
The conjugation of aspartate or glutamate to indole-3-acetic acid (IAA), the core structure of the indolics, and to phenylacetic acid (PAA), the core structure of the phenylethanoids, is an important aspect of auxin homeostasis in plants [63,64]. Of these four conjugates, only IAA aspartate and PAA aspartate were found in this study. IAA aspartate was detected at higher levels in the late cob than in the tassel, whereas the aspartate amide of PAA was more abundant in the tassel. IAA and PAA are both auxins and have been proposed to be synthesized via parallel biosynthetic pathways starting from tryptophan and phenylalanine, respectively (Fig. 6) [65,66]. In these pathways, a transamination is followed by an oxidative decarboxylation [65,66]. For IAA, this pathway occurs via TAA (TRP AMINOTRANSFERASE of ARABIDOPSIS) and YUCCA family members [67]. For PAA biosynthesis, the transamination step of phenylalanine might not be necessary because phenylpyruvic acid is also a precursor of phenylalanine biosynthesis, but the YUCCA family members do play a role [68]. In line with the enrichment of the indolic and phenylethanoid metabolic classes in late cob and tassel, several members of the TAA and the YUCCA gene families are highly expressed in the cob and tassel according to the Maize eFP browser [69], making them candidate genes involved in the biosynthesis of IAA and PAA in the respective tissues. In addition to the aspartate amides of IAA and PAA, two hexose derivates of hydroxyindole-3-acetic acid were present in especially late cob and tassel: the hexose ester of 2hydroxyindole-3-acetic acid (2-hydroxyIAA) was more abundant in the tassel, whereas the hexoside of 5-hydroxyindole-3-acetic acid (5-hydroxyIAA) was prevalent in the late cob. The former compound is likely the glucose ester of 2-hydroxyIAA, and is formed via the oxidation of IAA by DAO (DIOXYGENASE FOR AUXIN OXIDATION) and a subsequent glucosylation by the uridine diphosphate glycosyltransferase UGT74D1 [63]. The tassel also showed high abundances of the hexose esters of PAA and hydroxyphenylacetic acid (HPAA). In maize, 2-hydroxyIAA is an intermediate in the biosynthesis of zeanoside C and, accordingly, the latter compound was highly abundant in the tassel.
Hydroxylation of IAA might yield the aglycone of the 5-hydroxyIAA hexoside, yet such a reaction has not yet been documented to our knowledge. An alternative pathway might start from serotonin (Fig. 6). In plants, serotonin biosynthesis involves the decarboxylation of tryptophan to tryptamine, which is then hydroxylated to serotonin. Despite being undetected in this study, serotonin is known to be synthesized in maize [70]. Furthermore, tryptamine is produced by tryptophan decarboxylase, and a member of the associated gene family has been shown to be highly expressed in the tassel and to a lesser extent in the cob [69]. In addition, a tryptamine 5-hydroxylase, encoded by CYP71A1, which converts tryptamine to serotonin, has been characterized in rice [71] and the corresponding enzyme in maize could be annotated when blasting the amino acid sequence via UniProt (https:// www.uniprot.org/blast/; 88.6% identity). In human metabolism, once formed, serotonin can be converted to 5-hydroxyindole-3acetaldehyde and, finally, to 5-hydroxyIAA by the subsequent actions of monoamine oxidase (MAO) and aldehyde dehydrogenase [72]. Some aldehyde dehydrogenase-encoding genes have been shown to be highly expressed in the tassel and to a lesser extent in cobs [69], yet their substrates are still unknown. However, to date, no MAO-encoding gene is known in plants. Consequently, although both the IAA-dependent andindependent pathways are plausible routes for the biosynthesis of 5-hydroxyIAA, gene function analysis remains necessary to pinpoint which biosynthesis path is functional in plants. Nevertheless, knowledge about the presence or absence of specific metabolites in a particular organ is a crucial first step in the construction of putative biosynthetic pathways and in the understanding of the relevance of these pathways in specific organs.
In conclusion, this study shows the benefit of combining different types of CID spectra for structural characterization both via hands-on spectral interpretation and via matching with spectral databases. Despite concluding that spectral matching yielded hits for only 1-2% of the spectra, the use of both MS/MS and MS n spectral elucidation doubled the number of hits with public databases as compared to the use of either type of CID spectrum alone. To allow this spectral metadata analysis, a tool (RDynLib) was created to align chromatograms from different instruments, to analyze the multitude of recorded CID spectra, and to combine this with the analysis of mass-difference networks, such as CSPP networks, for structural characterization. This resulted in the structural characterization of 427 of the 5,420 profiled compounds, of which most had not been described in maize before. Towards the future, the RDynLib package will further benefit from the increase in the number of CID spectra in spectral databases, as well as from the increase in the number of features for which different types of CID spectra are available. The resulting set of characterized compounds revealed the nature of prevailing mass differences among all features, representing enzymatic conversions, and also structural differences between well-known core molecules within specialized metabolism. The latter type of prevailing mass differences is at least partially due to the common nature of the various decorations, e.g., methylation or oxygenation, and the tendency of plants to concatenate specialized metabolite aglycone structures into high-molecular-weight mixed glycosides.
By utilizing different types of CID spectra recorded under different ionization modes, the DynLib database will aid the interpretation of structural features in future comparative metabolome studies in maize (and, likely, other monocots). Besides the spectra of 427 characterized compounds, all recorded unknown spectra are available via an online webtool (https://bioit3.irc.ugent.be/dynlib/ ). Using this database, the authors intend to continue their own structural elucidation efforts, but welcome proposed structures, which can be uploaded via the webtool associated with the DynLib database, via the metabolomics community.
CRediT authorship contribution statement