Datasets from harmonised metabolic phenotyping of root, tuber and banana crops

Biochemical characterisation of germplasm collections and crop wild relatives (CWRs) facilitates the assessment of biological potential and the selection of breeding lines for crop improvement. Data from the biochemical characterisation of staple root, tuber and banana (RTB) crops, i.e. banana (Musa spp.), cassava (Manihot esculenta), potato (Solanum tuberosum), sweet potato (Ipomoea batatas) and yam (Dioscorea spp.), using a metabolomics approach are presented. The data support the previously published research article “Metabolite database for root, tuber, and banana crops to facilitate modern breeding in understudied crops” (Price et al., 2020) [1]. Diversity panels for each crop, which included a variety of species, accessions, landraces and CWRs, were characterised. The biochemical profile for potato was based on five elite lines under abiotic stress. Metabolites were extracted from the tissue of foliage and storage organs (tuber, root and banana pulp) via solvent partition. Extracts were analysed via a combination of liquid chromatography-mass spectrometry (LC-MS), gas chromatography (GC)-MS, high pressure liquid chromatography with photodiode array detector (HPLC-PDA) and ultra performance liquid chromatography (UPLC)-PDA. Metabolites were identified by mass spectral matching to in-house libraries compiled from authentic standards and by comparison to databases or previously published literature.


Value of the Data
• The database provides a valuable resource describing the biochemical composition of cassava, sweet potato, potato, yam and banana.
• The database can be used to compare chemotypes of varieties/species of root, tuber and banana crops.
• The database can facilitate the identification of agronomic and consumer traits with quantifiable biochemical markers.
• Specific biochemical signatures can be identified for breeding selection.

Data Description
Resources for genetic and phenotypic diversity in underutilised crops are important for successful breeding efforts. Analysis of the metabolite composition of the respective tissues/crops enables the assessment of the chemical diversity available, the identification of certain phenotypes (e.g. nutrient content) or the elucidation of underlying mechanisms for specific traits (e.g. whitefly resistance in cassava [9]). As part of the Roots, Tubers and Bananas (RTB) project, metabolomics was used to analyse diversity panels of 38 banana accessions [10], 23 cassava varieties [11], 25 sweet potato accessions [12] and five yam species (D. rotundata, D. cayenensis, D. alata, D. bulbifera and D. dumetorum) [13][14][15]. In addition, five potato varieties were analysed to identify metabolites associated with resistance to drought [16] and two cassava varieties were compared to characterise the natural variation in resistance to whitefly [9]. Analysis of these crops was performed on different plant tissues (e.g. leaf, tuber and root) and, for banana and cassava, on plants under two different cultivation conditions: in vitro propagation and open field.
The respective species/accessions were subjected to a standard methanol-water-chloroform extraction, followed by different metabolomics techniques. LC-MS and GC-MS analyses were performed for untargeted profiling of polar and non-polar extracts. Analysis of non-polar extracts by LC-PDA was performed for a more targeted screening of isoprenoids (e.g. carotenoids and chlorophylls). Compounds in the samples were compared to the retention times and UV/Vis spectra of authentic standards (Fig. 1).
Molecular features detected by the different analysis techniques were compared to authentic standards and spectral features in databases (e.g. NIST) for metabolite identification. For GC-MS, the Automated Mass Spectral Deconvolution and Identification System (AMDIS) was used, with settings adjusted for specific crops (Table 1). For LC-MS analysis, the R package metaMS was used and a script for molecular feature extraction and library comparison was modified for samples analysed with the maXis Ultra-High Resolution QTOF (Bruker, Germany). The outputs from both analysis techniques are available in the Mendeley Data repository [2][3][4][5][6][7][8] as unprocessed Excel tables listing the areas of individual molecular features/metabolites in the respective samples. The identified metabolites were quantified relative to internal standards. A database was compiled for the present RTB crops and includes the quantitative range of each metabolite present in the individual tissues of each genus [1].

Table 1. Settings for the Automated Mass Spectral Deconvolution and Identification System (AMDIS) for data analysis of GC-MS files.

Metabolite extraction
Lyophilised tissue was ground and homogenised to a fine powder. Aliquots (10 mg) were weighed for each sample and extracted with a methanol/water/chloroform extraction method. Due to the size of the individual sample sets, sample batches of 22 samples were created. Each sample batch included an extraction blank and a quality control, which represented a pool of all samples in the respective sample set. Extraction methods were optimised for specific chemical classes and for each crop [10][11][12][13][14][15][16][17][18], e.g. carotenoid extraction for yam with 200 mg/sample. The yam dataset was created with GC-MS analysis of the aqueous and organic phases and HPLC-DAD analysis of the organic phase. Datasets for all other crops (sweet potato, potato, banana and cassava) were created with GC-MS and LC-MS analysis of the aqueous phase and UPLC-DAD analysis of the organic phase. The datasets for sweet potato and cassava also included GC-MS analysis of the organic phase.
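The batch layout described above can be illustrated with a minimal Python sketch. It assumes that the extraction blank and the quality control are added on top of the 22 samples in each batch, which the text does not specify; the function name and the sample, blank and QC labels are hypothetical.

```python
def make_batches(sample_ids, batch_size=22):
    """Split samples into batches and append one blank and one pooled QC to each.

    Assumption: the blank and QC are extra to the 22 samples per batch.
    """
    batches = []
    for start in range(0, len(sample_ids), batch_size):
        batch = list(sample_ids[start:start + batch_size])
        n = len(batches) + 1
        # Hypothetical labels for the extraction blank and pooled quality control
        batch += [f"blank_{n}", f"QC_{n}"]
        batches.append(batch)
    return batches


# Illustrative sample set of 50 hypothetical sample identifiers
samples = [f"S{i:03d}" for i in range(1, 51)]
batches = make_batches(samples)
```

With 50 samples this gives three batches, the last one only partly filled, each carrying its own blank and QC.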

Liquid chromatography-mass spectrometry (LC-MS) analysis
Aqueous extracts were dried down and resuspended in methanol/water (1:1, 100 μL). Internal standard (homogentisic acid, 5 μg, or genistein, 10 μg) was added to each sample, the extraction blank and the quality control. Samples were filtered (nylon, 0.45 μm) and analysed with a Dionex Ultimate 3000 UHPLC (Thermo Scientific) coupled to a maXis Ultra-High Resolution QTOF (Bruker, Germany) in negative electrospray ionisation mode (Vi, 5 μL). Aliquots of samples (10 μL) were separated with an Acquity BEH C18 column and a solvent gradient of 0.1% formic acid in water and acetonitrile [11]. Chemical features were extracted from the raw data files and searched against a chemical database, including an in-house library of authentic standards, with the R package metaMS [19,20] (Fig. 2). Identification thresholds were set to an m/z difference of 0.005 and a retention time difference of 0.3 min. The resulting data matrix, containing integrated peak areas of both unidentified chemical features and annotated metabolites, was exported in Microsoft Excel Open XML Spreadsheet (.xlsx) format.
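The identification thresholds can be read as a simple two-window match between a detected feature and a library entry. The Python sketch below illustrates that logic only; it is not the metaMS implementation, and the library entries, compound names and feature values are hypothetical.

```python
from dataclasses import dataclass

MZ_TOL = 0.005  # m/z difference stated in the text
RT_TOL = 0.3    # retention time difference in minutes, stated in the text


@dataclass
class LibraryEntry:
    name: str
    mz: float
    rt: float  # retention time in minutes


def annotate(feature_mz, feature_rt, library):
    """Return the names of library entries within both tolerance windows."""
    return [
        entry.name
        for entry in library
        if abs(entry.mz - feature_mz) <= MZ_TOL
        and abs(entry.rt - feature_rt) <= RT_TOL
    ]


# Hypothetical in-house library entries, for illustration only
library = [
    LibraryEntry("compound_A", 191.0197, 1.10),
    LibraryEntry("compound_B", 353.0878, 6.40),
]

hits = annotate(191.0210, 1.25, library)    # inside both windows of compound_A
misses = annotate(191.0300, 1.25, library)  # m/z is outside the 0.005 window
```

A feature that matches no entry stays in the data matrix as an unidentified chemical feature, consistent with the exported tables described above.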
GC-MS data were compiled with AMDIS (v2.71, NIST) and an in-house library specific to each crop. Deconvolution and identification settings were optimised for each crop (Table 1).
The PDAs scanned continuously from 250 to 600 nm.
Peaks were integrated using Empower 2 (Waters, UK) and identified by comparing their chromatographic and spectral characteristics with those of standards (Fig. 1) and literature references [22].

Data processing
Data output from the respective data analysis software was tabulated using IdAlign (Centre for Computational Systems Biology, University of Western Australia, http://www.softsea.com/review/IdAlign.html) and Microsoft Excel 2016. Metabolite features present in extraction blanks were subtracted or excluded from the datasets. The identified metabolites and molecular features were quantified relative to the respective internal standard and the datasets were normalised to the individual sample weights (μg/g dry weight). Some compounds were detected as multiple derivatives (e.g. glutamic acid and pyroglutamic acid) and their areas were merged before normalisation. In some cases, the data needed to be normalised with the quality controls to correct for batch effects (e.g. cassava [11]).

Ethics statements
This work used plant material only and did not involve human subjects, animal experiments or data collected from social media platforms.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.