Identification of primary and secondary metabolites and transcriptome profile of soybean tissues during different stages of hypoxia

NMR and chromatography combined with mass spectrometry are the most important analytical techniques employed for plant metabolomic screening. Metabolomic analysis integrated with transcriptome screening adds an important extra dimension to the information flow from DNA to RNA to protein. The most useful NMR experiment in metabolomic analysis is the proton spectrum, owing to the high receptivity of 1H and the structural information provided by proton–proton scalar coupling. Databases are routinely used to identify primary metabolites. However, no comparable data are currently available for identifying secondary metabolites, mainly because of signal overlap in conventional 1H NMR spectra and the natural variation of plants. Spectral overlap in complex samples can be reduced by using 1H pure shift and 2D NMR pulse sequences, the latter spreading the resonances into a second dimension. Thus, in this data article we provide a catalogue of metabolites and the expression levels of genes identified in soybean leaves and roots under flooding stress.


Specifications table
Subject area: Chemistry, biology, agronomy
More specific subject area: Metabolomic screening
Type of data: Table, figure
How data was acquired: NMR and RNA-seq
Data format: Analyzed
Experimental factors: 1D and 2D NMR experiments were used for metabolite annotation, and LC-DAD-MS was used to support the NMR data. Statistical analyses, such as principal component analysis and analysis of variance, were performed on physiological and growth parameters and on relative metabolite concentrations. The expression levels of genes in response to flooding stress were obtained.

Experimental features
The metabolites were assigned from chemical shift and coupling constant data and compared with literature information. The complete assignment was confirmed by 2D NMR data. Retention time and molecular mass data from LC-DAD-MS supported accurate metabolite annotation. Gene expression in response to different hypoxia durations was assessed by analysing an RNA-seq library database derived from soybean roots under flooding stress.

Data source location
Londrina, Brazil, and Frankfurt, Germany

Data accessibility
Data are available with this article.

Value of data
The metabolite annotation can be combined with untargeted and targeted metabolomic approaches and may contribute to a databank of chemical shifts and retention times of primary and secondary metabolites in soybean hydroalcoholic extracts.
The data help determine the main metabolic pathways affected by flooding stress. The expression of genes encoding key enzymes of sucrose degradation and of alanine and GABA metabolism helps explain the metabolic alterations observed under flooding.
The summary of the analysis of variance is important for understanding the statistical results.

Data
Detailed description of metabolite identification in soybean leaf and root extracts, multivariate analysis of secondary metabolites identified in soybean tissues, gene expression, and statistical analysis.

Sample preparation for metabolomic analysis
The extracts were obtained according to the related research articles [1,2].

NMR analysis
The spectra were acquired at 298 K on an Avance 600 spectrometer operating at 600.1699 MHz using a 5 mm Prodigy TCI probe. The 1H pure shift experiment was performed with the reset_psyche_1d.pr pulse sequence for homonuclear broadband decoupling [3,4]. The spectra were acquired with a 4.50 s presaturation delay and an acquisition time of 3.64 s (64k points). The chirp pulses were generated in the TopSpin Shape Tool with a pulse length of 15 ms, a total sweep width of 10 kHz, a shape size of 10,000 points and 20% smoothing. The gradient pulse aligned with the centre of the two chirp pulses was set between 1.0 and 2.0%. The spectral widths in F1 and F2 were set to 80 and 5 kHz, respectively. The number of t1 increments (number of chunks) was set between 32 and 128. The pure shift interferogram was constructed using a processing script provided by Bruker. The 1H 1D NMR experiments were performed according to the related research article [1]. Phasing and baseline correction were carried out within the instrument software.
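The interferogram step can be illustrated with a minimal Python sketch, assuming the t1 increments have already been loaded as a NumPy array of shape (n_chunks, n_points). The function name `build_interferogram` and the demo spectral widths are hypothetical, and the sketch ignores the drop points and half-chunk offset that the Bruker processing script handles: it simply concatenates one chunk of 1/SW1 seconds (SW2/SW1 points) from each increment.

```python
import numpy as np


def build_interferogram(fids, sw1_hz, sw2_hz):
    """Join one chunk from each t1 increment into a single pure shift FID.

    Simplified sketch: drop points and the half-chunk offset of the first
    increment are ignored.
    """
    fids = np.asarray(fids)
    if fids.ndim != 2:
        raise ValueError("expected an array of shape (n_chunks, n_points)")
    # A chunk lasts 1/SW1 s; sampled every 1/SW2 s it holds SW2/SW1 points.
    points_per_chunk = int(round(sw2_hz / sw1_hz))
    if points_per_chunk > fids.shape[1]:
        raise ValueError("each FID must contain at least one full chunk")
    # Take the leading chunk of every increment, in increment order.
    return np.concatenate([fid[:points_per_chunk] for fid in fids])


if __name__ == "__main__":
    # Hypothetical data: 32 increments of 4096 complex points each.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(32, 4096)) + 1j * rng.normal(size=(32, 4096))
    fid = build_interferogram(data, sw1_hz=50.0, sw2_hz=5000.0)
    print(fid.shape)  # (3200,)
```

Because each chunk contributes about SW2/SW1 points, the number of chunks sets the length, and hence the digital resolution, of the reconstructed pure shift FID.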

LC-MS/MS system
An LC-MS/MS system was used to support the 1H NMR data. The soybean genotypes BR4 and E45 under control conditions were analysed by LC-DAD-MS using an LC-DAD-ESI system consisting of a Shimadzu 20A HPLC equipped with an LC-20AD quaternary pump, an SPD-M20A photodiode array detector, a SIL-20A thermostated autosampler and a CTO-20A column compartment, coupled to a Bruker ion trap with a heated ESI source. UV spectra were acquired from 230 to 400 nm. Mass spectra were acquired in negative and positive modes, in separate runs, over an m/z range of 100-1000. The operating parameters were as follows: source voltage, 4.5 kV; sheath/dry gas, 9.00 L/min; nebulizer, 40 psi; dry temperature, 300 °C. Automatic MS/MS was performed on the three most abundant ions of each scan, using an isolation width of m/z 3, and precursors were fragmented by CID with a normalized collision energy of 60. Data were analysed with the Data Analysis software. Chromatographic runs were performed on a Kinetex C18 column (1.9 µm, 30 × 2.1 mm i.d., Phenomenex) maintained at 25 °C. Elution used water/0.1% formic acid (A) and acetonitrile/0.1% formic acid (B) with the following gradient: 0 min, 5% B; 30 min, 40% B; 35 min, 100% B; 40 min, 100% B. The flow rate was 1.0 mL/min and the injection volume was 1 µL.
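The elution program above can be expressed as a small lookup, assuming linear ramps between the listed time points and ignoring the system dwell volume. `GRADIENT` and `percent_b` are hypothetical names used only for illustration.

```python
import numpy as np

# Gradient steps from the method: (time in min, % solvent B).
GRADIENT = [(0.0, 5.0), (30.0, 40.0), (35.0, 100.0), (40.0, 100.0)]


def percent_b(t_min):
    """Return %B at t_min, assuming linear ramps between the listed steps."""
    times, levels = zip(*GRADIENT)
    return float(np.interp(t_min, times, levels))


if __name__ == "__main__":
    for t in (0, 14, 30, 32.5, 38):
        print(f"{t:>5} min: {percent_b(t):.1f}% B")
```

For example, a compound eluting at 14 min, like the kaempferol glycoside discussed below, would elute during the shallow 5-40% B ramp.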

Data analysis
The 1H NMR data in the 6.00-8.50 ppm range were converted to ASCII files using Bruker TopSpin 3.5. Data preprocessing and principal component analysis (PCA) of the 1H NMR data were performed using MATLAB R2016b and PLS-Toolbox. The data analysis followed the related research article [1].
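A rough Python equivalent of the aromatic-region PCA is sketched below. It is not the MATLAB/PLS-Toolbox workflow of [1]: it assumes the ASCII spectra have been read into a samples × points array sharing one ppm axis, and it applies only mean-centring before an SVD-based PCA. The function name `aromatic_pca` is hypothetical.

```python
import numpy as np


def aromatic_pca(ppm, spectra, lo=6.00, hi=8.50, n_components=2):
    """PCA on the lo-hi ppm window of 1H NMR spectra (rows = samples).

    Only mean-centring is applied; normalisation, scaling or binning used
    in the original preprocessing are not reproduced.
    """
    ppm = np.asarray(ppm, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    window = (ppm >= lo) & (ppm <= hi)
    x = spectra[:, window]
    x = x - x.mean(axis=0)  # mean-centre each spectral variable
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = vt[:n_components].T                     # per-ppm weights
    explained = s**2 / np.sum(s**2)                    # variance fraction per PC
    return scores, loadings, explained[:n_components]
```

The loadings indicate which aromatic resonances drive the separation of samples along each principal component.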

Gene expression analysis
RNA-seq libraries of soybean roots under hypoxic stress, obtained by Nakayama et al. [5], were used in this study. The experimental design consisted of two soybean cultivars (BR4 and E45) subjected to different stress durations: 0.5 h, 4 h, and 28 h [1].

Statistical analysis
Data from physiological parameters, biomass accumulation and metabolomic analysis showed a normal distribution and were subjected to analysis of variance (ANOVA) [1].
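For reference, the F statistic behind a one-way analysis of variance can be computed as below. This is a generic sketch with a hypothetical function name; the design in [1] may include more than one factor (for example cultivar and stress duration), which this one-way version does not model.

```python
import numpy as np


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = (SS_between / (k - 1)) / (SS_within / (N - k)).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

The resulting F is compared with the F distribution on (k − 1, N − k) degrees of freedom to obtain a p-value.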
Homonuclear scalar couplings of the phenolic acids and kaempferol derivatives collapsed into singlet lines, which significantly improved resolution in the aromatic region and allowed the assignment of three kaempferol isomers (28, 29, 30) from three singlet lines at 8.0 ppm corresponding to the AA′XX′ spin system (Fig. 4). Phenolic compounds occur at low concentrations in plants, and the aromatic region of the spectra reflected the sensitivity cost of the pure shift method, which nevertheless retains the advantage of simplified singlet resonances. Therefore, the improved resolution of pure shift methods reveals the potential of PSYCHE 1D as a deconvolution tool.
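The resolution gain from collapsing multiplets can be illustrated with a toy simulation of three overlapping signals near 8.0 ppm at 600 MHz. The shifts (8.000, 8.010 and 8.020 ppm), the 8.8 Hz coupling and the 1 Hz linewidth are assumed values, and each signal is treated as a first-order doublet rather than a full second-order spin system.

```python
import numpy as np

SPECTROMETER_MHZ = 600.0            # 1H frequency, as in the NMR section
J_HZ = 8.8                          # assumed coupling of the aromatic doublet
FWHM_HZ = 1.0                       # assumed Lorentzian linewidth
SHIFTS_PPM = [8.000, 8.010, 8.020]  # hypothetical shifts of three isomers


def lorentzian(x, x0, fwhm):
    """Unit-height Lorentzian line centred at x0."""
    half = fwhm / 2.0
    return half**2 / ((x - x0) ** 2 + half**2)


def count_peaks(y):
    """Count strict local maxima in a sampled spectrum."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))


def simulate(decoupled):
    """Return (Hz axis relative to 8.000 ppm, intensity) for the toy spectrum."""
    x = np.arange(-20.0, 40.0, 0.01)
    y = np.zeros_like(x)
    for ppm in SHIFTS_PPM:
        centre = (ppm - SHIFTS_PPM[0]) * SPECTROMETER_MHZ
        if decoupled:
            y += lorentzian(x, centre, FWHM_HZ)  # pure shift: one singlet
        else:
            for sign in (-1.0, 1.0):             # coupled: first-order doublet
                y += 0.5 * lorentzian(x, centre + sign * J_HZ / 2, FWHM_HZ)
    return x, y


if __name__ == "__main__":
    for decoupled in (False, True):
        _, y = simulate(decoupled)
        label = "pure shift" if decoupled else "coupled"
        print(f"{label}: {count_peaks(y)} lines")
```

In the coupled case the three compounds give six interleaved lines with no obvious grouping, whereas the decoupled spectrum gives one singlet per compound.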
Kaempferol-3-O-α-rhamnosyl-di-β-glucoside isomer I (28) was identified as the major compound. Its sugar moiety shows two overlapping proton signals at δ 4.72 (1H, d, J = 7.5 Hz), corresponding to the two anomeric protons of the β-glucosyl units (H-1″/1‴). A methyl signal at δ 1.12 (3H, d, J = 6.2 Hz) in the high-field region was assigned to rhamnose, and the signal at δ 4.39 (1H, d, J = 1.0 Hz) was assigned to H-1 of the α-rhamnosyl unit. The 1H and 13C NMR values for all carbons were assigned on the basis of HSQC data and are given in Table 1. The sugar unit was placed at the C-3 position on the basis of the UV spectrum (λmax 265-345 nm), which indicates that the C-3 hydroxyl is not free. The structure was further supported by key HMBC correlations between C-4 and H-1″. LC-MS/MS analysis was employed to support the NMR data: the kaempferol-3-O-α-rhamnosyl-di-β-glucoside isomer showed an [M+H]+ peak at m/z 757, eluting at 14 min. The MS/MS spectrum in positive mode showed ions at m/z 611, 595 and 287. The fragment ion at m/z 611 corresponds to cleavage of a glycosidic bond (loss of …).
Table 1. Chemical shifts (δ) and coupling constants (Hz) of the primary metabolites identified in hydroalcoholic extracts of soybean roots and leaves.
Daidzein (32), daidzin (33) and malonyldaidzin (34) were identified as the major isoflavones in the extract of soybean roots (Figure SM2). These isoflavones are common in soybean tissues, and Table 1 shows their complete assignment based on the literature [8].
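The nominal neutral losses behind the kaempferol glycoside MS/MS data (m/z 757 → 611, 595, 287) can be checked with simple arithmetic. The sketch below assumes singly charged ions and nominal residue masses of 146 Da for a deoxyhexose (such as rhamnose) and 162 Da for a hexose (such as glucose); it cannot distinguish isobaric sugars, and the function names are hypothetical.

```python
# Nominal residue masses (Da) of sugars lost as neutral fragments.
DEOXYHEXOSE = 146  # e.g. rhamnosyl
HEXOSE = 162       # e.g. glucosyl


def sugar_combination(loss, max_units=3):
    """Return (n_deoxyhexose, n_hexose) whose nominal masses sum to loss, or None."""
    for n_dhex in range(max_units + 1):
        for n_hex in range(max_units + 1):
            if n_dhex * DEOXYHEXOSE + n_hex * HEXOSE == loss:
                return n_dhex, n_hex
    return None


def neutral_losses(precursor_mz, fragment_mzs):
    """List (fragment m/z, neutral loss, sugar combination) for singly charged ions."""
    rows = []
    for frag in fragment_mzs:
        loss = precursor_mz - frag
        rows.append((frag, loss, sugar_combination(loss)))
    return rows


if __name__ == "__main__":
    for frag, loss, combo in neutral_losses(757, [611, 595, 287]):
        print(f"m/z {frag}: loss of {loss} Da -> (deoxyhexose, hexose) = {combo}")
```

The losses decompose as one deoxyhexose (146 Da), one hexose (162 Da), and one deoxyhexose plus two hexoses (470 Da), and m/z 287 matches the protonated kaempferol aglycone, which is consistent with a rhamnosyl-diglucoside of kaempferol.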
The variation in isoflavone concentration in the 1H NMR spectrum of the Br4C genotype was fundamental to the spectral assignment because of the intensity differences of H-2 and H-5 in a clearly downfield region. The aglycone daidzein was identified as the major isoflavone and showed three signals at δ 6.83 (1H, d, J = 1.9 Hz), 6.93 (1H, dd, J = 2.… (Fig. 1). See Table 2.

Expression levels of genes
The expression ratio (fold change, fc) of each gene was calculated by dividing its transcript abundance (in RPM, reads per million mapped reads) in plants under hypoxic conditions by that under normoxic conditions. The statistical significance of differentially expressed genes (DEGs) was assessed with the Bioconductor package edgeR [6], with p-values corrected by the Benjamini-Hochberg method [7]. Only genes showing fold change ≥ 2 (up) or ≤ −2 (down), adjusted p-value ≤ 0.01, and more than 20 mapped reads (RPM ≥ 9) in at least one of the two compared libraries were considered DEGs. See Table 3.
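The correction and filtering steps can be sketched in Python with NumPy. This does not reproduce edgeR's statistical test, which supplies the raw p-values; it only shows the Benjamini-Hochberg adjustment and the DEG thresholds stated above. The signed fold-change convention (−1/ratio for down-regulated genes) is an assumption inferred from the ≤ −2 threshold, only the RPM cut-off is applied (not the raw read count), and all function names are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


def signed_fold_change(rpm_hypoxia, rpm_control):
    """Hypoxia/control RPM ratio; ratios < 1 are reported as -1/ratio (assumed).

    Assumes non-zero RPM in both libraries.
    """
    ratio = np.asarray(rpm_hypoxia, dtype=float) / np.asarray(rpm_control, dtype=float)
    return np.where(ratio >= 1.0, ratio, -1.0 / ratio)


def is_deg(fc, padj, rpm_hypoxia, rpm_control, min_rpm=9.0):
    """|fc| >= 2, padj <= 0.01 and RPM >= min_rpm in at least one library."""
    fc, padj = np.asarray(fc), np.asarray(padj)
    expressed = np.maximum(rpm_hypoxia, rpm_control) >= min_rpm
    return ((fc >= 2) | (fc <= -2)) & (padj <= 0.01) & expressed
```

A signed fold change keeps up- and down-regulation symmetric around zero, so the same magnitude threshold (2) applies in both directions.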

Statistical analysis
See Tables 4 and 5.

Table 3. Expression levels of genes involved in sucrose degradation and alanine and GABA metabolism. Data were obtained from RNA-seq libraries of roots of soybean cultivars BR4 and E45 under hypoxic stress for 0.5 h, 4 h, and 28 h. For each gene, the hypoxia treatment was compared with the control at each time point to generate fold-change values. Red/blue indicates up/down-regulation, respectively.