Metabolomic and Transcriptomic Profiles in Diverse Brassica oleracea Crops Provide Insights into the Genetic Regulation of Glucosinolate Profiles

Glucosinolates (GSLs) are plant secondary metabolites commonly found in the cruciferous vegetables of the Brassicaceae family, offering health benefits to humans and defense against pathogens and pests to plants. In this study, we investigated the relative abundance of 23 GSL compounds in four tissues of five different Brassica oleracea morphotypes. Using the five corresponding high-quality B. oleracea genome assemblies, we identified 183 GSL-related genes and analyzed their expression with mRNA-Seq data. GSL abundance and composition varied strongly among both tissues and morphotypes, accompanied by different gene expression patterns. Interestingly, broccoli exhibited a nonfunctional AOP2 gene due to the loss of a conserved 2OG-FeII_Oxy domain, explaining the unique accumulation of two health-promoting GSLs. Additionally, transposable element (TE) insertions were found to affect the gene structure of MAM3 genes. Our findings deepen the understanding of GSL variation and genetic regulation in B. oleracea morphotypes, providing valuable insights for breeding these crops for tailored GSL profiles.

Fig. S1 The five B. oleracea morphotypes and collected tissues for GSL extraction and RNA sequencing.

Fig. S2 Principal component analysis (PCA) based on GSL data showing overall variation between the three biological replicates (20 samples × 3 biological replicates).

Fig. S4 Distribution of GSL-related genes in the five B. oleracea genomes. The number of GSL-related genes in non-overlapping 2-Mb windows was calculated. See Table S3 for source data.
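
The window counting described for Fig. S4 can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes each gene is assigned to a window by its 1-based start coordinate, and the function name and example coordinates are hypothetical (they are not taken from Table S3).

```python
from collections import Counter

WINDOW_SIZE = 2_000_000  # 2 Mb, the window size stated in the caption


def count_genes_per_window(genes, window_size=WINDOW_SIZE):
    """Count genes in non-overlapping windows along each chromosome.

    genes: iterable of (chromosome, start) tuples, with 1-based start
           coordinates (an assumption; the caption does not say which
           coordinate is used to place a gene in a window).
    Returns a dict mapping (chromosome, window_index) to the gene count,
    where window 0 covers positions 1..window_size.
    """
    counts = Counter()
    for chrom, start in genes:
        window_index = (start - 1) // window_size
        counts[(chrom, window_index)] += 1
    return dict(counts)


# Hypothetical coordinates for illustration only.
example_genes = [
    ("C1", 150_000),
    ("C1", 1_999_999),
    ("C1", 2_000_001),
    ("C2", 5_500_000),
]
window_counts = count_genes_per_window(example_genes)
```

Windows that contain no genes are simply absent from the returned dictionary; a plotting step would need to fill them with zero.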

Fig. S6 Expression profiles for GSL-related genes in four tissues of five B. oleracea morphotypes. Heatmaps were constructed using log2-transformed TPM values. Blue and red colors represent low and high expression levels, respectively. Red arrows indicate nodes for the three clusters. Genes are classified based on their involvement in different processes/phases (abbreviations: COP: Cosubstrate Pathways, CSS: Core Structure Synthesis, GSD: GSL Degradation, SCE: Side-Chain Elongation, SCM: Side-Chain Modification, TSC: Transcriptional Components, TSP: Transporters).

Fig. S7 Pearson's correlation analysis between GSLs and their related genes. Gene expression profiles (TPM values) are pooled for different copies of paralogous genes in B. oleracea. (a) Distribution of Pearson's correlation coefficient. (b) Significantly (P < 0.05) correlated GSLs and genes. Genes are classified based on their involvement in different processes/phases as shown in Fig. 2 (abbreviations: COP: Cosubstrate Pathways, CSS: Core Structure Synthesis, GSD: GSL Degradation, SCE: Side-Chain Elongation, SCM: Side-Chain Modification, TSC: Transcriptional Components, TSP: Transporters).
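
The correlation described for Fig. S7 can be illustrated with a small sketch. It assumes that "pooled" means summing TPM values across paralogous copies sample by sample; the caption does not specify the pooling method, so this is an assumption. The function names and example values are hypothetical, and the significance test (P < 0.05) is not reproduced here.

```python
import math


def pool_paralogs(tpm_by_copy):
    """Sum TPM values across paralogous copies, sample by sample.

    tpm_by_copy: dict mapping a gene-copy name to a list of TPM values,
                 one per sample, with samples in the same order.
    Summation is an assumed pooling method, not confirmed by the source.
    """
    return [sum(values) for values in zip(*tpm_by_copy.values())]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (four samples).
copies = {"copy_1": [1, 2, 3, 4], "copy_2": [1, 2, 3, 4]}
gsl_abundance = [1, 2, 3, 4]

pooled_tpm = pool_paralogs(copies)
r = pearson_r(pooled_tpm, gsl_abundance)
```

In practice a statistics library would also return the P value needed for the significance threshold used in panel (b).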

Fig. S8 C3 (a) and C5 (b) aliphatic GSL profiles and expression levels of related genes in different tissues and morphotypes. The bar charts show the relative quantity of individual GSLs in the respective tissues and morphotypes. Error bars indicate standard deviation (n = 3). Heatmaps show gene expression levels. Blue and red colors represent low and high expression levels, respectively. Gray denotes that the gene is not identified in the corresponding

Fig. S9 Gene expression analysis of FMO GS-OX paralogues in four tissues of five B. oleracea morphotypes. The expression level was estimated using TPM values based on mRNA-Seq data. Error bars indicate standard deviation (n = 3).

Fig. S10 Gene expression analysis of AOP paralogues in four tissues of five B. oleracea morphotypes. The expression level was estimated using TPM values based on mRNA-Seq data. Error bars indicate standard deviation (n = 3). Note: BoAOP2;3.3 is BoAOP2.

Fig. S11 Gene expression analysis of GSL-OH paralogues in four tissues of five B. oleracea morphotypes. The expression level was estimated using TPM values based on mRNA-Seq data. Error bars indicate standard deviation (n = 3).

Fig. S12 Gene expression analysis of MAM paralogues in four tissues of five B. oleracea morphotypes. The expression level was estimated using TPM values based on mRNA-Seq data. Error bars indicate standard deviation (n = 3).

Fig. S13 IGV snapshots showing mRNA-Seq alignments in BoMAM3.2 genes from five B. oleracea morphotypes (top to bottom: broccoli, cauliflower, kohlrabi, kale and white cabbage). In each snapshot, the four tracks from top to bottom represent alignments in four different tissues.

Table S1 Signature GSLs for the five B. oleracea morphotypes (Student-Newman-Keuls test with α = 0.05). (Data shown in Excel file)

Table S2 GSL-related genes identified in the five B. oleracea genomes. (Data shown in Excel file)

Table S3 Position of GSL-related genes identified in the five B. oleracea genomes. (Data shown in Excel file)

Table S4 Summary of RNA-Seq data and statistics for read mapping. (Data shown in Excel file)

Table S5 List of significantly correlated GSLs and related genes. (Data shown in Excel file)

Table S6 The number of GSLs/genes that are significantly correlated with the given gene/GSL. (Data shown in Excel file)

Table S7 TE annotations in the long intron of the MAM3 gene in three B. oleracea genomes. (Data shown in Excel file)

Table S8 Gene expression (TPM values) of GSL-related genes in sampled tissues. (Data shown in Excel file)

Table S9 Relative quantities of GSLs in different B. oleracea morphotypes and tissues. (Data shown in Excel file)