Comparative Study of Isoﬂavone Synthesis Genes in Two Wild Soybean Varieties Using Transcriptomic Analysis

: Soybean is an important food crop that contains high amounts of isoﬂavones. However, due to the expression of multiple genes, different soybean seeds have different isoﬂavone compositions. The underlying mechanisms for this complexity remain unknown. In this study, we identiﬁed potential differentially expressed genes (DEGs) in two wild soybean cultivars, ZYD7068 (high isoﬂavone) and ZYD7194 (low isoﬂavone), at different seed developmental stages using RNA-seq technology and compared their differences in isoﬂavone content. A total of 1067 and 6479 differentially metabolized genes were identiﬁed at R6 and R8 stages, respectively. Subsequent analysis of the KEGG pathway revealed that three of these differential metabolized genes were involved in the Isoﬂavonoid biosynthesis and Flavone and ﬂavonol biosynthesis at the R6 stage. A total of 80 TF genes encoding differential expression of MYB, bZIP, and WRKY were identiﬁed in A1 vs. B1 and A3 vs. B3. Eight differentially expressed genes were identiﬁed in duplicates at both stages, and three genes showed the same expression trend at both stages. To conﬁrm the results of RNA-seq, qRT-PCR was performed to analyze the expression of the six identiﬁed differentially expressed genes (DEGs). The results of qRT-PCR were consistent with the results of RNA-seq. We found that four genes ( Glyma.13G173300 , Glyma.13G173600 , Glyma.14G103100 , and Glyma.17G158900 ) may be involved in the positive regulation of isoﬂavone synthesis, while two genes ( Glyma.04G036700 and Glyma.19G030500 ) may be involved in the negative regulation of isoﬂavone synthesis. These ﬁndings suggest that the observed difference in isoﬂavone levels between the two cultivars may be attributable to the differential expression of these six genes at later stages of seed development.


Introduction
Soybean, as a grain-feed crop, is rich in oil and protein and contains some beneficial nutrients such as isoflavones. Isoflavones are known as phytohormones with hypotensive, menopausal, and cholesterol-lowering effects, and are bioactive substances beneficial to humans [1]. Isoflavones also play an important role in plant resilience, resisting pests and disease infestation and improving resistance to abiotic stress.
During the long domestication process, wild soybean (Glycine soja) was transformed into cultivated soybean according to people's wishes, and many genes controlling superior traits were gradually lost [2]. Glycine soja has a high genetic diversity and the ability to thrive in challenging environments and has great potential to improve traits in its cultivated parents beyond what is currently known [3][4][5][6]. Although research on soybean has focused on understanding the domestication history of soybean, little effort has been devoted

Plant Materials
High isoflavone wild soybean resource ZYD7068 and low isoflavone wild soybean resource ZYD7194 were provided by Heilongjiang Agricultural Variety Resource Conservation and Utilization Center. ZYD7068 and ZYD7194 were collected in Mudanjiang and Suihua City, Heilongjiang Province, respectively. Wild soybeans ZYD7068 and ZYD7194 were grown by the Institute of Tillage and Cultivation, Heilongjiang Academy of Agricultural Sciences, in the pot farm of Heilongjiang Academy of Agricultural Sciences, and RNA was sampled and extracted from the seeds at R5 to R8 stages [21]. During each stage, three separate RNA extractions were carried out, and the seeds of the R6 and R8 stages were used for RNA sequencing (RNA-seq), and the seeds of the R5 to R8 stages were used for quantitative real-time polymerase chain reaction (qRT-PCR) experiments.

Experimental Design
In this study, high isoflavone wild soybean resource ZDY7068 and low isoflavone wild soybean resource ZDY7194 were used as experimental materials. Their seeds at R6 and R8 stages were used as samples for RNA-seq analysis. The genes with differential expression at both R6 and R8 phases and related to isoflavone synthesis were selected as candidate genes, and the TFs related to isoflavone synthesis and up-regulated or down-regulated at R6 and R8 stages were selected. They were subjected to qRT-PCR to identify genes that might be involved in isoflavone synthesis.

Determination of Isoflavones Contents of Seeds
For isoflavone content analysis, the harvested wild soybean seeds were fully ground using a mortar and pestle, sieved through a 2.0 mm sample, weighed 0.1 g in a 250 mL triangular flask, added 90 mL of 90% methanol solution, extracted by ultrasonication at 60 • C for 30 min, centrifuged at 10,000× r/min for 10 min, and the supernatant was transferred to a 250 mL concentration flask and concentrated by rotation at 60 • C to about 40 mL. Transfer the concentrated solution to a 50 mL volumetric flask, and fix the volume with 10% methanol solution to the scale. 1 mL of the extract was passed through a 0.22 µm filter membrane in a reagent bottle and determined by high-performance liquid chromatography with the following chromatographic conditions: the column was an Agilent Poroshell 120 Hillic (equipment from the United States Agilent Technologies Ltd., Santa Clara, CA, USA) (4.6 mm × 250 mm, 5 µm); the mobile phase was eluted with a gradient of chromatographic grade 0.1% acetic acid aqueous solution and 0.1% acetic acid acetonitrile aqueous solution; the flow rate was 1.00 mL/min; the column temperature was 40 • C. The detection wavelength of the UV detector was 260 nm; the injection volume was 20 µL, and the final calibration was performed using standard samples.

Total RNA Extraction, Library Construction, and Sequencing
The RNA was extracted using the RNApure Plant Kit (Cowin Bio, Cambridge, MA, USA), and its quality was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Only samples with a RIN score of at least 7 were included in further analysis. The cDNA library was constructed by pooling moderate amounts of RNA from soybean seed samples collected during the R6 and R8 stages. This library was subsequently sequenced using the Illumina Hiseq TM platform by Sangon Biotech in Shanghai, China.

Transcriptomic Data Analysis
The quality of the sequenced raw data was evaluated by FastQC 0.11.2. Trimmomaticv 0.36 (http://www.usadellab.org/cms/index.php?page=trimmomatic accessed on 15 January 2023) software is used to cut the quality, so as to obtain relatively accurate and effective data [22]. Use HISAT2 2.1.0 to compare the valid data of the sample to the reference genome, and count the Mapping information [23], and the reference genome sequences were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/assembly/GCF_000004515.4/ accessed on 19 January 2023). The DEGseq package in R software is used for gene expression difference analysis (https://bioconductor.org/packages/release/bioc/html/DESeq2.html accessed on 19 January 2023) [24,25], and the results of expression difference analysis are visualized. To determine the variation of the RNA-seq samples, we performed principal component analysis (PCA) using the plotPCA function in the DESeq2 R package. In order to obtain genes with significant differences, we set the screening conditions as follows: p-value < 0.05 and |log 2 FoldChange| > 2.

Principal Component Analysis
Principal component analysis was performed with the vegan package in R software to visualize the distance between samples (https://cran.r-project.org/web/packages/vegan/ index.html accessed on 19 December 2022).

Functional Classification of DEGs
The topGO package in R software is used for GO enrichment analysis (https://www. bioconductor.org/packages/release/bioc/html/topGO.html accessed on 2 February 2022), and the significant GO-directed acyclic graph is drawn. The ClusterProfiler package is used for the KEGG pathway and KOG classification enrichment analysis (https://bioconductor. org/packages/release/bioc/html/clusterProfiler.html accessed on 5 February 2023).

Transcription Factor Analysis
Based on the TF sequence derived from the Peking University transcription factor database (http://planttfdb.cbi.pku.edu.cn/ accessed on 19 January 2023), we analyzed and classified DEGs using the gplots package of R software [26].

Quantitative Real-Time PCR (qRT-PCR) Verification
The qRT-PCR was performed using a miRNA qPCR Assay Kit (Cowin Bio.) kit and a QuantStudio 3 (Thermo Fisher Scientific, Waltham, MA, USA) instrument. Three biological replicates of each sample. The calculation of gene expression level followed the 2 −∆CT method described by Livak & Schmittgen [27]. GmActin11(Glyma.19G14790000) and Gm60S(Glyma.13G318800) were used as internal reference genes [28]. The specific primers are shown in Table S1. Three independent biological replicates were performed for each sample to ensure statistical credibility.

Isoflavone Content in Seeds of ZYD7068 and ZYD7194
Two wild soybean varieties from China, ZYD7068 (A) and ZYD7194 (B) were selected for transcriptome analysis due to their significantly different isoflavone content in mature seeds. The isoflavone contents of wild soybean resources ZYD7068 and ZYD7194 were determined by high-performance liquid chromatography, and the total isoflavone contents were 7149.5 µg/g and 646.4 µg/g, Daidzin contents were 988.60 µg/g and 102.85 µg/g, Glycitin contents were 453.87 µg/g and 79.79 µg/g, respectively; 79.79 µg/g, Genistin 4368.64 µg/g and 332.09 µg/g, Daidzein 855.81 µg/g and 89.69 µg/g, Glycitein 35.03 µg/g and 2.64 µg/g, and Genistein. The isoflavone content of ZYD7068 was significantly higher than that of ZYD7194 ( Figure 1). We aimed to use these two wild soybean species to investigate the mechanism of soybean isoflavone synthesis and to identify genes involved in isoflavone synthesis.

Transcriptome Profiles of ZYD7068 and ZYD7194 at R6 and R8 Stages
To reveal the molecular mechanisms regulating isoflavone synthesis during soybean seed development, soybean seeds of two varieties were collected and analyzed by transcriptional sequencing at R6 and R8 stages, and the following samples were obtained: A1, A3, B1, and B3. All sequencing data are shown in Table 1. A total of 117 and 122 million raw reads and a total of 111 and 117 million clean reads were acquired in A and B, respectively. The base percentage of Q30 in each sample was not less than 92.55%, the GC content was 46.66%~47.83%, and the sequencing quality was normal, which could be used for subsequent analysis. The transcriptomes' saturation and evenness were analyzed to determine the sequencing depth and uniformity, which revealed high sequencing quality and expression saturation for most genes ( Figure S1). The expression evenness was also found to be relatively high ( Figure S2). The samples showed a high correlation between each other ( Figure S3), and principal component analysis (PCA) showed that the seed development stage and genotype were the major contributors to data variations ( Figure S4). Overall, this study provides a comprehensive transcriptomic analysis of soybean seeds and identifies the key factors regulating isoflavone synthesis during seed development.

Identification of DEGs of ZYD7068 and ZYD7194 at Different Stages
The Volcano plots of DEGs are shown in Figure S5 (|log 2 FoldChange| > 2, p < 0.05). A systematic comparison and evaluation of differentially expressed genes (DEGs) in soybean cultivars at different developmental stages (Supplementary Data S1-S4). Comparing the different periods of cultivar A (R6 and R8 periods), 8655 DEGs were identified in A1 vs. A3, of which, 7044 genes were up-regulated and 1611 genes were down-regulated ( Figure 2A). Similarly, in variety B, the number of DEGs in B1 vs. B3 was 10,000, of which, 7816 genes were up-regulated and 2184 genes were down-regulated. The DEGs in A and B were mainly found in A3 and B3 pairs, and more genes were up-regulated than down-regulated ( Figure 2A). Representation of these comparison groups of DEGs with Venn diagrams revealed both unique and shared DEGs among pairs ( Figure 2B). Venn analysis showed that many DEGs identified were isoflavone synthesis-specific and/or genotype-specific. indicates ZYD7068 seeds at the R6 stage; A3 indicates ZYD7068 seeds at the R8 stage; B1 indicates ZYD7194 seeds at the R6 stage; and B3 indicates ZYD7194 seeds at the R8 stage.

Analysis of DEGs Clustering and Pathway Enrichment
To further understand the regulation of differentially expressed genes (DEGs) during seed development, hierarchical clustering was conducted and visualized as a boxplot dendrogram in Figure S6. Additionally, to investigate the molecular response of ZYD7068 and ZYD7194, the DEGs were functionally annotated and classified into three categories: biological process (BP), cellular component (CC), and molecular function (MF), as shown in Figure 3. A KEGG pathway analysis was performed to identify the specific pathways associated with the DEGs. The results revealed that 163 and 2218 DEGs were enriched in 152 and 259 pathways for A1 vs. B1 and A3 vs. B3, respectively, and 149 pathways were enriched in both sets of samples (Table S2, Supplementary Data S5 and S6).  At the R6 and R8 stages, the KEGG pathways of the top 30 enrichment levels are shown in Figure 4 (Supplementary Data S7 and S8). Among them, In the two groups, Ko00906 (Carotenoid biosynthesis), Ko00630 (glyoxylate and dicarboxylate metabolism), Ko04910 (insulin signaling pathway), and Ko00520 (amino sugar and nucleotide sugar metabolism) were all expressed in the top 30 enrichment levels. Among them, ko00943 (Isoflavonoid biosynthesis) and ko00944 (Flavone and flavonol biosynthesis) enriched by A1 vs. B1 were associated with isoflavone biosynthesis. Glyma.13G173300, Glyma.13G173600, and Glyma.19G030500 were enriched to the ko00943 pathway. Glyma.19G030500 was enriched to the ko00944 pathway (Table S3). . The active site of these enzymes likely contains the HXXXD motif. Additionally, trichothecene 3-O-acetyltransferase is also part of this enzyme family. Furthermore, we identified a negative regulatory role for Glyma.19G030500, and its down-regulation may lead to a decrease in soybean isoflavone content.

Different Expression Patterns of Transcription Factors in Two Genotypes at Different Stages
To further reveal the differences in isoflavones between genotypes, we searched for differentially expressed TF genes at both R6 and R8 stages (Supplementary Data S9 and S10). The number of DEGs involved in transcription factor (TFs) was 125 at the R6 stage and was 383 at the R8 stage. The differentially expressed TF genes encoding MYB, bZIP, and WRKY were found in A1 vs. B1 and A3 vs. B3 with a total of 90 ( Figure 5). We also found that the number of differentially expressed MYB, bZIP, and WRKY in A3 vs. B3 was greater than that in A1 vs. B1 ( Figure 5). These findings further indicated that a large number of TFs might participate in the process of isoflavone synthesis, which could enhance the isoflavone content of soybean seeds. Out of the differential genes identified in A vs. B, six belonged to the WRKY family, while one each belonged to the MYB and bHLH families. Cluster analysis of these eight genes revealed that Glyma.04G036700 was down-regulated in both sets of samples, and the differential genes Glyma.14G103100 and Glyma.17G158900 were up-regulated (Figure 6), indicating that all these genes may be involved in the regulation of isoflavone synthesis.

Validation of RNA-Seq Expression Levels by qRT-PCR
To validate the RNA-seq results of our study, we chose three genes that were differentially expressed and associated with isoflavone synthesis, and three transcription factors related to isoflavone regulation, for qRT-PCR analysis (Figure 7). The expression results of qRT-PCR of six genes were consistent with those of RNA-seq. Among them, the expression of Glyma.04G036700 increased significantly during seed development, while Glyma.13G173300 decreased, then increased, and finally decreased to the lowest level. In contrast, the expression trends of Glyma.13G173600, Glyma.14G103100, Glyma.17G158900, and Glyma.19G030500 were showed bell-shaped. The highest expression levels of Glyma.13G173600 and Glyma.19G030500 were at the R7 stage, while Glyma.14G103100 and Glyma.17G158900 had the highest expression levels at the R6 stage. Furthermore, the expression level of Glyma.14G10310 and Glyma.19G030500 showed a negative correlation with isoflavone content, while the other four genes showed a positive correlation. These results confirmed that our RNA-seq data were reliable.

Discussion
One of the main goals of soybean breeders is to increase the concentration of isoflavones; however, the limited genetic diversity of soybean germplasm can hinder the enhancement of these compounds [29]. In addition, glycosides and malonyl glycosides exhibit a positive correlation because they are produced by glucosyltransferase and malonyl transferase, which are key enzymes in isoflavone biosynthesis and are synthesized through a common branch in the phenylpropanoid pathway [30,31]. Differences in phenotypic and TIF contents of individual soybean varieties were observed in different growing environments and years, suggesting that both genetic and environmental factors contribute to isoflavone accumulation in soybean seeds, as shown in studies by [32,33]. To further identify key genes affecting differences between high and low isoflavone varieties, two wild soybean varieties ZYD7068 (high isoflavone) and ZYD7194 (low isoflavone) were analyzed for differential genes at the same developmental stage. A total of 1067 and 6479 differentially expressed genes were identified between the R6 and R8 stages, respectively, followed by the identification of isoflavone synthesis pathways and potential key genes using KEGG and GO analyses. Enrichment analysis of differential genes showed that they fall into three broad categories comprising biological process (BP), cellular component (CC), and molecular function (MF). KEGG analysis of differential genes showed that among the metabolic pathways significantly enriched at both R6 and R8 stages, three differential genes were classified into Isoflavonoid biosynthesis, and one of them was also classified into Flavone and flavonol biosynthesis. Similarly, we also investigated the common TFs that have been identified to be associated with isoflavone synthesis and found that three TF genes have the same expression trend at R6 and R8 stages.
Many transcription factors also affect the synthesis of soybean isoflavones. The expression levels of structural genes involved in isoflavone biosynthesis can be regulated by certain MYB transcription factors, potentially affecting isoflavone content [41,42]. Therefore, these MYB-like genes could be a potential candidate for isoflavone accumulation in soybean seeds. Several MYB transcription factors involved in the regulation of the isoflavone biosynthetic pathway have been identified in soybean. In plants, the interaction between MYB and bHLH transcription factors is also prevalent, and there is a joint regulation of both proteins in the phenylpropane metabolic pathway. In addition to the direct combination of MYB and bHLH, they will also combine with the third protein WD40 to form MYB-bHLH-WD40 (MBW). In various plant species, including A. thaliana, Camellia sinensis, Fragaria × ananassa, and Vitis vinifera, complexes of MYB-bHLH-WD40 (MBW) have been proposed as a means of finely regulating flavonoid levels [43][44][45][46][47]. The WRKY transcription factor binds to the promoter element of the key enzyme gene for isoflavone synthesis to regulate isoflavone biosynthesis. In our study, we also analyzed the expression of Glyma.04G036700, Glyma.14G103100, and Glyma.17G158900 genes, and the results showed that these three TFs were associated with isoflavone content.

Conclusions
In this study, we performed a comparative transcriptome analysis of dynamic gene expression in ZYD7068 (high isoflavone) and ZYD7194 (low isoflavone) wild soybean seeds at R6 and R8 stages. Six DEGs were screened for association with isoflavone content, three of which were associated with pathways located in the isoflavone biosynthesis pathway and flavone and flavonol biosynthesis pathway, and three were associated with TFs. Our results provide valuable insights into the molecular basis of isoflavone synthesis and accumulation in soybean seeds and may facilitate the breeding of high isoflavone soybeans.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/agriculture13061164/s1, Figure S1: The saturation curves of all of the samples in A and B; Figure S2: Homogenized distribution curves of samples A and B; Figure S3: The correlation analysis of samples in A and B; Figure S4: Principal Component Analysis (PCA) of the similarities and differences between the twelve samples used for RNA-Sequencing in ZYD7068 (A) and ZYD7094 (B) at R6 and R8 stages; Figure S5: Volcano plots of DEGs. The up-regulated and downregulated DEGs were shown in red and green, respectively. The x-axis represented the fold change of DEGs in different samples. The y-axis represented the statistical significance of gene expression differences; Figure S6: Clustering analysis of the DEGs. The tree was constructed with log 2 (TMP+1). Red, black, and green indicate high, intermediate, and expression, respectively. Each sample has three repetitions labeled 1, 2, and 3; Figure S7: The isoflavone metabolism pathway in the KEGG database. (A) Isoflavonoid biosynthesis: ko00943. (B) Flavone and flavonol biosynthesis: ko00944; Table S1: Primers used for quantitative real-time PCR; Table S2: Different stages of DEGs annotated to different databases;