Transcriptome analysis identies differentially expressed genes involved in the metabolic regulatory network of progenies from the cross of low phytic acid GmMIPS1 and GmIPK1 soybean mutants

Lowering the phytic acid (PA) content of crop seeds will be benecial for improving their nutritional traits. Low phytic acid (lpa) crop lines carrying more than one independent mutated gene have been shown to exhibit more pronounced reductions of PA content than mutants with a single lpa mutated gene. But little is known about the link between PA pathway intermediates and downstream regulation following mutation of these genes in soybean. Here, we performed a comparative transcriptome analysis using an advanced-generation recombinant inbred line [2mlpa (mips1/ipk1)] with low PA and a sibling line with homozygous non-mutant alleles and normal PA [2MWT (MIPS1/IPK1)]. RNA sequencing revealed differential expression levels of numerous genes between seeds of 2mlpa and 2MWT at ve developmental stages. A total of 7,945 differentially expressed genes were identied. 3316 DEGs were in 128 metabolic and signal transduction pathways and 4980 DEGs were classied into 345 function terms associated with biological processes. Genes associated with PA metabolism, photosynthesis, starch and sucrose metabolism, and defense mechanisms were related to low PA in 2mlpa soybean line. Among these, 36 genes were up/down-regulated in PA metabolic processes, with 22 possibly contributing to the low PA phenotype of 2mlpa. Most of the genes (81 of 117) associated with photosynthesis were down-regulated in 2mlpa at the late seed stage. Three genes involved in sucrose metabolism were up-regulated at the late seed stage, which might explain the high sucrose content of 2mlpa soybeans. Additionally, 604 genes related to defense mechanisms were differentially expressed between 2mlpa and 2MWT. In this research, the soybean mutant 2mlpa was found to not only exhibit low PA but also have changes in multiple metabolites and secondary metabolites. The results delineate the regulation of these metabolic events by 2mlpa. Many genes associated with PA metabolism would contribute to the drastic reduction of PA and moderate accumulation of InsP3-InsP5 in 2mlpa mutant. And other regulated genes found in photosynthesis, starch and sucrose metabolism, and defense mechanisms would give us more insight into the nutritional and agronomic performance of 2mlpa. that some inositol polyphosphate kinase and phosphatase genes involved in PA metabolism were upregulated only at the late seed developmental stage (stage 5), e.g., two ITPK1 genes (encoding inositol-1,3,4-trisphosphate 5/6-kinase), two PI4KA genes (encoding phosphatidylinositol 4-kinase A),and ve INPP5B/F genes (encoding inositol polyphosphate 5-phosphatase). These results indicate that the mutations in 2mlpa genes was conducted in triplicate using SYBR Green Real-time PCR Master Mix (Toyobo, Osaka, Japan) on a LightCycler® 480 Real-Time PCR System (Roche Diagnostics GmbH, Relative gene expression levels were determined using the corrected relative 2 -ΔΔCT method.


Introduction
Phosphorus (P) is an essential element required for optimal plant growth 1 . In grains, approximately 75% of P is stored in the form of phytic acid (PA, myo-inositol 1,2,3,4,5,6-hexakisphosphate) 2 . The poor digestibility of PA by animals and humans causes low mineral bioavailability and phosphate pollution of soil and water. Hence, low-PA (lpa) crops are highly desirable for reducing these anti-nutritional and environmental effects 2,3 .
To narrow the negative impacts of PA, many low phytic acid mutants have been generated by mutagenesis and biotechnological methods in different crops such as soybean, rice, barley, maize and wheat [4][5][6][7][8][9][10] . The reduction level of PA was related to the various mutated gene type. Recently, it has been demonstrated that cross and selection breeding of different soybean mutants and pyramiding different gene mutations could be employed as a strategy to obtain progeny soybean seeds with stable and dramatic reduction in the intended lpa traits. Such as, drastic phytic acid reductions ranging from-79% to -88% were observed in double lpa soybean mutants with two IPK1 mutation targets on chromosomes 6 and 14 11 . More recently, the pronounced lower level of phytic acid content (-63%) was found in the progeny resulting from two rice lpa mutants data under review .
Therefore, The lpa soybean line Gm-lpa-TW-1 carries a non-lethal mutation in the MIPS1 gene, which has a 2-bp deletion in the third exon 12,13 , and the lpa soybean line Gm-lpa-ZC-2 carries a G-A point mutation in the IPK1 gene, were selected to generate a double lpa soybean mutant simultaneously carrying the two mutation targets MIPS and IPK1. MIPS (EC 5.5.1.4) is the rst and rate-limiting enzyme in the inositol and PA biosynthesis pathways, and catalyzes the conversion of glucose 6-phosphate to myo-inositol 3-phosphate 14 . IPK1 encodes an enzyme (EC 2.7.1.158) transforming InsP 5 into PA 15 . An advanced-generation recombinant inbred line, 2mlpa (mips1/ipk1) and a sibling line designated 2MWT (MIPS1/IPK1) with homozygous non-mutant alleles and normal PA were generated, and the reduction in phytic acid contents (up to 87%) observed in the double lpa lines 2mlpa were signi cantly more pronounced that those expected from the single mutants (about -50% reduction in phytate content). The 2mlpa accumulated lower inositol phosphate isomers InsP4 and InsP5, and has a lower level than the mutant which only carrying the IPK1 mutation 16 .
As we known, PA biosynthesis pathway plays a vital role in plant growth, and loss-of-function mutations in MIPS and IPK1 disrupt the PA biosynthesis pathway while dramatically reducing PA concentration. However, the mechanisms by which PA levels and downstream genes are regulated in these lpa soybean lines are poorly understood. It is therefore necessary to explore the whole regulatory network of PA metabolism. mRNA-sequencing (RNA-Seq) was used to study the effects of mutations in the MIPS1 and IPK1 genes on global changes in the gene expression pro les of developing soybean seeds. The functional enrichment of biological processes and pathways identi ed in this report further our understanding of the PA regulatory network.

Library sequencing and assembly
To obtain a more comprehensive understanding of changes at the transcriptome level in lpa soybeans, 30 cDNA libraries (three replicates) were sequenced. A total of 195.7 Gb of 150-bp paired-end clean data was obtained. Each sample yielded up to 6.3 Gb of clean data with a Q20 percentage above 98%. We mapped 85.71% of the clean reads to the annotated soybean reference Glycine_max_v2.0. The number of mapped reads ranged from 34,971,326 to 39,249,412 (Table 1). These results suggest that the data are adequate for subsequent analysis. All raw data are published in the NCBI (http://www.ncbi.nlm.nih.gov/bioproject/) under accession number PRJNA522338.

Sequence annotation and DEGs analysis
RPKM was used to estimate the gene expression and for DEG analysis. Each mapped soybean gene with its RPKM value in each of the 30 libraries was calculated. Principal component analysis showed that the major contributors to variation in the data were seed developmental stage and genotype (Figure 2). At the same time, the biological replicates of each soybean sample clustered together, suggesting there was little variance between replicates.
We identi ed a total of 7,945 genes with signi cantly differential expression between the 2mlpa and 2MWT lines at ve seed developmental stages ( Figure 3). Most of the DEGs were expressed during the early phase (stage 1) and late phase (stage 5) of seed development. In total, 1,380 DEGs were only differentially expressed at stage 1, while 4,730 DEGs were only differentially expressed at stage 5. There were 96 DEGs that were differentially expressed at all ve stages.

Validation of DEGs through RT-qPCR
The eight randomly selected DEGs associated with the biological processes discussed in the following sections were con rmed through RT-qPCR ( Figure 4). In most cases, gene expression trends were similar to those found in the RNA-Seq analysis, although fold changes varied between the RT-qPCR data and RNA-Seq expression data. Glyma.16G065700 ( Figure 4C), Glyma.08G109300 ( Figure 4E), Glyma.12G232500 ( Figure 4F), and Glyma.16G214900 ( Figure 4G) showed similar expression pro les in RNA-Seq and RT-qPCR analyses. However, RNA-Seq analysis revealed downregulation of Glyma.11G238800 ( Figure 4A) at stage 3, Glyma.14G072200 ( Figure 4B) at stage 5, Glyma.12G210600 ( Figure 4D) at stage 2, and Glyma.09G073600 ( Figure 4H) at stage 3 of seed development; these results were different from those obtained from the expression pro ling by RT-qPCR analysis. These results suggest that RT-qPCR expression pro les were in approximate agreement with the RNA-Seq data for 10 of the DEGs.

Functional enrichment analyses of DEGs
We assigned 4,980 DEGs between 2mlpa and 2MWT (62.67% of all 7,945 DEGs) to at least one GO term, classifying them by 345 functional terms associated with biological processes at the ve seed developmental stages (Supplementary Table   S2). Some terms were found at more than one stage while others were stage-speci c. For instance, the terms establishment of localization (23% of the total DEGs in stage 1), single-organism metabolic processes (13.39% of the total DEGs in stage 1), and localization (8.05% of the total DEGs in stage 1) were only found at early seed developmental stages (stages 1 and 2), while organic substance metabolic processes (28.95% of the total DEGs in stage 5), macromolecule metabolic processes (21.23% of the total DEGs in stage 5), and responses to abiotic stimulus (4.44% of the total DEGs in stage 5) were only found at the late seed developmental stage (stage 5).
KEGG pathway enrichment analysis is an effective method for elucidating signi cantly enriched metabolic and signal transduction pathways and the biological functions of DEGs 17 . We found 3,316 DEGs (41.74% of all 7,945 DEGs) in 128 metabolic and signal transduction pathways between the 2mlpa and 2MWT lines at ve seed developmental stages (Supplementary Table S3). The highly represented pathways, which had more DEGs, included metabolic pathways, plantpathogen interactions, plant hormone signal transduction, photosynthesis, avonoid biosynthesis, phenylpropanoid biosynthesis, and starch and sucrose metabolism ( Table 2). It has been observed that DEGs were different at each developmental stage. For example, there were no DEGs in stage 2 and 3, but 22 DESs in stage 5. We will discuss them in the following part.
According to GO and KEGG pathway functional enrichment, the enriched DEGs were mainly involved in PA metabolism, photosynthesis, starch and sucrose metabolism, and defense mechanisms. These results were partly consistent with those from other plants, such as Arabidopsis 18,19 , rice 20 , poplar 21 , and sweet potatoes 22 . Low phytate is often associated with undesirable effects on seed development and germination potential, with apoptosis commonly observed in mips1 mutants 18,19,21 . We have found that the 2mlpa plants were weaker than the 2MWT plants when grown in the eld, consistent with previous reports and related to the biological processes discussed in the following section.

DEGs involved in PA metabolism
The PA biosynthesis pathway plays a vital role in plant growth. Plants possess two parallel PA biosynthetic pathways, and both pathways start with the production of inositol 3-phosphate (InsP3) and end with PA 23 . In total, 36 genes known to be involved in PA metabolism, as well as in inositol phosphate metabolism (ko00562), displayed differences in expression between 2mlpa and 2MWT of more than twofold at the late seed developmental stage (stage 5) (Supplementary Table S4). The down-regulated DEG Glyma.11G238800, encoding MIPS1, was known to be mutated in 2mlpa ( Figure 5A), this result was consistent with the previous research that MIPS1 expression by qRT-PCR revealed a signi cant reduction in 22 DAF in mutant Gm-lpa-TW-1 with single mutated MIPS1 gene, it means that cross-breeding with mutant of Gm-lpa-ZC-2 did not alter the regulated way of MIPS1 24 . Myo-inositol phosphate synthase (MIPS), are also known to regulate pathways associated with seed and growth development 13,25,26 . Such as, in Arabidopsis (Arabidopsis thaliana), the disruption of MIPS severely reduces levels of inositol and PA, resulting in smaller plants, programmed cell death, or hypersensitivity of plants to high-intensity light stress [27][28][29] . In previous study, mutant Gm-lpa-TW-1 showed acceptable agronomic and nutritional traits except low seed eld emergence, here, seeds of 2mlpa also showed a lower eld emergence but much better than Gm-lpa-TW-1. The improvement of seeds of 2mlpa would be bene t from the cross-breeding with mutant Gmlpa-ZC-2 which had normal seeds eld emergence 12 . Interestingly, two DEGs (Glyma.05G180600 and Glyma.18G018600) also encoding myo-inositol 1-phosphate synthase were up-regulated only at the late seed developmental stage. We hypothesized that the DEGs encoding myo-inositol 1-phosphate synthase are functionally redundant and, therefore, that other MIPS1 genes (Glyma.05G180600 and Glyma.18G018600) compensated for the absence of Glyma.11G238800. The reduction of PA 30 might be due to these DEGs involved in PA metabolism.
The other mutation gene was IPK1 (Glyma.14G072200), which encodes InsP5 2-kinase and catalyzes the nal step in PA biosynthesis 20 ; this gene was down-regulated in 2mlpa only at the late seed developmental stage (stage 5) which is not consistent with the RT-PCR results in mutant Gm-lpa-ZC-2 , the relative expression levels of ZC-lpa were lower than its wild type parent Zhechun No. 3 at the early development stages but higher at later stages 15 . The expression of IPK1 gene in 2mlpa might be affected by cross-breeding. In rice, RNAi-mediated seed-speci c silencing of the IPK1 gene leads to a signi cant reduction in phytate levels and a concomitant increase in the amount of inorganic phosphate (Pi) 20 .
Constitutive suppression of such enzymes involved in phytate biosynthesis could be detrimental to plant growth and development 25,[31][32][33][34] . But in our research, we did not nd any defects induced by single IPK1 mutation in Gm-lpa-ZC-2; on the contrary, some traits of Gm-lpa-TW-1 with single MIPS1 mutation gene were improved by the cross with Gm-lpa-ZC2.
We also observed that some inositol polyphosphate kinase and phosphatase genes involved in PA metabolism were upregulated only at the late seed developmental stage (stage 5), e.g., two ITPK1 genes (encoding inositol-1,3,4-trisphosphate 5/6-kinase), two PI4KA genes (encoding phosphatidylinositol 4-kinase A),and ve INPP5B/F genes (encoding inositol polyphosphate 5-phosphatase). These results indicate that the mutations in 2mlpa mainly affect PA biosynthesis in the late seed developmental stage and induce complex chain reactions in PA metabolism. It has been reported that Gm-lpa-ZC-2 (one of the parents of 2mlpa) and 2mlpa mutants exhibits signi cantly increased contents of inositol 3-phosphate (InsP3) and inositol 4-phosphate (InsP4) compared to the wild type 8 . InsP3 and InsP4 are intermediary metabolites in PA metabolism. The up-regulation of the DEGs encoding inositol polyphosphate kinases and phosphatases is consistent with the increase of InsP3 and InsP4 in lpa lines. The impacts of single mutated gene on these PA metabolic genes' expression still need to be done in the future research.

DEGs involved in photosynthesis at the late seed developmental stage
Photosynthesis and photosynthesis-antenna protein (ko00195 and ko00196) pathways were enriched across the ve 2mlpa seed development stages, especially in the late seed developmental stage (stage 5) (Supplementary Table S5). In total, 117 DEGs were associated with photosynthesis and photosynthesis-antenna proteins. Most of the photosynthesisrelated DEGs (81 of 94 DEGs) from the late seed developmental stage (stage 5) were down-regulated in the 2mlpa mutant compared with 2MWT ( Figure 5B and Figure 5C). These DEGs encode different subunits in photosystem I, photosystem II, the cytochrome b6/f complex, the photosynthetic electron transport system, F-type-ATPase, and the light-harvesting chlorophyll protein complex. Down-regulation of the photosystem complex building subunits suggests less photosynthesis in the 2mlpa mutant lines, which has also been found in other lpa soybean mutants 35,36 . Myo-Inositol biosynthesis is rst catalyzed by MIPS (one mutant in 2mlpa), followed by dephosphorylation of L-myo-inositol 1-phosphate to myo-inositol. Myo-Inositol protects the photosynthesis process in Mesembryanthemum crystallinum, which is consistent with the change in myo-inositol and down-regulation of photosynthesis-related DEGs in 2mlpa 8,36 .
The photosynthesis-antenna proteins pathway modulates plant defense responses induced by abiotic signals such as light and temperature 20,37,38 . The light-harvesting chlorophyll protein complex (LHC) belongs to the photosynthesis-antenna proteins pathway. The chief function of the LHC is to collect and transfer light energy to photosynthetic reaction centers 38 .
In this study, DEGs associated with the LHC were signi cantly enriched in late soybean seed development (stage 5). In total, 16 DEGs encoding chlorophyll a-b binding proteins related to 9 of 11 LHC subunits were down-regulated only in late seed development (stage 5) in the 2mlpa line ( Figure 5B). Such diminished expression may reduce light harvesting and the synthesis of plant energy. These results suggest that there is a special relationship between the mutations in 2mlpa and the photosynthesis-antenna proteins pathway during late seed development. Furthermore, light provides energy for photosynthesis, and it also functions as a signal to direct plant growth and development. Arabidopsis mips1 mutants (mutation in MIPS1 gene of Arabidopsis thaliana) display light intensity-dependent cell death 28,39 . We observed that most of the photosynthesis-related DEGs were down-regulated in 2mlpa in the stage 5, consistent with previous reports. These results suggest that a mutation in MIPS1 or low PA may decrease photosynthesis during late seed development. These down-regulated genes involved in photosynthesis might contribute to some unfavorable traits in lpa mutants, such as lower seed eld emergence.

DEGs involved in starch and sucrose metabolism
In higher plants, sucrose is the major transport form of carbohydrates to meet the carbon and energy needed for growth and the synthesis of storage reserves 40 . Furthermore, sucrose is a signaling molecule affecting plant development processes such as plant growth, plant defense, and the regulation of owering and the development of storage organs 3,30,40 . The lpa soybean 2mlpa exhibits a high-sucrose phenotype. In total, 67 starch and sucrose metabolism (ko00500)related DEGs were almost exclusively up-regulated in 2mlpa compared with 2MWT at the late stage of seed development ( Figure 5D, Supplementary Table S6). These DEGs mainly include genes encoding key enzymes involved in starch biosynthesis and degradation, sucrose metabolism, inter-conversion between sugar and starch, and transporters. The lpa soybean 2mlpa showed a series of chain reactions in the starch and sucrose metabolism pathways. Myo-Inositol is an important intermediate metabolite in PA biosynthesis and sucrose metabolism. The biosynthesis of ra nosaccharide proceeds by the reversible addition of galactose units from galactinol to sucrose with the help of myo-inositol 41 . Variations in myo-inositol level might therefore reduce the biosynthesis of ra nosaccharide, increasing sucrose levels 30,41 . Mutation of the MIPS1 gene results in a reduction in PA level, a series of changes in intermediate metabolites of PA biosynthesis (such as myo-inositol), and concomitantly a decrease in ra nosaccharide level, which might result in an increase in sucrose level in Gm-lpa-TW-1 30 . The mutation of the MIPS1 gene in 2mlpa might lead to the higher sucrose level observed, and the upregulation of 67 starch and sucrose metabolism (ko00500)-related DEGs.
It has been reported that soybean lpa mutants regulate cell wall synthesis 35 . The enzyme β-glucosidase is associated with the cell wall 42,43 . β-Glucosidase in the leaves of Arabidopsis may play a vital role in the breakdown of cell wall polysaccharides, which in turn provide soluble sugar for remobilization and completion of the energy-dependent senescence process 44 . In this study, 13 DEGs encoding β-glucosidase were upregulated in the late stage of seed development in 2mlpa compared with 2MWT. It is interesting that loss of photosynthesis is accompanied by a signi cant increase in the activity of cell wall-bound b-glucosidase in Arabidopsis 45 , which is consistent with downregulation of DEGs related to photosynthesis and upregulation of DEGs encoding β-glucosidase in 2mlpa soybean seeds. The clear role of βglucosidase in 2mlpa mutants lies outside of the scope of this study; however, it is possible that 2mlpa mutants may cause the breakdown of cell wall polysaccharides.
DEGs directly associated with sucrose biosynthesis encoded two enzymes: sucrose-phosphate synthase (SPS; EC 2.4.1.14), which catalyzes the biosynthesis of sucrose 6-phosphate from UDP-glucose and fructose 6-phosphate, and sucrose synthase (SuSy; EC 2.4.1.13) 46,47 . There was one up-regulated DEG (Glyma.13G161600) encoding SPS and two upregulated DEGs (Glyma.09G073600, Glyma.15G182600) encoding SuSy in 2mlpa at the late seed developmental stage (stage 5). These up-regulated DEGs may result in high sucrose content in 2mlpa soybean seeds. Another lpa soybean line, V99-5089 (mutation in the MIPS1 gene), also produces soybean seeds with low PA and high sucrose contents 48 . It is possible that a mutation in MIPS1 or low PA may improves the sucrose content of soybean seeds, which is consistent with the high-sucrose phenotype of lpa mutant seeds 30 .

DEGs involved in defense mechanisms
DEGs related to plant-pathogen interaction (ko04626) and responses to stress (GO:0006950) were enriched in 2mlpa at the ve seed developmental stages, especially in the early (stage 1) and late (stage 5) stages. In total, 604 DEGs were related to defense mechanisms, and most of these were exclusively regulated at one stage ( Figure 5E, Supplementary Table   S7). For instance, most of the DEGs (105 of 146 DEGs) enriched at early seed developmental stages were not differentially expressed at the late seed developmental stage (stage 5); meanwhile, most of the DEGs (366 of 447 DEGs) enriched at stage 5 were only differentially expressed at this stage. These results suggest that the genes affected by the lpa mutations are different at each developmental stage.
PA is a natural antioxidant 49 . In plants, PA is also a key signaling molecule generated in response to the drought stress hormone abscisic acid, which stimulates Ca 2+ release in guard cells 50 . Furthermore, PA is important for the maintenance of basal resistance to plant pathogens 51 . InsP3 and InsP4 are intermediary metabolites in PA metabolism that act as second messengers to regulate cytosolic Ca 2+ levels 52 . Hence, the reduction of PA and increase in InsP3 and InsP4 in lpa soybean may cause a series of complex chain reactions in defense mechanisms 8,30 . Here, 604 DEGs were related to defense mechanisms, with about a third of these DEGs encoding plant disease resistance (R) proteins including LRR receptor-like serine/threonine-protein kinase, calmodulin, transcription factors, peroxidases, and mitogen-activated proteins kinases (MAPKs).
Plant disease resistance (R) proteins are important components of the genetic resistance mechanism in plants, and several R genes confer resistance to a wide spectrum of plant pathogens 53 . In this study, 64 R genes were up-regulated in 2mlpa compared with 2MWT at the late developmental stage. The most prevalent domain of the characterized R proteins is the leucine-rich repeat (LRR), which is the major determinant of pathogen recognition. The LRR receptor-like R protein consists of an extracellular LRR domain and a transmembrane motif 54 . Here, 21 DEGs encoding LRR receptor-like serine/threonine protein kinase FLS2 were up-regulated at the late stage of seed development. In Arabidopsis, FLS2 is a transmembrane receptor kinase that activates antimicrobial defense responses 55 . Brassinosteroid insensitive 1-associated receptor kinase 1 (BAK1) is also an LRR receptor-like protein kinase and is involved in signaling by FLS2 56 . Plants affected at BAK1 exhibit reduced responses to various microbial elicitors and cold shock protein 56,57 . In this study, 28 DEGs encoding BAK1 were up-regulated in 2mlpa at stage 5. The role of LRR genes in 2mlpa soybean seed development is still unknown, and further studies are required to con rm the association between the up-regulation of LRR genes and the lpa soybean line 2mlpa.
Additionally, some NAC, MYC2, and WRKY transcription factor family genes that regulate defense-related genes were also differentially expressed at the late stage of seed development (stage 5) in the 2mlpa mutant. For example, two DEGs encoding WRKY70 were up-regulated. WRKY70 negatively modulates cell wall-associated defenses against pathogens and leaf senescence in Arabidopsis [58][59][60] . MAPKs are key enzymes mediating adaptive responses to various abiotic and biotic stresses 61 . Here, 10 DEGs encoding MAPKs were differentially regulated between 2mlpa and 2MWT at the ve seed stages. These defense-related DEGs may be affected by changes in PA or its intermediary metabolites. Interactions between plants and pathogens involve bidirectional recognition. Upon pathogen attack, plant defense responses involve modulation of the expression of a large number of genes 62 . In total, 418 DEGs were involved in "plant-pathogen interaction" in 2mlpa soybean seeds, suggesting that MIPS1and IPK1-induced processes share some common features with the defense mechanism against pathogens in plants. Overall, our results verify the correlation between PA and defense mechanisms.
In conclusion, these results contribute to our further understanding of the metabolic regulatory network of the 2mlpa mutant during seed development.

Genetic material and background
A homozygous double-mutant soybean line with low PA, designated 2mlpa (mips1/ipk1), and a sibling line with homozygous non-mutant alleles and normal PA, designated 2MWT (MIPS1/IPK1), were produced in the present study. These lines were developed by crossing Gm-lpa-TW-1 with Gm-lpa-ZC-2. Gm-lpa-TW-1 was developed using gamma irradiation of 'Taiwan 75' (a vegetable soybean cultivar widely grown in Zhejiang Province), which produced a 2-bp deletion in GmMIPS1 12 . Similarly, the Gm-lpa-ZC-2 soybean line carries a G-A point mutation in the GmIPK1 gene compared with that of its commercial wild-type cultivar, 'Zhechun No. 3' 15 . After crossing, F 1 plants were grown and harvested in individuals. F 2 plants (as well as the original crossing parents) were grown, and genotyped by high inorganic phosphorus testing and high-resolution melting curve (HRM) analysis 12,15 , and classi ed into homozygous wild-type (HWT), homozygous lpa mutant (HM), and heterozygous ones. The F 3 seeds from ve heterozygous F 2 individual were planted and again genotyped for identi cation of heterozygous F 3 plants. The F 4 seeds of ve individual heterozygous F 3 plants were grown into F 4 populations for identi cation of homozygous wild-type and lpa mutant plants (Figure 1). The selection process of F 5 -F 6 plants was similar to that of F 3 -F 4 generation. F 7 populations were developed from ve heterozygous F6 individual and segregated, and their genotypes were determined HWT and HM seeds from total 10 F 7 plants were used to produce 5 HWT and 5 HM F 8 lines which were used in our research, the genotypes of which were determined by HRM analysis for the selection of lpa and non-lpa lines 63 . The PA and inositol phosphate isomers content of seeds of F 8 lines was evaluated and 2.32mg/g and 18.2mg/g PA content were detected in 2mlpa and 2WMT lines, respectively 16 .

Library construction and sequencing
The quality of total RNA was assessed using a 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA) and checked using agarose gel electrophoresis. The nal concentration of all RNA samples was adjusted to 500 ng/μl after quanti cation.
High-quality RNA samples were used for library preparation. In total, 30 cDNA libraries were generated using a SMART™ cDNA Library Construction Kit (Takara Bio-tek, Inc., Japan) and tested using an Agilent 2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR System (Thermo Fisher, USA). Raw reads were generated on the Illumina Hi Seq 4000 platform (Illumina, USA) at iGENE Biological Technology (Hangzhou, China).

Sequence assembly
Raw reads were ltered by removing low-quality sequences and rRNA reads. Assembly of the clean reads was then accomplished using SOAPaligner/SOAP2 64 . The comparison software Tophat 65 (http://ccb.jhu.edu/software/tophat/index.shtml) was used to compare the clean data from each sample with the reference genome (Glycine_max_v2.0). Mismatches of no more than ve bases were allowed in the alignment.
3.5. Gene annotation and analysis of differentially expressed genes (DEGs) All genes were analyzed by sequence alignment to the NR (NCBI non-redundant protein), Swiss-Prot, GO (Gene Ontology), COG (Clusters of Orthologous Groups) and KEGG (Kyoto Encyclopedia of Genes and Genomes) databases using BLAST2GO (https://www.blast2go.com/). Gene expression levels were computed and normalized as reads per kb per million fragments (RPKM). The RPKM method can eliminate the in uence of gene length and sequencing quantity on the expression data of genes. The calculated gene expression levels can be used directly to compare differences in gene expression between different products. Differential gene expression was analyzed using the DESeq function in Bioconductor (http://www.bioconductor.org). The false discovery rate (FDR) was obtained through correcting the P value using the Benjamini-Hochberg correction method 66

Reverse-transcription quantitative PCR (RT-qPCR)
To validate the data obtained from RNA-Seq, RT-qPCR was performed on eight DEGs. First-strand cDNA was synthesized from high-quality total RNA (see above) using a RevertAid First Strand cDNA Synthesis Kit (Thermo sher Scienti c Inc., Table S1) were designed using Primer-Blast from the NCBI (https://www.ncbi.nlm.nih.gov/tools/primer-blast/). The housekeeping gene ACT11 was used to normalize expression levels of the selected genes. All primers are shown in Supplementary Table S1. RT-qPCR analysis of the eight genes was conducted in triplicate using SYBR Green Real-time PCR Master Mix (Toyobo, Osaka, Japan) on a LightCycler® 480 Real-Time PCR System (Roche Diagnostics GmbH, Mannheim, Germany). Relative gene expression levels were determined using the corrected relative 2 -ΔΔCT method.

Data Availability Statement
All raw data are published in the NCBI (http://www.ncbi.nlm.nih.gov/bioproject/) under accession number PRJNA522338. Other data included in this study are available upon request by contact with the corresponding author.