High-Resolution Mapping in Two RIL Populations Refines Major “QTL Hotspot” Regions for Seed Size and Shape in Soybean (Glycine max L.)

Seed size and shape are important traits determining yield and quality in soybean. However, the genetic mechanism and genes underlying these traits remain largely unexplored. In this regard, this study used two related recombinant inbred line (RIL) populations (ZY and K3N) evaluated in multiple environments to identify main and epistatic-effect quantitative trait loci (QTLs) for six seed size and shape traits in soybean. A total of 88 and 48 QTLs were detected through composite interval mapping (CIM) and mixed-model-based composite interval mapping (MCIM), respectively, and 15 QTLs were common among both methods; two of them were major (R2 > 10%) and novel QTLs (viz., qSW-1-1ZN and qSLT-20-1K3N). Additionally, 51 and 27 QTLs were identified for the first time through CIM and MCIM methods, respectively. Colocalization of QTLs occurred in four major QTL hotspots/clusters, viz., “QTL Hotspot A”, “QTL Hotspot B”, “QTL Hotspot C”, and “QTL Hotspot D” located on Chr06, Chr10, Chr13, and Chr20, respectively. Based on gene annotation, gene ontology (GO) enrichment, and RNA-Seq analysis, 23 genes within four “QTL Hotspots” were predicted as possible candidates, regulating soybean seed size and shape. Network analyses demonstrated that 15 QTLs showed significant additive x environment (AE) effects, and 16 pairs of QTLs showing epistatic effects were also detected. However, except three epistatic QTLs, viz., qSL-13-3ZY, qSL-13-4ZY, and qSW-13-4ZY, all the remaining QTLs depicted no main effects. Hence, the present study is a detailed and comprehensive investigation uncovering the genetic basis of seed size and shape in soybeans. The use of a high-density map identified new genomic regions providing valuable information and could be the primary target for further fine mapping, candidate gene identification, and marker-assisted breeding (MAB).


Introduction
Soybean (Glycine max L.) is one of the most economically important crops, being a rich source of both edible oil and protein, and can fix atmospheric nitrogen through a symbiotic association with microorganisms in the soil, and are used as a model plant for legume research [1]. However, over
Overall, 38 QTLs were associated with three different seed shape traits in both the K3N and ZY populations; out of them, 20 QTLs have been reported for the first time, while earlier studies have already reported the remaining 18 QTLs (Table 2). Moreover, 17 out of 38 QTLs were major, with R 2 > 10%, and four of them, viz., qSLW-6-1 ZY , qSLW-20-1 K3N , qSLT-10-1 ZY, and qSLT-20-1 K3N, were detected stably in more than one individual environment. The most prominent major and stable QTL was qSLW-20-1 K3N (novel QTL), with the highest LOD value of 9.01 in an individual environment, identified at 53.61 cM position on Chr20 and expressing a PV of 26.84% ( Table 2). The 16 QTLs have positive additive effects with beneficial alleles inherited from KeFeng35, whereas the remaining 22 QTLs possess negative additive effects with favorable alleles derived from Nannong1138-2 (Table 2). Table 2. M-QTLs identified for three seed-shape traits (seed length-to-width (SLW), seed length-to-thickness (SLT), and seed width-to-thickness (SWT)) in ZY and K3N RIL populations across multiple environments.

MCIM Mapping and Comparison of CIM and MCIM Methods
To further validate the QTLs detected by CIM, we performed another method of mixed-modelbased composite interval mapping (MCIM) to dissect the additive effect QTLs and QTL x E interactions. By using the MCIM method, we identified a total of 48 additive effect QTLs distributed on 15 chromosomes for all six traits related to seed size and shape in both the RIL populations and all three environments, which expressed 1.69 to 29.35% of the PV (Table 3). Moreover, the additive effect of different QTLs was either negative or positive; for example, 30 and 18 QTLs have positive and negative additive effects, respectively. Hence, indicating that both parents contribute beneficial alleles for seed size and shape traits in ZY and K3N populations ( Table 3). Out of 48 QTLs, 10 QTLs were significant, with R 2 > 10%, whereas the remaining 38 QTLs were minor, with R 2 < 10% (Table 3).
Among these 48 QTLs, 15 QTLs showed significant additive by environment interaction (AE) effects (Table 3). However, four QTLs viz., qSL-13-4 ZY , qSW-13-3 ZY , qST-13-4 ZY , and qST-10-1 K3N revealed AE effect at all environments, while seven and four QTLs showed AE effect in two and one specific environments, respectively ( Table 3). The AE effect of these 15 QTLs associated with seed size and shape traits could express the PV that varies from 0.01 to 4.15%. The remaining 33 QTLs identified through the MCIM approach do not possess any AE effect; hence, they are environmentally stable QTLs (Table 3).
Lastly, we performed a comparative analysis of QTLs detected by CIM and MCIM approaches. A total of 88 and 48 QTLs were identified by CIM and MCIM, respectively. Among these QTLs, 15 QTLs were common and are detected by both methods in the same physical genomic interval, indicating the reliability and stability of these QTLs. Besides, by comparing the physical genomic regions of QTLs identified in both populations (ZY and K3N) and mapping methods (CIM and MCIM), two QTLs, viz., qSW-1-1 ZY and qSLT-20-1 K3N , were detected in common, with R 2 > 10%, identified for the first time. Hence, these QTLs were considered as the most stable and novel QTLs that could be utilized potentially for gene cloning and MAB of soybean seed size and shape traits.

Colocalization of QTLs in QTL cluster/Hotspot
A QTL cluster/hotspot is defined as a densely populated QTL region of the chromosome that contains multiple QTLs associated with various traits. In this study, we observed colocalization of QTLs on four QTL Clusters/hotspots located on different chromosomes, viz., Chr6, Chr10, Chr13, and Chr20, and were named Cluster-06/QTL Hotspot A, Cluster-10/QTL Hotspot B, Cluster-13/QTL Hotspot C, and Cluster-20/QTL Hotspot D, respectively ( Table 5). The highest concentration of QTLs for seed size and shape traits was identified in "QTL Hotspot A" of Chr06, spanning the physical interval of 2.19Mb (Figure 3). This QTL hotspot harbors six QTLs (three major and three minor), viz., qSW-6-1 ZY , qST-6-1 ZY , qSL-6-1 ZY , qSW-6-2 ZY , qST-6-2 ZY , and qSLT-6-1 ZY , associated to seed size and shape traits, expressing a PV of 5.43-15.35% (Table 5). Another set of QTL-rich regions possessing five QTLs (two major and three minor), viz., qSL-13-1 ZY , qSW-13-1 ZY , qST-13-2 ZY , qST-13-3 ZY , and qSLW-13-1 ZY was "QTL Hotspot C", with a length of 6.3 Mb (Table 5 and Figure 3). However, both "QTL Hotspot B" and "QTL Hotspot D" contain three QTLs each associated with studied traits and spanning the physical interval of 4.0Mb and 2.3Mb expressed PV of 6.60-17.03% and 10.22-26.84%, respectively (Table 5). Furthermore, all these four "QTL cluster/hotspots" comprise many significant QTLs identified in more than one individual environment. QTLs within "QTL Hotspot B" were identified in both ZY and K3N populations (Table 5 and Figure 3). Hence, these four major "QTL hotspots" are the stable genomic regions governing the inheritance of seed shape and size in soybeans. spanning the physical interval of 4.0Mb and 2.3Mb expressed PV of 6.60-17.03% and 10.22-26.84%, respectively (Table 5). Furthermore, all these four "QTL cluster/hotspots" comprise many significant QTLs identified in more than one individual environment. QTLs within "QTL Hotspot B" were identified in both ZY and K3N populations (Table 5 and Figure 3). Hence, these four major "QTL hotspots" are the stable genomic regions governing the inheritance of seed shape and size in soybeans.

Candidate Gene Mining within Major "QTL Hotspots"
The whole-genome sequence and gene annotations availability makes it possible to identify possible candidate genes within major genomic regions. In the present study, all the model genes along with their gene annotations were downloaded from Phytozome and Soybase. In total, we identified 2406 gene models within the physical genomic interval of all four major "QTL hotspots" (Table S4). An online web-based toolkit agriGO V2.0 was used for a gene ontology (GO) enrichment analysis to visualize the biological process, molecular function, and cellular component main categories ( Figure 4). Among all the genes present within the four "QTL hotspots", only the 831, 193, 192, and 118 genes from "QTL Hotspot A", "QTL Hotspot B", "QTL Hotspot C", and "QTL Hotspot D", respectively, had GO annotations available (Figure 4). In all the four major "QTL hotspots", a higher percentage of genes were associated within the terms cellular process, metabolic process, cell part, cell, catalytic activity, and binding (Figure 4), suggesting a vital role of these terms in regulating seed size and shape in soybeans.
Based on the gene annotations, available literature, and GO enrichment analysis, we predicted 26, 19, 35, and 18 candidate genes from "QTL Hotspot A", "QTL Hotspot B", "QTL Hotspot C", and "QTL Hotspot D," respectively (Table S6). These genes function directly or indirectly in regulating seed development, as well as seed shape and size, such as mitotic cell division, storage of proteins and lipids, transport, metabolic process, signal transduction of plant hormones, degradation of the ubiquitin-proteasome pathway, and fatty acid beta-oxidation (Table S6). To further refine the above-predicted candidate genes list, we retrieved RNA-Seq data of these candidate genes from Soybase (www.soybase.org) [35].
Based on the gene annotations, available literature, and GO enrichment analysis, we predicted 26, 19, 35, and 18 candidate genes from "QTL Hotspot A", "QTL Hotspot B", "QTL Hotspot C", and "QTL Hotspot D," respectively (Table S6). These genes function directly or indirectly in regulating seed development, as well as seed shape and size, such as mitotic cell division, storage of proteins and lipids, transport, metabolic process, signal transduction of plant hormones, degradation of the ubiquitin-proteasome pathway, and fatty acid beta-oxidation (Table S6). To further refine the abovepredicted candidate genes list, we retrieved RNA-Seq data of these candidate genes from Soybase (www.soybase.org) [35].

Glyma06g04810
Seed coat development extracellular region

Glyma06g03700
Seed development ovule development

Glyma06g02790
Response to auxin stimulus

Glyma06g07200
Response to ethylene stimulus ubiquitin-protein ligase activity Glyma06g09650 microtubule nucleation, response to auxin stimulus, cytokinin mediated signaling pathway

Glyma06g10700
Response to brassinosteroid stimulus sterol biosynthetic process

Cluster-10/QTL Hotspot B Glyma10g35360
Response to auxin stimulus cellular component

Glyma13g17750
Response to auxin stimulus protein dimerization activity

Glyma13g17980
Embryo development

Glyma13g21770
Endosperm development embryo development

Glyma13g21700
Response to ethylene stimulus-response to auxin stimulus

Glyma13g22790
Protein kinase activity

Glyma20g28550
Seed maturation protein

Glyma20g27300
Lipid metabolic process seed maturation cell growth

Glyma20g29750
Multidimensional cell growth polysaccharide biosynthetic process regulation of hormone levels

Glyma20g30100
Embryo development seed development protein phosphorylation

Discussion
Seed shape and size is an economically important trait determining the yield and quality in soybeans. Hence, developing soybean cultivars with improved seed shapes and sizes is considered as a critical objective of soybean breeding programs. However, to develop improved cultivars, it is a prerequisite to have a detailed understanding of genetic architecture, as well as a mechanism underlying the trait of interest. Both seed shape and size are complex quantitative traits, governed by multiple genes and are highly environmentally sensitive. Although, over the past decades, many QTLs related to soybean seed shape/size have been reported but not stable and confirmed, due to small-sized mapping populations and low-density genetic maps, and, hence, not implied for breeding improved seed shapes and sizes in soybean. Thus, the present study aimed to utilize high-density intraspecific linkage maps of ZY and K3N populations, evaluated in three different environments, to identify the stable significant main-effect QTLs, "QTL Hotspots", and epistatic-effect QTLs, as well as their interactions with the environment; additionally, find possible candidate genes for soybean seed sizes and shape traits. In this study, ANOVA results revealed a significant difference among the RILs of both ZY and K3N populations for all six traits (P < 0.01, Table S2). Similar to previous studies, our study also reported that all six traits related to seed size/shape were significantly affected by G, E, and G × E [7,36]. Frequency distribution of all six traits (SL, SW, ST, SLW, SLT, and SWT) showed the characteristics of continuous variations, and all these traits have transgressive segregation in both directions, which indicates that both parents contributed favorable alleles for these traits (Figure 1). These findings are in agreement with the prior findings, which also stated continuous distribution and transgressive segregation for seed size/shape traits among RILs of soybeans in multiple environments [3,17,22]. In our study, the estimated heritability of all six traits was high (>60%) in both RIL populations across all three environments (Table S1), which was consistent with previous studies [7]. The high heritability suggests that if the trial repeated in the same growing/environment conditions, there would be a high possibility of achieving the same kind of phenotypic results. A highly significant correlation (either positive or negative) between any two seed shapes or seed size traits and between seed size and shape traits is in accordance, as previously reported by Xu et al. [5].
QTL mapping is a practical approach and has been frequently employed for the detection of QTLs/genes underlying the quantitative traits in crop plants. However, the efficiency and accuracy of QTL mappings are influenced, mainly by parental diversity and marker density [26]. The quality of genetic maps has a significant impact on the accuracy of QTL detection, and, consequently, increasing marker density can intensify the resolution of QTL mapping [37]. Hence, it is a prerequisite to utilize high-density linkage maps for improving the efficiency and accuracy of linkage mapping and MAS. In this study, high-density genetic maps of ZY and K3N populations were used, consisting of 3255 SLAF and 1733 bin markers, respectively. The markers in both linkage maps, viz., ZY and K3N, were integrated to all 20 LGs and covered the total length of 2144.85 and 2362.44 cM, respectively, with an average distance between adjacent markers of 0.66 cM and 1.36 cM, respectively.
The use of high-density bin-maps assisting in QTL identification with tightly linked markers provided a good foundation for analyzing quantitative traits. However, to reduce environmental errors, RILs were planted in three environments (consisting of different locations and years), and each of the environments was statistically different. Jansen et al. [38] described that the QTL position and effects could be accurately evaluated if the phenotypic data collected in various environments were different from a statistical perspective. Although, markers associated with the QTLs regulating the seed sizes and shapes in soybeans have been mapped on all linkage groups (Soybase, www.soybase.org). However, for cross-validation and improving the accuracy of QTL mapping results, we used two different methods for QTL mapping, viz., CIM and MCIM. A total of 88 and 48 QTLs were detected by CIM and MCIM methods, respectively, associated with all six traits related to seed size and shape (Tables 1-3). Among these QTLs, 15 common QTLs were verified through both CIM and MCIM, indicating that these QTLs were stable and utilized effectively as potential candidate regions for enhancing seed sizes and shapes in soybeans. The QTL results of our study revealed better matches with the SoyBase database (www.soybase.org; Tables 1 and 2); however, 51 (CIM) and 27 (MCIM) QTLs were identified for the very first time (Tables 1-3). These novel QTLs collectively expressed more than 90% of PV for seed size and shape, suggesting their potential value for the development of improved soybean cultivars. Among these novel QTLs, qSL-9-1 ZY , K3N , qST-6-2 ZY , qSLW-6-1 ZY , qSLW-20-1 K3N , qSLT-10-1 ZY , and qSLT-20-1 K3N were reported as stable and major QTLs, identified in more than one individual environment, with R 2 > 10%. Besides, by comparing the physical genomic regions of QTLs identified in both populations (ZY and K3N) and mapping methods (CIM and MCIM), two major and novel QTLs, viz., qSW-1-3 ZY and qSLT-20-3 K3N , were characterized commonly in both mapping methods. These above seven unique and stable QTLs significantly represent potential loci for the improvement of seed sizes and shapes in soybeans. Hence, identification of many new and unique QTLs in the present study suggests distinct genetic architecture in the population derived from the diverse Chinese cultivated soybean genotypes and the need to use more germplasm for revealing the complex genetic basis of soybeans. The favorable alleles for seed size and shape traits were contributed by both parents of two RIL populations, viz., ZY and K3N. Therefore, it is critical to note that not only the higher phenotype parent contributes beneficial alleles but also the contribution of favorable alleles by lower phenotype parents cannot be disregarded; similar results are also described earlier [30].
The stability of the QTL is essential for use in a breeding program. In addition to novel stable QTLs identified for both seed size and shape traits, this study also identified 37 and 21 QTLs through the CIM and MCIM methods, which have been previously colocalized in the same physical interval by earlier studies (see references in Tables 1-3). Out of these colocalized QTLs, 12 and 3 QTLs detected by the CIM and MCIM methods were major (R 2 > 10%). Therefore, our results showed the reliability of QTL mapping. Furthermore, these QTLs can be utilized as principal targets to identify the candidate genes and MAS in future studies.
It has been demonstrated that epistatic and QTL by environment interaction effects are the two crucial genetic factors that make an enormous contribution to the phenotypic variation observed in complex traits, and the knowledge of those interaction effects is vital for understanding the genetic mechanism of complex traits [39,40]. Previous studies revealed that the seed sizes and shapes of soybeans is significantly affected by the environment [36]. Moreover, knowledge of specific QTL by environment interactions can guide the search of varieties adapted to particular environments. The QTLs with more significant additive effects are often considered more stable [41,42]. For example, the qSW-1-1 ZY and qST-18-2 ZY (additive effect: 0.14) identified in both CIM and MCIM methods; though, qSLT-6-5 ZY (additive effect: 0.001) was detected only in the MCIM method (ZY only) ( Table 3). The genetic architecture of seed size and shape also includes epistatic interactions between QTLs [11,43]. Hence, ignoring intergenic interactions will lead to the overestimation of individual QTL effects, and the underestimation of genetic variance [44], consequently, could result in a substantial drop in the genetic response to MAS, particularly at late generations [45]. In the present study, 16 pairs of digenetic epistatic QTLs pairs were identified for seed size and shape in both populations and expressed phenotypic variations that varied from 1.71 to 9.70% (Table 4). Except for qSL-13-3 ZY , qSL-13-4 ZY , and qST-13-4 ZY , all the remaining epistatic QTLs do not possess additive effects alone, suggesting that these loci might serve as modifying genes that interact with other genes to affect the phenotypes of seed sizes and shapes (Table 4). All 16 pairs have significant AA, but only four QTL pairs, viz., qSL-2-1 K3N and qSL-2-2 K3N , qST-9-1 K3N and qST-12-4 K3N , qSLT-2-3 K3N and qSLT-7-1 K3N, and qSWT-6-1 K3N and qSWT-8-4 K3N, hold significant AAE interaction effects. However, the total AAE phenotypic variations expressed by these four epistatic pairs was 19.14%. These results show that epistatic and environmental interactions are fundamental for understanding the genetic basis of seed sizes and shapes in soybeans, demonstrating that these effects should be considered in a QTL mapping program and could increase the accuracy of phenotypic value predictions in MAS.
Colocalization of QTLs on chromosomes for different traits related to seed size and shape were also observed in this study. This colocalization of QTLs linked to related traits on chromosomes was reported earlier in soybeans and referred to as "QTL cluster/hotspots" [46]. In this study, we scrutinized a few genomic regions containing QTL clusters and found four QTL clusters/hotspots on four different chromosomes, viz., Chr06, Chr10, Chr13, and Chr20 ( Figure 3 and Table 5). The QTLs within each cluster/hotspot are associated with three or more traits related to seed sizes and shapes in soybeans. The highest number of six and five QTLs were observed in "QTL Hotspot A" and "QTL Hotspot C", respectively, harboring QTLs for more than three traits related to seed size and shape (Figure 3 and Table 5). The other two hotspots, viz., "QTL Hotspot B" and "QTL Hotspot D", contain three QTLs, each for three different traits related to seed size and shape (Table 5). These QTLs clusters/hotspots have not reported and added to the growing knowledge of the genetic control of these traits. The phenomenon of the QTL clustering might represent a linkage of genes/QTLs or result from the pleiotropic effects of a single QTL in the same genomic region [47]. This colocalization of QTLs for different seed size and shape traits was following the fact that they were highly significantly correlated with each other (Table 1). These "QTL hotspot" regions showed that the QTLs linkage/pleiotropy could facilitate the enhancement of seed size and shape. Previously, some of the QTLs for other traits have also been identified in the same region of "QTL Hotspot A" on chromosome 06, which are related to seed oil and protein content [48,49] and days to flowering [50]. In the case of "QTL Hotspot B", QTLs related to seed weight and seed yield [51], length of the reproductive stage [33], days to flowering, and maturity [33] were reported in the same physical interval.
Similarly, earlier studies have also reported QTLs for seed weight [7] and seed volume [33] in the "QTL Hotspot C" region on Chr13. In "QTL Hotspot D", QTLs related to seed maturity [33] and seed oil content [52] have formerly reported. Seed oil and protein content in soybeans have reported a significant correlation with seed size and shape [53], as seed oil and protein content represents a major component of soybean seeds, representing 38-42% and 18-22%, respectively; hence, these traits are directly related to seed sizes and shapes in soybeans [13]. Both seed size and shape are important yield component traits [54] and it has been reported that days to flowering and maturity is directly correlated to yield in soybeans [55,56], signifying the potential probability of common genic factors for these traits and also showing the necessity to promote further study for these regions. These QTL clusters have provided some valuable information to define genome regions associated with different traits. Based on the comprehensive analysis of clusters in this study, breeding programs targeting an increase of seed sizes and shapes with high yield and superior quality can focus on hotspot clustering and select QTLs around the region. Finally, the existence of QTL clusters/hotspots has provided proof that genes related to some crop traits are more densely concentrated in certain genomic regions of crop genomes than others [33,51].
Identification of candidate genes underlying the QTL region is of great interest for practical plant breeding. Earlier studies based on QTL mapping of seed size and shape did not practice mining for candidate genes [22,54], and, to date, only a few seed size/shape-related genes have been isolated from the soybean. For example, the Ln gene has a large effect on the number of seeds per pod and seed size/shape [57], and, recently, the PP2C-1 (protein phosphatase type-2 C) allele from wild soybean accession ZYD7 were found to contribute toward the increase in seed size/shape [58]. Based on the gene annotations, available literature, and GO enrichment analysis, the present study identified the possible candidate genes regulating the seed sizes and shapes in soybeans that underlies the four categorized "QTL hotspots". Gene ontology (GO) analysis revealed that most of the genes underlying the above four "QTL hotspots" belong to the terms cellular process, metabolic process, cell part, cell, catalytic activity, and binding, and these elements are reported as being vital in seed development [59][60][61]. A total of 2406 gene models were mined within the physical interval of the four "QTL hotspots." Out of them, 88 were considered as possible candidate genes, based on the GO enrichment analysis, gene function, and available literature. These 88 predicted candidate genes have functions that are directly or indirectly involved in seed development, influencing the shape and size of seeds, such as lipid storage, transport and metabolic processes, signal transduction of plant hormones, degradation of the ubiquitin-proteasome pathway, fatty acid beta-oxidation, the brassinosteroid-mediated signaling pathway, and the auxin biosynthetic process (Table S5). From the available gene expression data (RNA-seq), 23 of the 88 predicted candidate genes expressed significantly higher gene expression, particularly in seed development stages, root nodules, leaf, and pod shell ( Figure 5 and Table S5). Out of these 23 genes, five genes, viz., Glyma06g04810, Glyma06g03700, Glyma13g17980, Glyma13g21770, and Glyma20g30100 have functions that are related to seed development, ovule development, endosperm, and embryo development, which have been reported to directly contribute to seed sizes and shapes in crop plants, including soybeans [62,63]. Likewise, Glyma06g02390, Glyma06g06160, Glyma06g07200, and Glyma13g18730 encode RING/U-box superfamily proteins/protein ubiquitination. The ubiquitin pathway has recently been known to play an essential part in seed size determination [60]. Several factors involved in ubiquitin-related activities have been revealed to determine seed sizes in Arabidopsis and rice [60]. Genes, viz., Glyma06g08290, Glyma20g28460, Glyma20g28640, Glyma20g29750, Glyma20g28550, Glyma13g22790, Glyma20g29750, and Glyma20g27300, function in lipid storage, seed maturation, and cell growth, which have formerly been reported to determine seed size and shape in oilseeds, including soybeans [64]. For example, overexpression of GmMYB73 promotes lipid accumulation in soybean seeds, which leads to increased seed sizes in soybeans [65]. Genes, viz., Glyma06g09650, Glyma10g35360, Glyma10g36440, Glyma13g17750, and Glyma13g21700, are involved in auxin biosynthesis, responses to auxin stimulus, and responses to ethylene stimulus. The auxin regulates seed weights and sizes in Arabidopsis [22,66]. Glyma06g10700 functions to regulate the brassinosteroid stimulus, which positively governs seed size [62]. Hence, based on the gene function, GO, and RNA-Seq analysis, the above 23 genes were considered as the most potentially possible candidate genes for regulating the seed sizes and shapes in soybeans. However, it requires further validation and verification to confirm their actual roles in seed sizes/shapes in soybeans, as well as their future uses for the improvement of seed quality traits. Some of these genes were already included in our ongoing project for functional validation to ascertain their effects on the seed sizes and shapes. Hence, the precise identification of QTLs in a specific physical interval through the use of a high-density map would make it easy to identify candidate genes.

Plant Material and Experimental Conditions
In the present study, two related RIL populations, viz., ZY and K3N, consisting of 236 and 91 lines, respectively, were used for elucidating the genetic basis of seed shapes and sizes in soybeans. The ZY and K3N populations were derived through a single seed descent (SSD) method by crossing a common higher seed size parent Nannong1138-2 (N) with two cultivated soybean varieties, viz., Zhengxiaodou (Z) and KeFeng35 (K3), having smaller seed sizes [67]. All the plant material was received from Soybean Germplasm Gene Bank, located at the National Centre for Soybean Improvement (Ministry of Agriculture), Nanjing Agricultural University, Nanjing, China. The In each environment, standard cultural and agronomic practices were trailed, as previously described [68,69].

Phenotypic Evaluation and Statistical Analysis
For the phenotypic assessment of seed size and shape, we collected seeds from the randomly selected ten plants harvested from the middle of each block across three different environments (2012JP, 2012FY, and 2017JP) in both RIL populations. The seed size traits include seed length (SL), seed width (SW), and seed thickness (ST), whereas seed shape was assessed using three different ratios, viz., seed length/seed width (SLW), seed length/seed thickness (SLT), and seed width/seed thickness (SWT). The SL, SW, and ST were measured in millimeters (mm) using the vernier caliper instrument, according to Kaushik et al. [39]. However, SLW, SLT, and SWT were estimated from the individual values of the SL, SW, and ST, respectively, by following Omokhafe and Alika [41].
Descriptive statistics, such as mean, range (maximum and minimum values), coefficient of variation (CV%), skewness, and kurtosis for above seed size and shape traits in both RIL populations, including their parents, were calculated using the SPSS17.0 software (http://www.spss.com) [42]. For each environment, an analysis of variance (ANOVA) was carried out using a generalized linear model (GLM) program of SAS PROC (SAS Institute Inc. v. 9.02, 2010, Cary, NC, USA). The ANOVA for the combined environment (CE) was also performed in SAS software using mixed PROC with random factors: lines, environments, replication within environments, and the line-by-environment interaction. Pearson correlation coefficient (r) among traits was calculated from the average data using PROC CORR in combined environments. The broad-sense heritability (h 2 ) in RIL populations was estimated using the following equation: where σ 2 G is the genotypic variance, σ 2 GE is the variance of the genotype-by-environment interaction, σ 2 e is the error variance, n is the number of environments, and r is the number of replications within an environment [44].

SNP Genotyping and Bin Map Construction
Genetic map construction began with the extraction of DNA from the young and fresh leaves of both RIL populations, along with their parents, by following the protocol of Zhang et al. [45]. DNA library construction, high-throughput sequencing (RAD-Seq), high-quality SNP acquisition, and SLAF/bin marker integration for ZY and K3N populations, respectively, were performed as described by Huang et al. [70] and Cao et al. [30]. These SLAF and bin markers were employed to develop the linkage maps of the ZY and K3N populations, respectively, using JoinMap 4.0 [71]. High-density genetic maps of the ZY and K3N populations contained 3255 SLAF and 1733 bin markers, respectively. The total length of the ZY and K3N maps were 2144.85 and 2362.44 cM, with an average distance between the adjacent markers as 0.659 and 1.36 cM, respectively (Table S7). The average length of each linkage group was 162.75 and 86.65 cM for ZY and K3N linkage maps, with the mean marker density of each linkage group as 107.24 and 118.122, respectively (Table S7).

QTL Mapping for Seed Size and Shape
For QTL analysis, we used the WinQTLCart 2.5 software [47] and QTLNetwork 2.2 [72]. The model of composite interval mapping (CIM) was used to identify the main-effect QTLs (M-QTLs) with a 10 cM window at a walking speed of 1cM for the WinQTLCart 2.5 software. The LOD threshold was premeditated using 1000 permutations for an experimental-wise error rate of P = 0.05 to determine whether the QTL was significantly associated with the traits [73]. The model of mixed linear composite interval mapping (MCIM) was applied to identify significant additive effect QTLs, epistatic QTLs (AA), genotype-by-environment interaction effects (additive by the environment (AE) and AA by the environment (AAE)) in the QTLNetwork 2.2 [74]. The physical location of M-QTLs on each chromosome were drawn by using MapChart 2.1 software [75].
QTLs were named by following standard nomenclature [76], with minor modifications. For example, for the QTL denoted as qSW-1-1 ZY , q indicates QTL, SW stands for the trait (seed width), -1 show the chromosome on which the QTL detected, -1 also indicates the order of QTL identified on the chromosome for each trait, and ZY represents the ZY-RIL population in which QTL was detected.

Mining of Candidate Genes for Major QTLs
QTLs identified in two or more than two environments with R 2 > 10% were considered as significant and stable QTLs [77]. By utilizing the online resource databases of Phytozome (http://phytozome. jgi.doe.gov) and SoyBase (http://www.soybase.org), we downloaded all the genomic data within the physical interval position of the major "QTL hotspots", and candidate genes were predicted based on the gene annotations (http://www.soybase.org and https://phytozome.jgi.doe.gov), as well as previously published literature. Gene ontology (GO) information was derived from SoyBase through online resources: GeneMania (http://genemania.org/); Gramene (http://archive.gramene.org/db/ontology); the Kyoto Encyclopedia of Genes and Genomes website (KEGG, www.kegg.jp); and the National Centre for Biotechnology Information (NCBI: https://www.ncbi.nlm.nih.gov). These were used to screen the predicted candidate genes further. Gene ontology (GO) enrichment analysis was conducted for all the genes within the four major "QTL hotspots", viz., "QTL Hotspot A", "QTL Hotspot B", "QTL Hotspot C", and "QTL Hotspot D", using agriGO V2.0 (http://systemsbiology.cau.edu.cn/agri-GOv2/) [78]. The freely available RNA-Seq dataset at the SoyBase website was obtained to analyze the expression of predicted candidate genes in different soybean tissues, as well as the development stages. A heat map for the visualization of fold-change in the expression patterns of these predicted candidate genes was constructed by using TBtools_JRE1.6 software [79].

Conclusions
In conclusion, the present study is a detailed investigation for elucidating the genetic architecture of seed sizes and shapes in soybean. In aggregate, 88 and 48 QTLs were detected through CIM and MCIM, respectively, including 15 common QTLs, with two major (R 2 > 10%) and novel QTLs, viz., qSW-1-1 ZY and qSLT-20-1 K3N . Besides, 51 and 27 QTLs, identified through CIM and MCIM, respectively, were reported for the first time. All identified QTLs were clustered into four major "QTL cluster/hotspots" and represent the major and stable genomic regions governing the inheritance of soybean seed sizes and shapes. Hence, these "QTL hotspot" regions could be of significant consideration for future soybean breeding. Our study predicted 23 genes as the possible candidates, regulating seed sizes and shapes within the genomic region of four "QTL hotspots"; however, they need further functional validation to clarify their actual roles in seed development. Moreover, our results showed that 15 QTLs exhibited significant AE effects, and 16 pairs of QTLs possessed an epistatic effect. However, except for three QTLs, viz., qSL-13-3 ZY , qSL-13-4 ZY, and qSW-13-4 ZY , all the remaining epistatic QTLs showed no main effects. Hence, the hotspot regions and novel significant stable QTLs identified in the present study will be the main focus of soybean breeders for fine mapping, gene cloning, and the MAB of soybean varieties with improved seed quality and yield.

Conflicts of Interest:
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.