Optimization of Genotyping- by-Sequencing (GBS) for Germplasm Fingerprinting and Trait Mapping in Faba Bean

Faba bean ( Vicia faba L.) is an important pulse crop with a wide range of agroecological adaptations. The development of genomic tools and a comprehensive catalog of extant genetic diversity are crucial for developing improved faba bean cultivars. The lack of a cost-effective genotyping platform limits the characterization of large germplasm collections, understanding of genetic diversity across populations, and implementing breeder's tools like genomic selection. Genotyping- by-sequencing (GBS) offers high-resolution genotyping for both model and crop plant species, even without a reference genome sequence. The genome fragments targeted by GBS depend substantially on the restriction enzyme (RE) used for the complexity reduction step. Species with complex genomic architecture require optimization of GBS with proper RE to realize the full potential of GBS. Here, we evaluated various REs in the GBS method and identified that the combination of Ape KI/ Mse I proved to be the most appropriate for faba bean based on the best library quality, a high number of genomic loci spread across chromosomes, and high enrichment loci associated with the gene space. With the new optimized protocol, we constructed a genetic map using a recombinant inbred line (RIL) population and identified a QTL for seed hilum color on Chromosome 1. In addition, we also genotyped a diversity panel and performed a genome-wide association studies (GWAS) for important agronomic traits, including plant height (PH), flowering time (FT), and number of pods per plant (PPP). We identified six SNP markers significantly associated with these traits and listed potential candidate genes. The optimized faba bean-specific GBS procedure will facilitate access to the untapped ge-netic diversity for genetic research and breeding and may facilitate functional genomics.

systems (Angus et al. 2015) offers unprecedented opportunities to build a sustainable farming system for food and nutrition with fewer environmental footprints.Maximization of genetic diversity, either by capturing natural diversity or induced by mutagenesis, is a key to sustainable faba bean breeding.Currently, the wild progenitor of the faba bean is still unknown, but a wide range of diversity was observed within the cultivated gene pool (Mulugeta et al. 2021).International genebanks hold thousands of faba bean genetic resources, particularly the genebanks of the International Center for Agricultural Research in the Dry Areas (ICARDA), Lebanon, and of IPK Gatersleben, Germany, which hold a total of 9654 and 1860 individual accessions, respectively.Characterizing the extant diversity and corresponding phenotypes would provide an impactful resource for faba bean improvement.However, the lack of a cost-effective genotyping method in faba bean has hampered so far the systematic exploration of genetic diversity from large germplasm collections, high-resolution trait mapping, and genomic selection or prediction.
The faba bean has a very large genome (~13 Gb) with six chromosome pairs (2n = 12).In the past, due to a lack of a reference genome sequence, diverse transcriptome datasets were generated to develop genotyping arrays such as "Vfaba_v2" containing 24,929 single nucleotide polymorphism (SNP) markers derived from 15,846 gene loci (Khazaei et al. 2021).This provided an important tool for genetic mapping studies.For the evaluation of broader yet uncharacterized diversity, however, the use of such a SNP array derived from limited genetic diversity would introduce strong ascertainment biases to the analysis.Recently, a high-quality chromosome-level reference genome was generated for faba bean (Jayakodi et al. 2023), and this information was used to design 90,000 oligonucleotide probes to perform single primer enrichment technology (SPET) assay-based genotyping.In addition to known polymorphisms used for assay design, SPET also allows the discovery of novel variation; hence, it is relatively better suited for exploring unknown diversity than SNP-Chip assays.Nevertheless, these probes were generated from a single reference genome, which could also lead to a biased estimation of diversity due to single reference biases (Jayakodi et al. 2021).Particularly, the genotyping cost per sample limits its application in the genotyping of a large number of accessions, such as genebank collections, and is less affordable to smaller labs working with a mapping population or a diversity panel comprising a few hundred genotypes.Therefore, a cost-effective and unbiased genotyping method is highly needed to accelerate faba bean research and breeding.
Genotyping-by-sequencing (GBS) enables the simultaneous discovery and genotyping of a large number of SNPs (Elshire et al. 2011).In crops, GBS was successfully applied to genotyping large genebank collections (Milner et al. 2019), genomewide association studies (GWAS) (Arruda et al. 2016), linkage map construction (Hussain et al. 2017), quantitative trait locus (QTL) mapping (Mathivathana et al. 2019), and genomic selection (GS) (Battenfield et al. 2016).The GBS method uses restriction enzymes (REs) to target less complex genomic loci for developing genome-wide polymorphic markers, and optimization of proper REs concerning species or genomic architecture is imperative.Improper selection of REs leads to biases toward sequencing exonic or intergenic regions with high proportions (López et al. 2023;Nguyen et al. 2018).In general, methylationsensitive enzymes (also known as rare cutters that recognize genomic loci occurring only rarely in the genome) such as ApeKI or a combination of a frequent cutter that recognizes commonly occurring sites in the genome and methylation-sensitive enzymes like MspI and PstI are often used in most cereal crops (Milner et al. 2019;Wang et al. 2020) and forage legumes (Julier et al. 2021).These enzymes tend to target genic or low-copy regions and thus reduce the representation of repetitive regions in GBS sequencing.Nevertheless, the complex and unique genome characteristics require the identification of appropriate REs to realize the full potential of GBS.Strikingly, faba bean has a large intergenic space (~330 kb) expanded by a significant proportion of satellite repeats and large-size retroelements (up to 35 kb) (Jayakodi et al. 2023).Selection of inappropriate REs might result in a library with incomplete digestions, enrichment for repeats or GC-rich regions, and biased marker coverage along the genome.
In this study, we used single RE (ApeKI and Msll) and a combination of REs (ApeKI/MseI, Pstl/ApeKI, Pstl/Msel, and PstI/MspI) digestion to optimize the GBS procedure for faba bean.We identified the most optimal RE combination based on the sequencing library quality, highest number of loci, genomewide coverage, and enrichment for gene space.In addition, we applied our best RE combination (ApeKI/MseI) to genotype an advanced biparental mapping population developed from ILB 938/2 × Mélodie/2 and a diversity panel comprising 217 accessions to demonstrate the application of genetic diversity analysis and gene mapping.

| DNA Isolation, GBS Library Construction, and Sequencing
The faba bean germplasm used in this study includes an advanced mapping population along with a diversity set.The mapping population consists of 177 recombinant inbred lines (RILs) developed from the biparental cross of the homozygous genotypes ILB 938/2 × Mélodie/2 (Table S1) (Khazaei et al. 2014) at F8:9 generation.In addition, we have selected a custom subset of 217 accessions covering broad geographical regions of 45 countries from the ICARDA genebank (Table S2).This panel includes mainly domesticated spring faba bean genotypes.Individual faba bean plants were grown outside in the soil using a plastic foil tunnel or pots containing compost soil in a greenhouse with controlled shade and light conditions (Zimmermann et al. 2006) for 2-3 weeks.Three leaf discs (1 cm in diameter) were collected, pooled, and processed for DNA isolation as described earlier (Milner et al. 2019).GBS libraries were constructed following essentially a previously described procedure (Wendler et al. 2014).Genomic DNA (200 ng) was cleaved using ApeKI, MslI, ApeKI + MseI, PstI + MseI, PstI + ApeKI, and PstI + MspI (Table S3).After restriction digestion, samples were used for adapter ligation without purification as described (Wendler et al. 2014).Adapter mixes (P5 and P7, Table S3) compatible with the ends derived from the restriction digestions were used.Adapter ligation to the DNA fragments and SPRI (solid phase reversible immobilization) purification using MagNa beads (Thermo Scientific Inc., Waltham, MA;Rohland and Reich 2012) were performed as published (Wendler et al. 2014).
DNA was eluted in 20 μL EB (10 mM Tris HCl pH 8.0), and the DNA/bead suspension ("with-bead" SPRI method; Fisher et al. 2011) was used for the adapter fill-in (Wendler et al. 2014).For the subsequent DNA clean-up, 1.8 volumes of PEG buffer (18% PEG 8000, 1 M NaCl, 10 mM Tris HCl pH 8.0, 1 mM EDTA, and 0.05% Tween 20) were added.The purification of the DNA using a magnet plate involved standard procedures.The DNA was eluted in 20 μL EB, and the remaining MagNa beads were discarded.The addition of unique dual indexes by indexing PCR, SPRI purification, quantification of the individual libraries, and equimolar pooling were conducted as described by Wendler et al. (2014).Products of adapter ligation, indexing PCR, and primer positions are shown in Figure S1.Pooled DNA was size-fractionated (targeted size range: 400-600 bp) using the preparative Blue Pippin electrophoresis system and 2% agarose gel cassettes (Marker V2) according to standard manufacturer's protocols (Sage Science, Beverly, MA, United States).The library was quantified by qPCR, as described by Mascher et al. (2013).For the initial comparison of GBS libraries derived from different restriction digestions, an equimolar library pool was prepared and low-pass sequenced by the Illumina MiSeq device (Illumina Inc., San Diego, CA, United States).Final deep sequencing of GBS libraries prepared from ApeKI and MseI fragments was performed using the Illumina NovaSeq6000 device (S4 reagent kit: 200 cycles, v1.5 chemistry, standard Illumina sequencing primer; paired-end [PE]: 151 cycles [Read 1], 8 cycles [Index Read 1], 8 cycles [Index Read 2], and 71 cycles [Read 2]) according to the manufacturer's instructions (Illumina Inc., San Diego, CA, United States).

| QTL Mapping of Seed Hilum Color
The seed hilum color, either black or white, scored for the 177 RILs.The QTL for hilum color was mapped using our linkage map and the "scanone" function in R/qtl v1.66 (Arends et al. 2010).The binary model and the marker regression parameters were used.In both cases, p values and significance thresholds were generated using a permutation test with 1000 replicates.

| GWAS Analysis
Our custom panel of 217 genotypes was used for GWAS analysis.The traits of plant height (PH), flowering time (FT), and number of pods per plant (PPP) were recorded in Terbol, Lebanon, for 2 years (2021 and 2022).Each trial was in alpha lattice design with two replicates.Phenotype scores were recorded in three plants of each plot and averaged for each trait.The best linear unbiased estimates (BLUEs) were computed by considering all accessions as fixed terms using Genstat (v2021) software with an incomplete spatial model.The missing genotype calls were imputed using Beagle v5 (Browning et al. 2021).PCA and the analysis of the kinship matrix were performed using the GAPIT v3.0 (Wang and Zhang 2021) function to determine any underlying population structure.GWAS analysis was done with BLINK (Huang et al. 2018) using the imputed SNP matrix.The candidate gene function annotation was identified according to the predicted proteincoding genes blasted against five databases of eggNOG5.0,NCBI NR, and SwissProt.

| Optimization of GBS for Faba Bean
We tested ApeKI and MslI for single digestion and ApeKI/MseI, PstI/MseI, PstI/ApeKI, and PstI/MspI REs for double digestion method in two genotypes, including FAB 6502 and FAB 6477.We selected these single and double enzyme combinations based on our pilot experiments with various REs (data not shown).First, we evaluated the GBS library construction and observed good library quality, such as a diverse library, no bias, no overrepresentation of repetitive DNA, and an evenly distributed smear of DNA fragments on agarose gel for ApeKI, Msll, and ApeKI/MseI digestions (Figure S2).Notably, we identified poor library quality, that is, bias and overrepresentation of specific identical sequences, indicated by prominent bands of DNA fragments on agarose gel less data for analysis for libraries that used PstI as one of the REs in double digestion (Figure 1 and Figure S2).Second, we aimed to obtain approximately 2-3 million (M) sequencing reads from each library (Table S4).Third, we mapped each library's reads to the faba bean reference genome "Hedin/2" and calculated the number of loci targeted by each library and depth per loci.Intriguingly, we found that the ApeKI/MseI library showed the highest number of loci with a mean read depth of 1.8 (Figure 1 and Figure S3), which could compromise the number of variants (SNPs and INDELs).Furthermore, the distribution of read depth along chromosomes exhibited uniform coverage from the ApeKI/MseI library (Figure 1 and Figure S3).We also identified the highest number of reads targeting gene space from the ApeKI/MseI library.In contrast, Pstl/ApeKI and PstI/MspI covered fewer genomic loci with a high mean depth of between 6-and 7-fold.Instead of targeting entire chromosomes, these libraries were enriched for fewer genomic loci, thus limiting the development of genome-wide SNP markers.Overall, these results showed that the ApeKI/MseI double digestion was the most appropriate for GBS library construction and sequencing to discover variants across the faba bean genome without potential biases in genome reduction.Then, we tested the GBS data yield with regard to SNP missingness.As expected, a lower number of reads in the samples led to higher missing data.However, we observed reduced missing data when the samples were sequenced over 6 M reads (Figure S4a,b), indicating an appropriate read yield required to avoid technical limitations in obtaining high-quality SNPs.Further, due to the high repeat nature of the faba bean genome, single-end GBS read alignment produced a limited number of high-quality alignments, which might lead to numerous false-positive variants.Therefore, we shifted to a new PE approach where Read 1 is 151 bp and Read 2 is 72 bp (151 × 71) (Figure S4c).With this strategy, we achieved better read mappability and numerous high-quality alignments (Figure S4d), and thereby, we expect high-quality variant discovery from faba bean germplasms.

| Genetic Mapping and QTL Analysis
A filtered set of 30,928 SNPs was generated from the GBS datasets from a ILB 938/2 × Mélodie/2 RIL population.The SNP dataset was then filtered for markers that were polymorphic in the two progenitors.Additionally, the genotype samples with a percentage of heterozygous counts higher than 5% were eliminated for the downstream analysis.After filtering, the resulting dataset contained 1742 markers across 101 individuals.After filtering and quality control, the linkage groups contained a total of 947 GBS markers across 101 individuals, genotyped at a percentage of 90.5%.Finally, we constructed six linkage groups containing 947 markers spanning 1395.2 cm, overlapping with six physical chromosomes of the faba bean (Figure 2a).The average distance between the markers is 1.5 cm, with a maximum distance of 12.6 cm (Table S5).The genetic size of the linkage groups ranged from 156.8 cm for Linkage Group 4 to 432.1 cm for Linkage Group 1. Overall, we observed good collinearity between the genetic map and the physical pseudomolecules of the faba bean genome assembly (Figure S5).Further, we performed QTL mapping for the economically important trait "hilum color."A significant QTL for hilum color was also mapped to Chromosome 1 (Figure 2b).Consistently, the QTL region overlapped with the previously identified candidate genes for seed hilum color (Jayakodi et al. 2023) (Table S6).

| Genetic Diversity Analysis
We also performed a diversity analysis of our custom diversity panel (Figure S6a).After GBS experiments, the reads were mapped to the genome sequence of the faba bean cv.Hedin/2 for variant calling.We identified 39,928 SNPs after removing SNPs containing a missing rate (> 10%) and heterozygosity (> 10%) (Figure 3a).The number of high-quality SNPs per chromosome ranged from 5045 (chr5) to 11,534 (chr1).With these markers, a PCA identified the distribution of genotypes across PCA edges (Figure S6b and Table S2).In addition, pairwise identity-bystate (IBS) comparison analysis revealed genetic similarity between 88% and 90% (Figure S6c), exhibiting a wide diversity in the panel correlating with the geographic origin.Furthermore, these results indicate that these GBS-based informative markers provide a robust tool to assess the genetic diversity of faba bean.

| GWAS With Genebank Material
Our dense GBS-derived SNP markers are well suited to detect genetic loci linked to agronomic characters using GWAS.To demonstrate an association scan with our GBS data, we first imputed the missing genotype calls.Then, we performed GWAS with three agronomic traits: "PH," "FT," and "number of PPP."We identified significant maker-trait associations for each trait (Figure 3).Further, the quantile-quantile (QQ) plots for the GWAS supported a strong association between the observed and expected distributions of the p values.We defined the QTL regions based on the LD (r 2 cutoff = 0.9) of the associated GWAS marker and listed the genes in those regions (Table S7).Like in numerous previous GBS studies, these results corroborate the utilization of our GBS method in genome-wide association studies in faba bean.

| Discussion
High-throughput genotyping platforms are required to advance plant genetic studies and gene cloning.The advent of next-generation sequencing (NGS) technologies led to the development of GBS, which generates a large number of SNP markers for germplasm characterization, genome diversity assessment, and various genetic analyses.RE is employed in GBS, and RE is sensitive to genomic composition; thus, species-specific optimization of the GBS procedure may be required in certain species.In principle, GBS is cheaper than medium-or high-coverage whole-genome shotgun sequencing.Heinrich et al. (2020) used MslI to obtain GBS data from 20 faba bean genotypes.However, in this study, we have identified an appropriate RE combination, ApeKI/MseI.We selected this combination based on the reference genome composition, which showed heavy DNA methylation and a substantial proportion of AT-rich DNA sequences (Jayakodi et al. 2023).Thus, we choose enzymes to recognize the unmethylated portion of the genome and can cut AT-rich regions.In addition, we tested various approaches to increase GBS read mappability and reduce the missingness.Our key recommendations are PE sequencing (151 × 71 cycles) with a data coverage of greater than 5-7-fold per sample.As of 2024, we calculated that the cost of genotyping a single faba bean sample was 20.69 EUR, including library construction and VAT (19%), which is twofold cheaper than previous genotyping platforms.The cost may also vary in the future based on new developments in library kits and sequencing platforms.
Further, we demonstrated the use of our GBS in QTL mapping and GWAS.The size of the genetic map generated in this study is comparable to other maps constructed using genotyping data from the same population, which range from 928 (Khazaei et al. 2014) to 1229.5 cm (Gela et al. 2022).The candidate genes controlling the hilum color in faba bean were previously identified as a cluster of polyphenol oxidase genes using GWAS and a diverse population (Jayakodi et al. 2023).Here, we remapped a major QTL for this trait in the same region of Chromosome 1 as the cluster of polyphenol oxidase genes.Similarly, the GWAS pinpointed two significant loci linked to the number of PPP traits on Chromosomes 1 and 2. Among them, the Chromosome 1 locus is proximal to the previously associated marker (AX-416763724) (Vfaba.Hedin2.R2.1g001468.1/Vfaba.Hedin2.R1.1g069400.1 rab GTPase-activating Protein 22), highlighting an important locus for marker-based improvement of yield-related traits in faba bean (Gutierrez et al. 2024).A previous study used a RIL population and reported several QTLs related to FT in faba bean (Aguilar-Benitez et al. 2021).In this study, we used a diversity panel and reported a new locus for FT in the faba bean.Intriguingly, we identified a candidate gene-encoding enzyme, peptidyl-prolyl cis-trans isomerase (Vfaba.Hedin2.R2.2g016227.1),which was found to be a key regulator of FT in Arabidopsis (Wang et al. 2010).Therefore, aside from our GBS method, these markers are valuable to use in marker-assisted breeding.

FIGURE 1 |
FIGURE 1 | Read depth of GBS reads covering genomic loci produced by the various restriction enzyme (RE) combinations.(a) Single (ApeKI and Msll) and double (ApeKI/MseI, Pstl/ApeKI, Pstl/Msel, and PstI/MspI) digestion for GBS optimization in faba bean were conducted in FAB 6502 accession.(b) Similar single and double digestion GBS were done in another accession, FAB 6477.The lines represent the number of loci in the genome region covered after the corresponding GBS library.

FIGURE 2 |
FIGURE 2 | Genetic map and QTL analysis.(a) Linkage map from the ILB 938/2 × Mélodie/2 biparental population showing the position of SNPs across the six linkage groups.(b) LOD scores across the chromosomes for the seed hilum color trait.The red line indicates the LOD significance threshold calculated based on 1000 permutations at a 5% significance level.The red line indicates the LOD significance threshold calculated based on 1000 permutations at a 5% significance level.

FIGURE 3 |
FIGURE 3 | Application of GBS markers in faba bean genetics.(a) Distribution of filtered GBS marker (n = 39,928) on the chromosomes of the faba bean.The heatmap indicates the marker density across chromosomes.(b) GWAS analysis with GBS makers.The top, middle, and bottom panels show the GWAS results for plant height (PH), flowering time (FT), and number of pods per plant (PPP), respectively.The corresponding QQ plots show the theoretical distribution of p values (x-axis) versus the observed distribution within the samples (y-axis).
Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/leg3.254 by Duodecim Medical Publications Ltd, Wiley Online Library on [14/08/2024].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License