NGS-Based Multi-Allelic InDel Genotyping and Fingerprinting Facilitate Genetic Discrimination in Grapevine (Vitis vinifera L.)

Molecular markers play a crucial role in marker-assisted breeding and varietal identification. However, the application of insertion/deletion (InDel) markers in grapevine has been limited by the low throughput and poor resolution of gel electrophoresis. To develop effective InDel markers for grapevine, this study reports a novel, high-throughput pipeline for InDel marker development and identification. After rigorous filtering, 11 polymorphic multi-allelic InDel markers were selected. These markers were then used to genotype 123 elite grape cultivars by agarose gel electrophoresis and next-generation sequencing (NGS). The polymorphism rate of the InDel markers detected on gels was 37.92%, whereas NGS yielded a higher polymorphism rate of 61.12%. The NGS-based fingerprints distinguished 122 grape varieties (99.19%), outperforming the gels, which distinguished 116 varieties (94.31%). In addition, we constructed phylogenetic trees from both the gel- and NGS-based genotyping results. The population structure revealed by the NGS-based markers comprised three primary clusters, consistent with the patterns of evolutionary divergence and geographical origin of the grapevines. Our work provides an efficient workflow for multi-allelic InDel marker development and practical tools for the genetic discrimination of grape cultivars.


Introduction
The grapevine (V. vinifera L.) is one of the earliest domesticated fruit crops, characterized by the absence of genetic barriers within the genus [1]. Over thousands of years, grapevine introduction, breeding and intensive trade have blurred genetic relationships [2]. Moreover, the genetic diversity of grapes has gradually decreased as domestication progressed [3][4][5]. Currently, synonyms and homonyms are prevalent in the grape market, which harms breeding activities and breeders' rights [6]. Therefore, economical, efficient and accurate methods of variety identification are urgently needed to improve breeding efficiency and protect the rights of grape breeders.
DNA molecular markers are crucial for grape varietal identification [7]. Among them, simple sequence repeat (SSR) and amplified fragment length polymorphism (AFLP) markers are widely used in grapevine identification. Twenty SSR markers revealed three genetic populations among 1378 varieties [8]. The genetic variability of 'Sangiovese', 'Sanforte' and 'Montepulciano' was analyzed using multiple molecular markers, including SSR and AFLP [9]. However, the application of SSR markers is hindered by low throughput and labor-intensive data processing [10].
With the development of sequencing technology, large numbers of single nucleotide polymorphism (SNP) and InDel markers have been identified. SNPs and InDels have become the most promising markers in genetic research and marker-assisted breeding because of their wide distribution across genomes and their suitability for high-throughput genotyping [11,12]. Emanuelli et al. [13] employed SNP markers to verify European grapevine cultivars, while Wang et al. [14] used SNP markers to distinguish major grape cultivars in China. InDels are generally defined as short insertions or deletions of up to 50 nucleotides at a single locus. Unlike SNPs, InDel markers can be readily detected by gel electrophoresis. In recent years, InDel markers have been used for fingerprinting in various crops, such as rice [15], maize [16], peach [17], apple [18], tomato [19] and cucumber [20]. However, few InDel markers have yet been developed for grape-variety fingerprinting.
Multi-allelic InDels are loci at which insertions or deletions of several different sizes occur within a population. Multi-allelic InDels are common in plants [21]. Frequent selfing and hybridization in plants increase allelic diversity, and relatively high mutation rates promote the formation of multi-allelic InDels [22]. Gel electrophoresis has relatively low resolution and may not separate the different fragments of a multi-allelic InDel [23]. NGS allows accurate identification of multi-allelic InDels and facilitates data processing. However, the application of NGS is limited by high costs and a lack of dedicated bioinformatics tools.
This study established a practical workflow for multi-allelic InDel selection and identification. A total of 11 polymorphic InDel primer pairs were selected from a variant database constructed from the resequencing data of 499 grapevine lines. A total of 123 core germplasm accessions were collected and genotyped by gel electrophoresis and NGS. The polymorphism rates and discrimination power of the two genotyping strategies highlight the advantages of the NGS-based method. The fingerprints constructed from multi-allelic InDels in this study provide valuable tools for the genetic discrimination of grapes.

Materials
Preliminary screening of InDel markers was conducted using 499 whole-genome resequencing datasets from public databases (Table S1). In August 2022, 123 young grapevine leaf samples were collected at the Institute of Forestry and Pomology, Tianjin Academy of Agricultural Sciences (Table S2). These samples represent diverse genetic backgrounds and a wide range of phenotypic variation. Healthy young leaves were collected under cool and dry conditions in the early morning. After chilling in liquid nitrogen, the leaf samples were homogenized with a 4 mm steel ball in a Retsch MM 400 Mixer Mill. Total genomic DNA was extracted from all samples using the Plant Genomic Extraction Kit (Tiangen Biotech, Beijing, China). DNA concentration and purity were then assessed using a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Qualified DNA samples were stored at −20 °C.

Screening of InDel Markers
Grape resequencing data were first downloaded to construct a variant database, and the raw data were subjected to quality control. Fastp (v.0.23.4) was used for quality assessment and preprocessing of the sequencing data [24]. The preprocessed reads were aligned to the reference genome using the Burrows-Wheeler Aligner (BWA, v.0.7.17) to generate BAM files [25]. SAMtools (v.1.14) was used for sorting, indexing, filtering and statistical analysis of these files [26]. The Genome Analysis Toolkit (GATK, v.4.0.10.0) was then employed to detect InDels, and custom scripts were used to generate a final Variant Call Format (VCF) file containing information on InDel locations, types, quality and distribution across samples [27].
High-quality polymorphic InDel sites were obtained by further filtering with the following criteria: (i) the InDel is not located in a repeat region of the genome; (ii) the InDel is multi-allelic; and (iii) the InDel is larger than 20 bp. In addition, the PCR product spanning the InDel was required to be 150-300 bp long to allow separation by agarose gel electrophoresis. Genetic diversity parameters, including polymorphic information content (PIC), heterozygosity (Het), minor allele frequency (MAF) and gene diversity index (GDI), were calculated using a Python 3.12.4 script (https://github.com/Lvmingjie/indel-seq, accessed on 16 May 2024).
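For readers unfamiliar with these parameters, the sketch below computes them for a single marker using common textbook definitions. It is not the authors' script at the URL above, and the definitions are assumptions: GDI is taken as Nei's gene diversity, 1 − Σp_i²; PIC follows the Botstein formula, 1 − Σp_i² − Σ_{i<j} 2p_i²p_j²; Het is observed heterozygosity; and, because "minor allele" is ambiguous for a multi-allelic locus, MAF is taken here as the frequency of the second most common allele. Genotypes are diploid pairs of allele labels, with None marking a missing call.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

# A diploid genotype is a pair of allele labels; None marks a missing call.
Genotype = Optional[Tuple[str, str]]


def allele_frequencies(genotypes: Sequence[Genotype]) -> Dict[str, float]:
    """Allele frequencies computed from non-missing diploid genotypes."""
    counts = Counter(allele for g in genotypes if g is not None for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def diversity_stats(genotypes: Sequence[Genotype]) -> Dict[str, float]:
    """Return GDI, PIC, MAF and observed Het for one marker (assumed definitions)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no non-missing genotypes")
    # Allele frequencies sorted from most to least common.
    p = sorted(allele_frequencies(called).values(), reverse=True)
    gdi = 1.0 - sum(x * x for x in p)  # Nei's gene diversity
    # Botstein PIC subtracts the 2 * p_i^2 * p_j^2 term for every allele pair i < j.
    pic = gdi - sum(2 * pi**2 * pj**2 for pi, pj in combinations(p, 2))
    maf = p[1] if len(p) > 1 else 0.0  # second most common allele (assumption)
    het = sum(1 for a, b in called if a != b) / len(called)  # observed heterozygosity
    return {"GDI": gdi, "PIC": pic, "MAF": maf, "Het": het}
```

For example, diversity_stats([("1", "1"), ("1", "2"), ("2", "2"), ("1", "2"), None]) gives two alleles at frequency 0.5 each, so GDI = 0.5, PIC = 0.375, MAF = 0.5 and Het = 0.5. Under these definitions PIC can never exceed GDI for the same allele frequencies.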

InDel Primer Design and Selection
Primers were designed using Primer3 [28]. The 100 bp upstream and downstream sequences of each InDel were extracted, and the primer design parameters were set as follows: (1) GC content below 60%; (2) primer length between 18 bp and 22 bp; and (3) melting temperature (Tm) between 55 °C and 62 °C. Primer pairs amplifying unique loci were selected using e-pcr software (v.2.3.12) [29]. The genotype distribution data were then analyzed, and 24 primer pairs with even genotype distributions were chosen for further experiments.
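To make the selection windows concrete, the sketch below extracts the 100 bp flanks around an InDel and screens a candidate primer against the stated GC, length and Tm limits. It is not Primer3: the Tm uses the simple GC-based approximation Tm = 64.9 + 41 × (nGC − 16.4) / N, whereas Primer3 uses a nearest-neighbor thermodynamic model, so the values differ. All function names are hypothetical.

```python
from typing import Tuple


def extract_flanks(seq: str, start: int, end: int, flank: int = 100) -> Tuple[str, str]:
    """Return up to `flank` bp upstream and downstream of the InDel at [start, end) (0-based, half-open)."""
    return seq[max(0, start - flank):start], seq[end:end + flank]


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def basic_tm(primer: str) -> float:
    """Rough GC-based Tm estimate in °C; NOT Primer3's nearest-neighbor model."""
    primer = primer.upper()
    n_gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(primer)


def meets_primer_rules(primer: str) -> bool:
    """Apply the three stated windows: 18-22 nt, GC < 60%, Tm 55-62 °C."""
    # The length check comes first so an empty string never reaches a division.
    return (
        18 <= len(primer) <= 22
        and gc_fraction(primer) < 0.60
        and 55.0 <= basic_tm(primer) <= 62.0
    )
```

For instance, the 22-mer "ATGCGTACGTTAGCCGATCAGG" (12 G/C, about 54.5% GC, estimated Tm about 56.7 °C) passes, while "GCGC" repeated five times fails the GC limit. Because this crude formula runs cooler than nearest-neighbor estimates, it rejects some primers Primer3 would accept; a real pipeline should use a nearest-neighbor Tm.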

InDel Genotyping
The 24 synthesized primer pairs were screened by PCR amplification and agarose gel electrophoresis (Table S3). From the gel results, core markers were chosen according to the following principles: multiple clear bands, insertion or deletion sizes over 20 bp and, where possible, location on different chromosomes. Eleven core primer pairs were finally chosen for amplifying DNA from the 123 grape samples (Table S3). PCR amplification was conducted in a 25 µL reaction containing 2 µL DNA (50 ng/µL), 0.25 µL forward primer (10 µM), 0.25 µL reverse primer (10 µM), 12.5 µL 2× Taq Master Mix and 10 µL ddH₂O. Amplification was carried out on an Applied Biosystems Veriti Thermal Cycler under the following conditions: 98 °C for 5 min; 35 cycles of 95 °C for 20 s, 56 °C for 20 s and 72 °C for 20 s; a final extension at 72 °C for 10 min; and a hold at 4 °C to halt the reaction. Then, 5 µL of each PCR product was separated by 2% agarose gel electrophoresis, and the bands were visualized under UV light. Bands on the gel profiles were numbered sequentially from smallest to largest as "1", "2", "3", and so on, with the absence of a band recorded as "0". For example, a single band was recorded as "1_1" and a double band as "1_2". This information was used for further genetic analysis.
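The band-scoring convention can be expressed as a small conversion routine. This sketch rests on interpretations of the text rather than on the authors' procedure: band sizes (in bp) are assumed to have already been estimated against a ladder and binned, the numbering 1, 2, 3 is assumed to be assigned per marker across all samples, and a single band is treated as a homozygote. The function names are hypothetical.

```python
from typing import Dict, Mapping, Sequence


def rank_band_sizes(profiles: Mapping[str, Sequence[int]]) -> Dict[int, int]:
    """Number each distinct band size (bp) seen for one marker: smallest = 1, next = 2, ..."""
    sizes = sorted({size for bands in profiles.values() for size in bands})
    return {size: rank for rank, size in enumerate(sizes, start=1)}


def code_gel_genotype(bands: Sequence[int], ranks: Mapping[int, int]) -> str:
    """Convert one sample's band sizes into the "i_j" code; no band gives "0"."""
    idx = sorted({ranks[b] for b in bands})
    if not idx:
        return "0"
    if len(idx) == 1:
        return f"{idx[0]}_{idx[0]}"  # single band: homozygous, e.g. "1_1"
    if len(idx) == 2:
        return f"{idx[0]}_{idx[1]}"  # two bands: heterozygous, e.g. "1_2"
    raise ValueError(f"{len(idx)} bands: unexpected for a diploid single-locus marker")


def code_marker(profiles: Mapping[str, Sequence[int]]) -> Dict[str, str]:
    """Score every sample for one marker using a shared band-size ranking."""
    ranks = rank_band_sizes(profiles)
    return {sample: code_gel_genotype(bands, ranks) for sample, bands in profiles.items()}
```

With toy data, code_marker({"A": [180], "B": [210, 180], "C": [], "D": [240, 210]}) returns {'A': '1_1', 'B': '1_2', 'C': '0', 'D': '2_3'}. Ranking across all samples, rather than within each lane, keeps the same number attached to the same fragment size throughout the population.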
Because gel electrophoresis could not fully resolve the genotypes, sequencing was used for further genotyping. The 11 core primers, fitted with adapters, and universal barcode primers were used in two rounds of amplification. The universal barcode primers and PCR system followed those described by Liu et al. [30]. Second-round products from individual plants were pooled in equal amounts and subjected to 150 bp paired-end NGS. The sequences of each target site were decoded using a custom script. To exclude sequencing errors, the following filtering requirements were applied: total reads per well ≥ 1000; reads per genotype ≥ 100; the top-ranked (Top1) genotype accounting for ≥ 30% of reads per well; and a Top2/Top1 read ratio per well ≥ 30%. These thresholds confirm the presence of each called genotype in the sample and thereby improve the accuracy of the sequencing results.
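The read-count thresholds can be applied per well as in the sketch below. It is an illustration, not the custom decoding script: alleles are keyed by amplicon length in bp (a simplifying assumption, since the actual script decodes sequences), and the Top2/Top1 rule is interpreted here as the condition for calling a second, heterozygous allele, which is one plausible reading of the text.

```python
from typing import Dict, Optional

# Thresholds as stated in the text. Interpreting the Top2/Top1 rule as the
# criterion for calling a second (heterozygous) allele is an assumption.
MIN_TOTAL_READS = 1000
MIN_ALLELE_READS = 100
MIN_TOP1_FRACTION = 0.30
MIN_TOP2_TO_TOP1 = 0.30


def call_ngs_genotype(allele_reads: Dict[int, int]) -> Optional[str]:
    """Call a diploid genotype from read counts per allele (keyed by amplicon length in bp).

    Returns e.g. "212_236" (heterozygous), "212_212" (homozygous) or None (no call).
    """
    total = sum(allele_reads.values())
    if total < MIN_TOTAL_READS:
        return None  # too few reads in this well
    ranked = sorted(allele_reads.items(), key=lambda kv: kv[1], reverse=True)
    top1, top1_reads = ranked[0]
    if top1_reads < MIN_ALLELE_READS or top1_reads / total < MIN_TOP1_FRACTION:
        return None  # no allele dominates clearly enough to be trusted
    if len(ranked) > 1:
        top2, top2_reads = ranked[1]
        if top2_reads >= MIN_ALLELE_READS and top2_reads / top1_reads >= MIN_TOP2_TO_TOP1:
            low, high = sorted((top1, top2))
            return f"{low}_{high}"
    return f"{top1}_{top1}"
```

For example, {212: 900, 236: 600, 230: 40} is called "212_236"; {212: 1500, 236: 200} is called "212_212" because the minor allele falls below 30% of the major one; and a well with only 500 reads yields no call.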

Screening of InDel Markers
To select a set of stable and polymorphic multi-allelic InDel markers, we developed a rigorous filtering pipeline (Figure 1A). A total of 126,493,127 InDels were identified from the 499 grape resequencing datasets (Figure 1B). Of these, 80 high-quality, highly polymorphic markers were retained based on the following criteria (Figure 1C; Table S2): (1) number of InDel alleles ≥ 4; (2) InDel length between 20 bp and 100 bp; (3) size of each of the two types ≥ 20 bp; (4) no repeats within 100 bp upstream or downstream of the InDel site; (5) missing rate ≤ 10%; and (6) maximum genotype frequency ≤ 50%. Based on the genotype frequencies observed in the 499 samples, 24 InDels distributed across the 19 chromosomes and showing evenly distributed genotype frequencies were selected. Based on the PCR and agarose gel electrophoresis results for these 24 InDels, 11 primer pairs that yielded clear, multiple amplified bands were selected as core markers (Figure 1D).
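The six retention criteria translate into a per-site predicate. The sketch below assumes each candidate has already been summarized from the VCF into a hypothetical InDelSite record holding signed allele size offsets relative to the reference (0 = reference allele, counted as one allele), a precomputed flag for repeats within ±100 bp, the missing rate and the maximum genotype frequency. Criterion (3) is ambiguous in the text; here it is read as requiring allele sizes to differ from one another by at least 20 bp, which is only one plausible interpretation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InDelSite:
    """Hypothetical per-site summary derived from the VCF."""
    chrom: str
    pos: int
    allele_offsets: List[int]  # signed size vs. reference in bp; 0 = reference allele
    repeat_in_flanks: bool     # repeat annotated within 100 bp up- or downstream
    missing_rate: float        # fraction of samples without a call
    max_genotype_freq: float   # frequency of the most common genotype


def passes_filters(site: InDelSite) -> bool:
    offsets = sorted(set(site.allele_offsets))
    alt = [o for o in offsets if o != 0]
    if len(offsets) < 4:                                   # (1) at least 4 alleles
        return False
    if not alt or not all(20 <= abs(o) <= 100 for o in alt):  # (2) 20-100 bp InDels
        return False
    # (3) assumed reading: neighbouring allele sizes differ by >= 20 bp
    if min(b - a for a, b in zip(offsets, offsets[1:])) < 20:
        return False
    if site.repeat_in_flanks:                              # (4) no flanking repeats
        return False
    if site.missing_rate > 0.10:                           # (5) missing rate <= 10%
        return False
    if site.max_genotype_freq > 0.50:                      # (6) max genotype freq <= 50%
        return False
    return True
```

Because the offsets are sorted, the smallest gap between neighbours equals the smallest gap between any two alleles, so criterion (3) needs only one pass.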

Genotyping of 123 Grape Lines Using Agarose Gel Electrophoresis
To investigate the discrimination power of the 11 core markers, we collected 123 elite grape cultivars with diverse ecotypes and phenotypes in China. The 11 core multi-allelic InDels were analyzed in these cultivars by PCR and agarose gel electrophoresis, yielding a total of 1353 band records (11 markers × 123 cultivars) and an average polymorphism rate of 37.92% (Figure S1; Table S4). Based on the genotyping results, GDI, Het, MAF and PIC values were calculated for the 11 InDels (Figure 2). PIC values ranged from 0.305 to 0.566, with a mean of 0.407, and most (82%) fell between 0.3 and 0.5. Het values ranged from 0.359 to 0.611, with a mean of 0.488. MAF averaged 0.306, ranging from 0.210 to 0.411. GDI averaged 0.4882, ranging from 0.359 to 0.611. These results indicate that the 11 candidate multi-allelic InDels are highly polymorphic, making them suitable markers for grape fingerprinting.

NGS-Based Genotyping of 123 Grape Lines
In addition to agarose gel electrophoresis, all 123 grape samples were genotyped by NGS. The sequencing-based genotypes provided richer information on the InDels (Table S5), and NGS revealed a higher average polymorphism rate of 61.12%. GDI, Het, MAF and PIC values were again calculated for the 11 InDels (Figure 2). The mean PIC, GDI and Het of the 11 markers were 0.860, 0.806 and 0.647, respectively, substantially higher than the values obtained from the gels.

Discrimination and Fingerprints of 123 Grape Lines
InDel fingerprints were constructed from both the gel and NGS data. Heatmaps were used to display the unique genotypes of each grape accession in distinct colors (Figure 3A,B). The two sets of genotype data for the 123 accessions were analyzed to assess identification efficiency. The NGS-based fingerprints discriminated 122 accessions, an identification efficiency of 99.19% (Figure 3D). The only exception was the pair 'Maple Leaf Grapes' and 'NO.8', both of which are round-leaf grapes. The gel-based results distinguished only 116 grape varieties. These results show that NGS is more efficient than agarose gel electrophoresis for the genetic discrimination of grape varieties (Figure 3C).
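One way to compute identification efficiency is to group accessions by their multi-locus genotype and divide the number of distinct fingerprints by the number of accessions; this definition is an assumption, chosen because it reproduces 122/123 ≈ 99.19% when exactly one pair shares a fingerprint. Missing calls are treated as ordinary genotype strings here, and the accession names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def discrimination(fingerprints: Dict[str, Sequence[str]]) -> Tuple[int, float, List[List[str]]]:
    """Group accessions with identical multi-locus fingerprints.

    Returns (number of distinct fingerprints, distinct / total,
    sorted groups of accessions that cannot be told apart).
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for accession, fp in fingerprints.items():
        groups[tuple(fp)].append(accession)
    n_distinct = len(groups)
    shared = sorted(sorted(accs) for accs in groups.values() if len(accs) > 1)
    return n_distinct, n_distinct / len(fingerprints), shared
```

For example, with four accessions where "acc1" and "acc3" share the genotypes ("1_1", "2_3"), the function returns 3 distinct fingerprints, an efficiency of 0.75 and the unresolved group [["acc1", "acc3"]]. Re-running it on subsets of marker columns is a simple way to explore how discrimination grows with the number of markers, in the spirit of Figure 3D.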

Discussion
The grape, an economically important fruit crop with a long history of cultivation and diverse varieties, is one of the world's most extensively cultivated fruit crops [33]. The cultivation of V. vinifera extends across diverse geographical regions, facilitating frequent exchanges of cultivars and common interspecific hybridization, which complicates the taxonomy and identification of varieties [34]. InDel molecular markers, known for their polymorphism and genetic stability, are crucial tools for crop genetic improvement, varietal identification and functional genomics research. Their applications span population genetic analysis, molecular breeding and medical diagnostics, demonstrating their potential to advance genetic improvement and germplasm innovation across species [35][36][37][38]. Despite the successful application of InDel markers in several crops and organisms, a dedicated InDel-based fingerprinting system for grapevine is still lacking. An efficient and precise fingerprinting platform is therefore essential for grapevine variety identification and population genetic analysis.
The rapid development of NGS has provided unprecedented opportunities for genomic research. Although the cost of NGS continues to decline, high-throughput sequencing remains expensive, especially for large-scale projects. Multiplex sequencing is a practical solution: by pooling DNA or RNA from multiple samples, it substantially reduces experimental, reagent and labor costs. Hi-TOM is an online platform for sequencing multiple samples at multiple target sites. The Hi-TOM strategy saves considerable time and cost, making it more economical and convenient, especially for large numbers of samples or loci. To ensure the quality of multiplex sequencing, the demultiplexing method and the InDel genotyping strategy are critical. Using fixed bridge sequences and barcoding primers, the Hi-TOM tool reliably and sensitively tracks various mutations, especially the complex chimeric mutations frequently induced by genome editing. Consequently, multiplex sequencing on the Hi-TOM platform was chosen for the NGS-based identification of InDel markers in this study. To eliminate sequencing errors and improve the identification accuracy of InDel markers, we developed a comprehensive filtering method for multi-allelic InDels based on several parameters, such as read depth, heterozygosity and missing rate in the population.
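The principle of dual-barcode demultiplexing can be sketched generically. The code below is not Hi-TOM's implementation: it assumes that read 1 begins with a forward barcode and read 2 with a reverse barcode, that each (forward, reverse) barcode pair identifies exactly one well, and that barcodes must match exactly (practical tools usually tolerate mismatches). All barcodes, labels and sample names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ReadPair = Tuple[str, str]


def demultiplex(
    read_pairs: Iterable[ReadPair],
    fwd_barcodes: Dict[str, str],  # barcode sequence -> forward index label (hypothetical)
    rev_barcodes: Dict[str, str],  # barcode sequence -> reverse index label (hypothetical)
    well_to_sample: Dict[Tuple[str, str], str],
    barcode_len: int,
) -> Dict[str, List[ReadPair]]:
    """Assign each read pair to a sample by exact dual-barcode lookup and trim the barcodes."""
    bins: Dict[str, List[ReadPair]] = defaultdict(list)
    for r1, r2 in read_pairs:
        f = fwd_barcodes.get(r1[:barcode_len])
        r = rev_barcodes.get(r2[:barcode_len])
        sample = well_to_sample.get((f, r)) if f is not None and r is not None else None
        key = sample if sample is not None else "unassigned"
        bins[key].append((r1[barcode_len:], r2[barcode_len:]))
    return dict(bins)
```

With forward barcodes {"ACGT": "F1", "TGCA": "F2"}, reverse barcode {"GGCC": "R1"} and wells {("F1", "R1"): "sample_A", ("F2", "R1"): "sample_B"}, the pair ("ACGTAAAA", "GGCCTTTT") is binned to sample_A as ("AAAA", "TTTT"), while a pair whose read 1 starts with an unknown barcode goes to "unassigned". Combining row and column barcodes in this way lets a small set of primers label many wells.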
Multi-allelic InDels, characterized by multiple insertion or deletion variants at a single locus, exhibit higher polymorphism than conventional SNPs or InDels. However, gel electrophoresis may lack the resolution to detect all InDel variants. The high sensitivity of NGS enables accurate detection of small insertion or deletion events, making it possible to identify more genotypes per InDel. In this study, we developed a pipeline, including data quality control, genotype phasing and variety fingerprinting, for the analysis of NGS-based multi-allelic InDels. NGS-based identification recovered all genotypes obtained by gel electrophoresis and demonstrated higher discrimination power.
Using 11 multi-allelic InDel markers, we genotyped 123 grape cultivars, including Eurasian species, American species and Euro-American hybrids. The grapes exhibited a high degree of Het, with values ranging from 0.429 to 0.762 and a mean of 0.647, higher than those reported in previous studies [39]. Owing to the polymorphic nature of the InDel markers, their PIC values were relatively high, ranging from 0.398 to 0.981. The average PIC in this study was 0.86, higher than the PIC value of 0.38 reported for SNP markers [40] and 0.33 for broccoli [41], indicating a high level of polymorphism for the 11 multi-allelic InDels. The average GDI was 0.806, higher than the value of 0.271 observed in winter wheat [42]. Among the markers, 81.82% had GDI values between 0.7 and 1, indicating high gene diversity among the core markers. The 11 NGS-based core markers discriminated 122 grape varieties, an identification efficiency of 99.19%. The two unseparated accessions, 'Maple Leaf grapes' and 'No.8', were both round-leafed grapes. This study suggests the superiority of NGS over agarose gel electrophoresis for the genetic discrimination of grape varieties.

Figure 1. Development of multi-allelic InDel markers in grapevine. (A) Pipeline for grapevine fingerprint database construction based on multi-allelic InDel markers. (B) Gene density (black line plot, outer track) and InDel density (red line plot, inner track) derived from 499 grape resequencing datasets across 19 chromosomes. (C) Physical locations of 80 high-quality multi-allelic InDels; the 24 highly polymorphic markers are labeled in red. (D) Agarose gel electrophoresis of the 24 polymorphic InDels selected from the 80 high-quality markers; the 11 markers in red boxes were selected for further analysis.

Figure 3. Fingerprinting of 123 grape cultivars. (A) Fingerprints of 123 grape cultivars based on agarose gel electrophoresis. (B) Fingerprints of 123 grape cultivars based on NGS. (C) Number of genotypes identified by agarose gel electrophoresis and NGS; the y-axis shows the number of genotypes per marker revealed by NGS and gel. (D) Discrimination power of different combinations of the 11 multi-allelic InDel markers for 123 grape accessions.

Figure 4. Population structure analysis of 123 grape cultivars. (A) Phylogenetic tree based on agarose gel electrophoresis results. (B) PCA based on agarose gel electrophoresis results. (C) Phylogenetic tree based on NGS results. (D) PCA based on NGS results. All colors follow the NGS-based grouping: Pop-1 (red), Pop-2 (green) and Pop-3 (purple).
