Development of a core set of single nucleotide polymorphism markers for genetic diversity analysis and cultivar fingerprinting in cowpea

Cowpea is an important legume crop worldwide. However, the diversity and relationships of current accessions remain unclear due to a lack of robust genetic information. A set of 40,089 single nucleotide polymorphisms (SNPs) adapted to the Kompetitive Allele Specific PCR (KASP) SNP genotyping platform was developed from the existing Illumina Cowpea iSelect Consortium Array for cowpea germplasm genetic diversity assessment and variety identification. Using the genotypic data of 299 cowpea accessions obtained from this SNP assay, a precore pool of 434 SNPs and a set of 50 informative core SNPs were selected and validated for use in future genetic diversity analyses of cowpea germplasm. When 75 commercial cultivars were further genotyped with the set of 50 core SNPs, pairwise genotype alignment and dendrogram analysis both showed that each cultivar could be uniquely identified by this core set of markers. The KASP markers developed in this study provide a flexible marker resource for both basic and applied research of cowpea. The precore SNP pool and the core SNP set also provide valuable tools for genetic variation assessment and variety rights protection for cowpea breeders.

traditional distinctness, uniformity, and stability (DUS) testing faces a big challenge to distinguish them due to the limited number of traits evaluated and inconsistency of these traits under different environmental conditions (Jamali et al., 2019). Thus, to facilitate the rapid release of cowpea varieties and at the same time secure breeders' intellectual rights, new technologies are urgently needed to assist cowpea breeding and intellectual right protection.
With the advancement of crop genomic resources, molecular markers have been widely used in genetic variation assessment, molecular breeding, DNA fingerprinting, and cultivar rights protection (Gao et al., 2016; Su et al., 2018; Varshney et al., 2008; Yang et al., 2019; Zhang et al., 2012). In cowpea, various molecular markers, such as amplified fragment length polymorphism (AFLP) (Boukar et al., 2004; Coulibaly et al., 2002), simple sequence repeat (SSR) or microsatellite, and single nucleotide polymorphism (SNP) markers (Muchero et al., 2009; Xu et al., 2011), have been developed. However, the 51k Cowpea iSelect Consortium Array has not been widely adopted by cowpea breeders worldwide. Its relatively high cost and the co-segregation of most markers on a specific genetic map make it difficult for breeders to obtain useful SNPs. For most cowpea researchers and breeders, an ideal SNP genotyping platform should be highly adaptable and cost-effective. To date, multiple high-throughput SNP genotyping platforms have been developed, including Illumina GoldenGate (Fan et al., 2003), Infinium (Steemers & Gunderson, 2010), TaqMan technology (Livak et al., 1995), and Kompetitive Allele Specific PCR (KASP) platforms (KBiosciences, www.kbioscience.co.uk). Among them, the Illumina GoldenGate and Infinium platforms require the design and ordering of special beadchips (such as Cowpea iSelect Consortium Arrays); a single chip is expensive, and the SNPs anchored on the chip are fixed.
TaqMan and KASP technologies are both flexible and compatible SNP genotyping platforms based on PCR assays. Unlike TaqMan, KASP genotyping requires only two allele-specific PCR primers and one other locus-specific PCR primer, which should be more cost-effective.
To promote the application of KASP-SNP markers in cowpea breeding, in the present study, we converted all 51,128 SNPs from the Illumina Cowpea iSelect Consortium Array into KASP-SNPs and then constructed a set of core SNPs under the KASP genotyping platform. The core SNP set in this study will provide a useful resource for genetic variation assessment of cowpea germplasm resources, DNA fingerprinting of cowpea varieties, and rapid identification of seed quality.

| Plant materials
Seventy-five cowpea cultivars were used for core SNP screening and DNA fingerprinting. These cultivars were developed by commercial breeding companies or public vegetable research institutions in China.

| KASP primer development
The sequence information of SNPs in the Cowpea iSelect Consortium Assay was obtained from Muñoz-Amatriaín et al. (2017). Based on the SNP sequences, two allele-specific forward primers and one common reverse primer were designed for each SNP by an independent primer design program following the LGC proprietary Kraken™ software system (https://www.lgcgroup.com/cn/our-science/genomics-solutions/genotyping/kasp-genotyping-chemistry/reagents/kbd/#.WubvXf4h1hg).

| Core SNP identification
The genotypic data of 299 cowpea accessions (84 grain-type cowpea and 215 vegetable-type cowpea) identified by the Cowpea iSelect Consortium Assay were retrieved to screen core SNPs in three steps. First, SNPs that were difficult to convert into KASP markers or that were not anchored onto the consensus map or the "ZZ" linkage map were filtered out. Second, a precore SNP pool was selected with a computer script according to the following criteria: (1) among SNPs with identical genotypes across the 299 accessions, keep only one representative SNP; (2) randomly sample a panel of 150 SNPs 10,000,000 times from the remaining markers, ensuring that the SNPs in each panel are evenly distributed across the cowpea genome; (3) calculate the genetic distances among the 299 accessions using the 150 SNPs in each panel; if every panel leaves at least one pair of accessions at a genetic distance of 0, 150 SNPs do not have enough power to distinguish all the accessions; and (4) repeat Steps 2 and 3, increasing the panel size by one SNP each time (starting from 151), until a panel yields no genetic distance of 0 between any pair of accessions. Third, the discriminatory power of the SNPs in the precore pool was determined using a Perl script described in Du et al. (2019). SNPs with a high discriminatory power of up to 97% across all varieties were considered informative core SNP sets. Finally, all the informative SNPs based on the consensus map and the "ZZ" linkage map were combined into a core SNP set.
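The precore selection loop described above can be illustrated with a minimal Python sketch. It is not the script used in the study: the function names, the dictionary layout of the genotype data, and the small number of trials are assumptions made for illustration, and the even-distribution constraint on each panel is omitted. Two accessions are treated as having a genetic distance of 0 when their genotypes are identical at every SNP in the panel.

```python
import random

def dedupe_snps(genotypes):
    """Keep one representative SNP per distinct genotype pattern.

    genotypes: dict mapping SNP id -> tuple of calls, one per accession
    (hypothetical layout chosen for this sketch).
    """
    seen = {}
    for snp, calls in genotypes.items():
        seen.setdefault(calls, snp)  # the first SNP with a pattern is kept
    return sorted(seen.values())

def panel_distinguishes_all(genotypes, panel, n_accessions):
    """True if no two accessions are identical at every SNP in the panel."""
    profiles = [tuple(genotypes[s][i] for s in panel)
                for i in range(n_accessions)]
    return len(set(profiles)) == n_accessions

def find_minimal_panel(genotypes, n_accessions, start_size, trials, seed=0):
    """Grow the panel size by one until some random panel separates all accessions."""
    rng = random.Random(seed)
    snps = dedupe_snps(genotypes)
    for size in range(start_size, len(snps) + 1):
        for _ in range(trials):
            panel = rng.sample(snps, size)
            if panel_distinguishes_all(genotypes, panel, n_accessions):
                return sorted(panel)
    return None  # no panel of any size separates all accessions

# Toy data: 4 SNPs scored on 4 accessions; s3 duplicates s1.
toy = {
    "s1": ("A", "A", "B", "B"),
    "s2": ("A", "B", "A", "B"),
    "s3": ("A", "A", "B", "B"),
    "s4": ("A", "A", "A", "A"),
}
panel = find_minimal_panel(toy, n_accessions=4, start_size=1, trials=50)
```

In the toy data no single SNP separates the four accessions, so the search moves on to panels of two SNPs, mirroring the step-wise increase from 150 SNPs in the procedure above.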

| Population structure validation
To validate the resolution of the core SNP markers, the genotype data for the precore SNP pools and the informative core SNP set of 299 cowpea accessions were retrieved again for genetic diversity analysis. Population structure was analyzed using Structure 2.3.4 software under the admixture model with a burn-in period of 1000 followed by 1000 Markov chain Monte Carlo replications. Five independent runs each were performed with the number of clusters (K) varying from 1 to 10. The optimal K for subgrouping was calculated using STRUCTURE HARVESTER (Earl & vonHoldt, 2012). In addition, an unrooted phylogenetic tree was constructed using Tassel 5.0 based on the neighbor-joining method.
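For readers unfamiliar with how the optimal K is chosen, the following is a minimal sketch of the delta K statistic of the Evanno method, which STRUCTURE HARVESTER implements. The function name and input layout are hypothetical, the log-likelihood values are invented toy numbers rather than results from this study, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)).

    lnp_by_k: dict mapping K -> list of Ln P(D) values, one per
    independent run (hypothetical layout for this sketch).
    L(K) is the mean Ln P(D) over the runs at that K.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        # Delta K is undefined at the smallest and largest K tested.
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            if sd > 0:
                result[k] = second_diff / sd
    return result

# Invented toy values: two runs for each K from 1 to 4.
toy_runs = {
    1: [-1000, -1002],
    2: [-800, -802],
    3: [-790, -794],
    4: [-785, -789],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
```

The K with the largest delta K is taken as the most likely number of subpopulations; in the toy values the likelihood gain flattens after K = 2, so delta K peaks there.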

| DNA extraction and KASP genotyping
Genomic DNA was extracted from leaves of 2-week-old seedlings using the cetyl trimethyl ammonium bromide (CTAB) method (Doyle & Doyle, 1987). DNA quality and concentration were mea-

| DNA fingerprinting
Seventy-five varieties were genotyped using the core SNP markers.
The genotype of each variety was proposed as its DNA fingerprint.
Pairwise genotype alignment by a local script was used to distinguish one variety from others, and neighbor-joining tree analysis was conducted to determine their genetic distances using Tassel 5.0.
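The local script is not shown in the text, so the following is only a minimal sketch of what a pairwise genotype alignment could look like. The function names, the `NN` code for a missing call, and the decision to skip loci with a missing call in either variety are assumptions made for illustration.

```python
from itertools import combinations

MISSING = "NN"  # hypothetical code for a missing genotype call

def count_differences(fp_a, fp_b):
    """Number of loci at which two fingerprints differ, skipping missing calls."""
    return sum(1 for a, b in zip(fp_a, fp_b)
               if a != b and MISSING not in (a, b))

def indistinguishable_pairs(fingerprints):
    """Return the variety pairs that show no difference at any scored locus.

    fingerprints: dict mapping variety name -> list of genotype calls,
    with all lists in the same SNP order.
    """
    return [(x, y) for x, y in combinations(sorted(fingerprints), 2)
            if count_differences(fingerprints[x], fingerprints[y]) == 0]

# Toy fingerprints at three loci.
toy_fp = {
    "cv1": ["AA", "GG", "TT"],
    "cv2": ["AA", "GG", "CC"],
    "cv3": ["AA", "NN", "TT"],
}
unresolved = indistinguishable_pairs(toy_fp)
```

An empty result means every variety carries a unique fingerprint under the marker set. In the toy data, cv1 and cv3 remain unresolved because the only locus that might separate them is missing in cv3.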

| KASP primer design
Among the 51,128 SNPs from the Cowpea iSelect Consortium Assay, 50,577 were converted into KASP markers, of which 40,089 high-quality KASP markers were retained after removing primers with GC content <30% or >60%. Then, 598 randomly selected KASP primers were synthesized and tested in different populations, including RIL, F2, and natural populations. Over 96% of these KASP markers exhibited clear amplification products (Figure 1).
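The GC-content filter can be sketched in a few lines of Python. The function names are hypothetical, the example sequences are invented, and applying the 30% to 60% window to each primer individually is an assumption, since the text does not state how the three primers of a marker were evaluated together.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a primer sequence."""
    if not seq:
        raise ValueError("empty primer sequence")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def passes_gc_filter(primer, low=30.0, high=60.0):
    """Keep primers whose GC content is neither below low nor above high."""
    return low <= gc_percent(primer) <= high

# Invented example sequences, not primers from the study.
candidates = ["ATGCATGC", "ATATATATAT", "GCGCGCGCAT"]
kept = [p for p in candidates if passes_gc_filter(p)]
```

Only the first sequence, at 50% GC, falls inside the window; the other two are rejected at 0% and 80% GC.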

| Core-SNP identification
The 299 cowpea accessions were previously genotyped with 49,194 SNPs using the 51,128 SNP assay. These data were filtered using a missing call rate ≤20%, heterozygosity call rate ≤20% and MAF (minor allele frequency) ≥ 0.01, and the 30,211 high-quality SNPs that remained were used to develop the Core-SNP set. Among them, 23,506 SNPs were successfully converted into KASP markers; 20,161 of them were anchored to the consensus map (Muñoz-Amatriaín et al., 2017) and were used for core SNP identification.
Based on the calculated genetic distance (see Materials and Methods), a minimal set of 188 SNPs, which could distinguish 86.7% of the 299 accessions, was selected as a precore SNP pool. Further analysis of the full genotypes of the accessions that could not be distinguished showed that some landraces, although bearing different names, had identical genotypes. In other cases, accessions differed at only one SNP locus, and one of them had a missing genotype at that locus. These redundant accessions were removed, and only one representative accession of each group was retained. Finally, 24 informative SNPs with a high discriminatory power of up to 100% were proposed as a core SNP set (Figure 2). Following a similar strategy, using the ZZ linkage map as a reference, 6296 available SNPs were selected, and 250 SNPs were used to construct a precore SNP pool. Then, 26 informative SNPs were identified as a second core SNP set (Figure 2). To improve the resolution of the core SNPs, the two core SNP sets were combined, and the resulting 50 informative SNPs were used as the final cowpea core SNP set. These 50 SNPs were distributed on 11 linkage groups, and the SNP number varied from 3 to 7 on each chromosome (Table 1 and Figure 3). In addition, the two precore SNP pools were combined, and a new precore pool of 434 SNPs was developed after removing four SNPs that overlapped between the two pools.
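The three quality thresholds can be made concrete with a small Python sketch. The function names, the two-letter genotype encoding, and the `NN` missing code are assumptions for illustration. Heterozygosity is computed here over called genotypes only and SNPs are assumed to be biallelic; the text does not specify either detail.

```python
from collections import Counter

def snp_stats(calls, missing="NN"):
    """Return (missing rate, heterozygosity rate, minor allele frequency).

    calls: list of two-letter genotype calls such as "AA" or "AG"
    (hypothetical encoding for this sketch).
    """
    called = [c for c in calls if c != missing]
    missing_rate = 1 - len(called) / len(calls)
    if not called:
        return missing_rate, 0.0, 0.0
    het_rate = sum(1 for c in called if c[0] != c[1]) / len(called)
    alleles = Counter(a for c in called for a in c)
    total = sum(alleles.values())
    # A monomorphic SNP has no minor allele, so its MAF is 0.
    maf = min(alleles.values()) / total if len(alleles) > 1 else 0.0
    return missing_rate, het_rate, maf

def passes_qc(calls, max_missing=0.20, max_het=0.20, min_maf=0.01):
    """Apply the missing-rate, heterozygosity, and MAF thresholds."""
    missing_rate, het_rate, maf = snp_stats(calls)
    return missing_rate <= max_missing and het_rate <= max_het and maf >= min_maf

# Invented toy SNPs scored on 10 accessions each.
good = ["AA"] * 8 + ["GG"] + ["NN"]   # 10% missing, MAF = 2/18
monomorphic = ["AA"] * 10             # MAF = 0
too_het = ["AG"] * 5 + ["AA"] * 5     # 50% heterozygous calls
```

With these toy calls, only `good` passes: `monomorphic` fails the MAF threshold and `too_het` fails the heterozygosity threshold.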

| Genetic diversity validation using different SNP panels
To validate whether the core SNP set had power comparable to that of the whole Cowpea iSelect Consortium Assay for genetic diversity analysis, population structure analysis was conducted in the diversity panel of 299 accessions using the 434 precore SNPs and the 50 informative core SNPs. As shown in Figure 4, the delta K peak appeared at K = 2 with both the 50 SNPs and the 434 SNPs, which was consistent with the results in Xu et al. (2017) with 30,211 SNPs, suggesting that the 50-SNP and 434-SNP sets have equal power to divide the 299 accessions into two subpopulations. The neighbor-joining tree analysis also displayed similar results using different SNP sets (Figure 4). These results suggest that the 50 core SNPs can fully represent the whole Cowpea iSelect Consortium Assay in cowpea genetic diversity assessment.

| Discriminatory power of the core SNP set in commercial cultivars
To assess the discriminatory power of the core SNPs, 75 commercial cultivars collected from the seed market were genotyped using the first set of 26 core SNPs. Forty-six cultivars (63%) were distinguished by these SNP genotypes. After integrating the genotyping information from the second set of 24 core SNPs, each cultivar could be distinguished from the other 74 cultivars by pairwise genotype alignment.
The neighbor-joining tree analysis clearly distinguished any two of the 75 cultivars by their genetic distance (Figure 5). We also found that the 75 cultivars were completely distinguished once the 43rd SNP was added, indicating that the discriminatory power of the core SNPs increased with the number of SNPs used. This trend was also found in pumpkin, where core sets of 96, 48, 24, and 12 SNPs were able to identify 85.2%, 63.2%, 24.2%, and 4.9% of the 223 pumpkin accessions, respectively (Nguyen et al., 2020), again indicating a positive correlation between germplasm discrimination power and core SNP number. For each cultivar, the genotypes at the 50 core SNPs can serve as a unique DNA fingerprint or be applied in rapid seed purity tests.
developed in rice, Brassica rapa, pepper and pumpkin (Du et al., 2019; Li et al., 2019; Nguyen et al., 2020; Yang et al., 2019). With high representation and resolution, core SNP sets have been widely used in genetic diversity analysis, marker-assisted selection, and seed intellectual rights protection. In the current study, we developed a precore pool of 434 SNPs and a set of 50 core SNPs based on KASP technology and successfully used them for cowpea genetic diversity assessment and DNA fingerprinting.
Unlike reported core SNP sets derived from resequencing approaches such as genotyping by sequencing (GBS) in other crops (Du et al., 2019; Nguyen et al., 2020), the core SNP set in our study was derived directly from the existing Cowpea iSelect Consortium Assay. A total of 40,089 high-quality primers, corresponding to a 78.4% conversion rate, were obtained from the 51,128 SNPs, which is close to the 81.5% conversion rate in maize (Semagn et al., 2014).
DUS testing is a standard system for plant variety protection (Jamali et al., 2019). However, as an increasing number of commercial cultivars are released, the phenotypic differences among accessions become narrower. This makes it difficult to discriminate cultivars with similar genetic backgrounds by DUS testing. Therefore, a DNA-based system with molecular markers has been considered an alternative to improve the efficiency and accuracy of DUS testing (Jamali et al., 2019). In cowpea, SSR markers have been widely used to construct DNA fingerprints for cultivar protection (Danso et al., 2018; Lu et al., 2010; Mendes et al., 2015; Ragul et al., 2018).
However, SSR marker resolution depends on the discrimination of electrophoretic bands, which may affect the genotyping accuracy. The relatively low throughput of SSR methods also limits their application to constructing DNA fingerprints for a single cultivar or a limited number of cultivars at a time. By contrast, SNP genotyping based on the KASP method outperforms the SSR approaches in both throughput and precision. In this study, DNA fingerprinting of 75 commercial cultivars was performed with the 50 core SNPs, and each cultivar could be distinguished from the other 74 cultivars, indicating that these 50 core SNPs could be used as a standard marker set for cowpea cultivar protection and registration, as well as in rapid seed purity tests.
In conclusion, 40,089 KASP markers were developed, and a core set of 50 highly informative, representative, and evenly distributed SNP markers was identified in cowpea, providing a large flexible marker resource for both basic and applied research, such as genetic variation mining, germplasm management, parental material selection, variety identification, and seed purity tests.
and Ye Tao (Biozeron Biotech, Shanghai) for technical assistance. We also thank Dr. Yiqun Weng from University of Wisconsin-Madison for editing the manuscript.

CONFLICT OF INTEREST
None declared.

ETHICAL APPROVAL
This study does not require any ethical approval.

DATA AVAILABILITY STATEMENT
The genotypic data on the mini core collection that support the findings of this study have been provided in a publication by Xu et al. (2017).