Dataset on estimate of intra-specific genetic variability of African yam bean (Sphenostylis stenocarpa (Hochst. ex A. Rich.) Harms.) based on rbcL gene marker

African yam bean (Sphenostylis stenocarpa (Hochst. ex A. Rich.) Harms.) (Fabaceae) is a versatile crop of nutritional, nutraceutical, and pharmacological value, widely grown for its edible seeds and underground tubers. Its high-quality protein, rich mineral content, and low cholesterol make it a suitable food source for all age groups. However, the crop is still under-exploited and constrained by factors such as intra-specific incompatibility, low yields, an indeterminate growth pattern and long gestation period, hard-to-cook (HTC) seeds, and the presence of antinutritional factors (ANFs). To use its genetic resources efficiently for crop improvement, it is necessary to understand the crop's sequence information and select promising accessions for molecular hybridization trials and conservation purposes. A total of 24 accessions of AYB were collected from the Genetic Resources Center of the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, and subjected to PCR amplification and Sanger sequencing. The dataset describes the genetic relatedness among the twenty-four accessions of AYB. The data consist of partial rbcL gene sequences (24), estimates of intra-specific genetic diversity, the maximum likelihood estimate of transition/transversion bias, and evolutionary relationships based on the UPGMA clustering method. The data identified 13 variable (segregating) sites as SNPs, 5 haplotypes, and the codon usage of the species, all of which can be explored further to advance the genetic utilization of AYB.


Keywords:
African yam bean; Breeding; Genetic intra-specific diversity; Germplasm conservation; Ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene

Specifications Table
Subject: Biological Science
Specific subject area: Plant Science, Genetic diversity, Phylogeny and Evolution, Bioinformatics
Type of data: Tables, Figure
How the data were acquired: Amplification of the chloroplast ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, partial cds, by PCR followed by Sanger sequencing. Data were analyzed using Geneious Prime 2022, DnaSP v6.12.03, and MEGA X.
Data format: Raw, Analyzed
Description of data collection: A total of 25 accessions of AYB seeds were collected from the Genetic Resources Center (GRC) of the International Institute of Tropical Agriculture (IITA), Ibadan. The seeds were planted in replicate in five rows of five accessions (25 × 2) and maintained for 2 weeks at the Genebank screenhouse. Accession TSs6 did not germinate and was excluded from the analysis. Twenty-four (24) accessions were assessed using rbcL primers. Genetic diversity and relatedness parameters such as single nucleotide polymorphisms, codon usage, and cluster analysis were estimated using MEGA X and CodonW, whereas the number of polymorphic (segregating) sites, the number of haplotypes, haplotype (gene) diversity, the variance of haplotype diversity, nucleotide diversity, and the average number of nucleotide differences were calculated using DnaSP 6.0.
Data source location: The location and passport data are summarized in Table 1.

Value of the Data
• Accession TSs 303 showed consistency in base substitution (transition and transversion) in all the 13 variable sites and accounted for 42% of the total variable sites.
• The sequence data will benefit the agricultural research community: researchers can deploy rbcL gene sequences in the genetic characterization of the species to infer its evolutionary history and to support the unbiased selection of promising AYB accessions for breeding and conservation purposes.
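The distinction between transitions and transversions at variable sites, referred to above, can be illustrated with a minimal Python sketch. It is not the procedure used to produce this dataset (the analysis was done in MEGA X and DnaSP); the function names and the toy alignment are hypothetical and chosen only for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(a: str, b: str) -> str:
    """Classify a difference between two bases as a transition or transversion."""
    a, b = a.upper(), b.upper()
    if a not in BASES or b not in BASES:
        raise ValueError("only unambiguous bases A, C, G, T are supported")
    if a == b:
        raise ValueError("bases are identical; there is no substitution")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def variable_sites(alignment: dict[str, str]) -> list[int]:
    """Return 0-based alignment columns showing more than one unambiguous base."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        observed = {s[i] for s in seqs} & BASES
        if len(observed) > 1:
            sites.append(i)
    return sites


# Hypothetical toy alignment, not taken from the rbcL dataset.
toy = {"acc1": "ATGCA", "acc2": "ATGCA", "acc3": "GTGAA"}
for col in variable_sites(toy):
    ref, alt = toy["acc1"][col], toy["acc3"][col]
    print(col, ref, alt, classify_substitution(ref, alt))
```

Applied to a real alignment, the same column scan would list the segregating sites, and comparing each accession against a chosen reference would show which accessions carry the substitutions.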

Objectives
The dataset was generated to assess intra-specific variability in partial rbcL gene sequences and the phylogenetic relationships among selected accessions of African yam bean.

Data Description
African yam bean (AYB) (Sphenostylis stenocarpa (Hochst. ex A. Rich.) Harms.) belongs to the order Fabales and the family Fabaceae. It is one of the nutritionally and medicinally important orphan legumes in Central, Western, and Eastern Africa [1,2]. Its low cholesterol and high-quality protein make it a suitable source of food for obese, diabetic, and hypertensive patients [3,4]. AYB is constrained by intra-specific incompatibility, low yields, an indeterminate growth pattern and long gestation period, the hard-to-cook (HTC) phenomenon, and the presence of antinutritional factors (ANFs). It is imperative to deploy marker-assisted crop improvement strategies and sequencing information to gain more insight into AYB genetic relatedness. The data in this article present the rbcL gene sequences of twenty-four accessions of S. stenocarpa. Table 1 shows the passport data and country of origin of the accessions, while the phenotypic images of the accessions are presented in Fig. 1. The rbcL gene structure with the primer binding sites is shown in Fig. 2. Table 2 lists the sequences submitted to NCBI GenBank, the assigned GenBank accession numbers, the matched organisms, and the sequence lengths (bp). Single nucleotide polymorphisms (SNPs) at the 13 variable sites are presented in Table 3, while codon usage parameters, including the codon bias index (CBI), scaled chi-square, and G + C contents of coding and noncoding positions, are presented in Table 4. Table 5 summarizes the codon usage and amino acid residues of S. stenocarpa. The estimates of genetic diversity and the Maximum Likelihood estimate of Transition/Transversion Bias are presented in Tables 6 and 7, respectively. The evolutionary history of the taxa, inferred using the UPGMA method and showing five sub-clusters, is presented in Fig. 3.

Table 1. Passport data and country of origin of the AYB accessions used for this study. Columns: Accessions; Country of Origin; Seed shape; Seed coat color; Seed coat texture; Brilliance of seeds.

TSs357 n/a 0.9 0.625 n/a 0.5 0.25 0.5 0.538
ENC = Effective number of codons, CBI = Codon bias index, Schi2 = Scaled Chi Square, G + Cn = G + C content at noncoding positions, G + C2 = G + C content at second coding position, G + C3s = G + C content at (synonymous) third coding positions, G + Cc = G + C content at coding region, G + C = G + C content in the genome (whole region), n/a = not available.

Fig. 3. Evolutionary relationship among the twenty-four accessions of AYB using the rbcL gene marker.
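The G + C measures abbreviated above can be illustrated with a short Python sketch. It is a simplified stand-in, not CodonW itself: it reports G + C content at every third codon position (often called GC3), whereas CodonW's G + C3s counts only synonymous third positions. The function names, the `frame` parameter, and the toy sequence are assumptions made for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) in seq."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous bases to count")
    return sum(b in "GC" for b in counted) / len(counted)


def gc_by_codon_position(cds: str, frame: int = 0) -> dict[int, float]:
    """G + C fraction at codon positions 1, 2 and 3 of a coding sequence.

    `frame` is the 0-based offset of the first complete codon; trailing
    bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()[frame:]
    usable = len(cds) - len(cds) % 3
    return {pos + 1: gc_fraction(cds[pos:usable:3]) for pos in range(3)}


# Hypothetical toy coding sequence (codons ATG GCC AAA TTT), not from the dataset.
toy_cds = "ATGGCCAAATTT"
print("overall G + C:", round(gc_fraction(toy_cds), 3))
print("by codon position:", gc_by_codon_position(toy_cds))
```

For a partial cds such as the rbcL fragments described here, the reading frame would need to be set from the GenBank annotation before position-specific values are meaningful.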

Acquisition of AYB Seeds
A total of 25 accessions of AYB seeds were collected from the Genetic Resources Center (GRC) of the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, of which twenty-four were used for this study (Table 1). One accession (TSs6) did not germinate and was excluded from the analysis.

Genomic DNA Extraction
Genomic DNA was extracted from leaves of the sampled accessions (Table 1) using the Zymo Research Quick-DNA Plant/Seed Miniprep Kit (Catalogue No. D6020) following the manufacturer's instructions.

Post-PCR Purification and Sequencing Analysis
PCR products were cleaned using an enzymatic method (ExoSAP), and the purified amplicons were sequenced at Inqaba Biotechnical Industries (Pty) Ltd, South Africa, using the NimaGen BrilliantDye™ Terminator Cycle Sequencing Kit V3.1 (BRD3-100/1000) according to the manufacturer's instructions.

Data Analysis
Sequences were cleaned and aligned using default settings in Geneious Prime 2022 [5]. Aligned sequences were then imported into MEGA X to estimate the transition/transversion bias (R). Analyses were conducted using the Maximum Composite Likelihood model, and the codon positions included were 1st + 2nd + 3rd + Noncoding. The phylogenetic relationship among the accessions was inferred using the UPGMA method [6]. Genetic diversity indices, namely the number of polymorphic (segregating) sites (S), number of haplotypes (h), haplotype diversity (Hd), nucleotide diversity (π), and average number of nucleotide differences (k), together with single nucleotide polymorphism (SNP) positions and the codon bias index, were estimated using DnaSP 6.0 [7]. Amino acid residues and codon usage indices were estimated using CodonW as implemented on a public Galaxy server (https://galaxy.pasteur.fr/).
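The diversity indices named above can be sketched in a few lines of Python using their standard definitions: Hd = n/(n − 1) · (1 − Σ p_i²), k as the mean number of pairwise differences, and π = k divided by the number of sites compared. This is an illustrative sketch only, not DnaSP's implementation. It assumes complete deletion of columns containing gaps or ambiguous bases, and the function name and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_indices(seqs: list[str]) -> dict[str, float]:
    """Compute S, h, Hd, k and pi for a list of aligned DNA sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # Complete deletion (assumption): keep only columns where every
    # sequence carries an unambiguous base.
    keep = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not keep:
        raise ValueError("no fully unambiguous columns to analyse")
    clean = ["".join(s[i] for i in keep) for s in seqs]
    sites = len(keep)

    # S: columns with more than one base observed.
    s_sites = sum(len({s[i] for s in clean}) > 1 for i in range(sites))

    # h and Hd: distinct sequences and their frequency-based diversity.
    hap_counts = Counter(clean)
    h = len(hap_counts)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in hap_counts.values()))

    # k: mean pairwise differences; pi: k per site compared.
    pair_diffs = [sum(a != b for a, b in zip(x, y))
                  for x, y in combinations(clean, 2)]
    k = sum(pair_diffs) / len(pair_diffs)
    return {"S": s_sites, "h": h, "Hd": hd, "k": k, "pi": k / sites}


# Hypothetical toy alignment, not taken from the rbcL dataset.
toy_seqs = ["ATGCA", "ATGCA", "GTGAA", "GTGCA"]
print(diversity_indices(toy_seqs))
```

Values from such a sketch may differ slightly from DnaSP output when an alignment contains gaps or missing data, because the two may not treat those sites in the same way.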

Ethics Statement
The seeds were acquired from the Genetic Resources Center (GRC) of the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, under the Standard Material Transfer Agreement (SMTA) of the International Treaty on Plant Genetic Resources (No. SMTA-00AF05-00BO89-201210). The seeds were cultivated and leaves were harvested under the standard conditions specified by the agreement. The experiment in this article does not involve animals.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.