rbcL gene dataset on intra-specific genetic variability and phylogenetic relationship of Crassocephalum crepidioides (Benth) S. Moore. (Asteraceae) in Nigeria

Crassocephalum crepidioides (Benth) S. Moore (Asteraceae), commonly called “thickhead”, is an underutilised species indigenous to the rainforest of West and Central Africa that has also been introduced and naturalised throughout tropical and sub-tropical Asia, Australia, Tonga and Samoa. The species is an important medicinal plant and leafy vegetable endemic to the South-western region of Nigeria. Its cultivation, utilisation and local knowledge base could be stronger than those of the mainstream vegetables. Its genetic diversity has not been investigated for breeding and conservation purposes. The dataset consists of partial rbcL gene sequences, amino acid profiles and nucleotide compositions for 22 accessions of C. crepidioides. It provides information on the distribution of the species in Nigeria and on its genetic diversity and evolution. The sequence information is integral to developing specific DNA markers for breeding and conservation purposes.


Specifications Table

Subject: Biological Sciences
Specific subject area: Agriculture, Genetic diversity, Phylogenetics, Evolution
Type of data: Tables, Figures
How data were acquired: PCR amplification of the rbcL gene and Sanger DNA sequencing
Data format: Raw, Analyzed
Description of data collection: Leaf samples of C. crepidioides accessions were collected from areas of Southwest Nigeria where the species is endemic, dried on silica gel and preserved at -80 °C (Table 1). All accessions were assessed using rbcL primers, and the population diversity, nucleotide and amino acid contents of each accession were determined using DnaSP 6.0. The phylogenetic tree was constructed using MEGA 11.0.13, CodonW was used to estimate the codon usage indices, and Arlequin 3.5.2.2 was used to estimate the AMOVA for the population.
Data source location: The data source locations are summarised in Table 1.

Value of the Data
This dataset of partial rbcL gene sequences is important for:
• Identifying areas where C. crepidioides is abundant within the study areas, which is essential for breeding and germplasm conservation of the species.
• Providing information on the nucleotide polymorphism, amino acid composition and codon usage of C. crepidioides accessions collected across South-Western Nigeria.
• Analysing the genetic diversity, molecular phylogeny, evolution and sub-speciation of C. crepidioides.
• Providing information on codon usage bias, which is vital for ecological adaptation analysis of the species, and on the amino acid profile, which is essential for assessing nutritional value and delimiting sub-species.

Objectives of the Study
The objectives were to document the distribution, intra-specific genetic diversity and phylogenetic relationships of C. crepidioides collected across the areas of Nigeria where the species is endemic, using rbcL gene sequences.

Data Description
The present work used Sanger sequencing to generate sequences of the chloroplast gene rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit) from 22 accessions of C. crepidioides, which were deposited in NCBI GenBank. Fig. 1 shows the collection map for the C. crepidioides accessions across five South-western states in Nigeria. Plate 1 shows the C. crepidioides plant growing in its natural habitat with floral display. Table 1 details the field collection information and the NCBI GenBank accession numbers of the deposited sequences. Table 2 presents the sequence length, % GC and nucleotide content of the 22 C. crepidioides sequences. Table 3 records the intra-specific genetic diversity of the C. crepidioides study population. Table 4 presents the genetic diversity information of the species in the study area based on the rbcL sequence analysis. Table 5 presents the codon usage profile based on the C. crepidioides sequences. Table 6 lists the codon usage indices for each of the 22 C. crepidioides sequences. Fig. 2 shows the phylogenetic tree constructed from the C. crepidioides sequence data. Table 7 records the amino acid molecular weight profile for each of the 22 C. crepidioides sequences. Table 8 presents the AMOVA results for variance within and between the five C. crepidioides populations.
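
As an illustration of the kind of summary reported in Table 2 (sequence length, % GC and nucleotide content), the following is a minimal Python sketch. It is not the authors' workflow: the function name `nucleotide_composition` and the toy fragment are hypothetical, and the choice to ignore gaps and ambiguity codes is an assumption made here for simplicity.

```python
from collections import Counter


def nucleotide_composition(seq: str) -> dict:
    """Return length, per-base percentages and %GC for one DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted; gaps and
    ambiguity codes such as N are ignored.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    counts = Counter(bases)
    total = len(bases)
    percent = {b: 100.0 * counts[b] / total for b in "ACGT"}
    gc_percent = 100.0 * (counts["G"] + counts["C"]) / total
    return {"length": total, "percent": percent, "gc_percent": gc_percent}


# Hypothetical toy fragment, not taken from the dataset.
toy_fragment = "ATGGCC"
print(nucleotide_composition(toy_fragment))
```

Applied to each of the 22 deposited sequences, a function of this kind would yield one row per accession in the layout described for Table 2.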

Plant Material
Twenty-two (22) C. crepidioides accessions were collected from five South-Western states in Nigeria: Oyo, Ekiti, Ogun, Osun and Kwara (Table 1). The plant is indigenous to tropical Africa, particularly West and Central Africa, where it is used as a vegetable [2, 3]. The leaf samples were carefully cleaned, silica-gel dried, labelled and assigned accession numbers [4]. Herbarium sam-

Genomic DNA Extraction
Genomic DNA was extracted using the CTAB protocol [5], and its quality and quantity were verified using the Thermo Fisher® NanoDrop spectrophotometer ND-8000-GL.

DNA Sequencing and Gene Amplification
The ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, which served as a reference, was used to design the forward and reverse primers. The forward primer (H1f) is 5′-ATGTCACCACAAACAGAAAC-3′ (Tm = 56 °C) and the reverse primer (Fofana R) is 5′-GTAAAATCAAGTCCACCGCG-3′ (Tm = 56 °C) [6, 7]. The PCR amplicon was sequenced at the Bioscience Laboratory, International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, using the ABI 3130x genetic analyser (Applied Biosystems).
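
The text does not say how the quoted primer melting temperatures were estimated. As a rough cross-check, the sketch below applies the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a common approximation for short oligonucleotides. Using this rule is an assumption made here, and the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer Tm in degrees Celsius with the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is an approximation for short
    oligonucleotides and is assumed here; it is not the source's stated method.
    """
    p = primer.upper()
    if set(p) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G and T")
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


# Primer sequences as given in the text.
H1F = "ATGTCACCACAAACAGAAAC"
FOFANA_R = "GTAAAATCAAGTCCACCGCG"

for name, seq in (("H1f", H1F), ("Fofana R", FOFANA_R)):
    print(name, len(seq), "nt, Wallace Tm =", wallace_tm(seq), "°C")
```

Under this rule the forward primer evaluates to 56 °C, matching the quoted value, while the reverse primer evaluates to 60 °C. Other estimation methods (for example nearest-neighbour models) give different values, so the difference may simply reflect the method used.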

Data Analysis
The Sanger sequences generated were aligned using ClustalW in BioEdit (ver. 7.2.5) with default settings to create a consensus sequence [8]. Sequences were submitted to NCBI GenBank under the accession numbers listed in Table 1. Population diversity indices, namely the number of segregating sites (S), haplotype number (h), haplotype diversity (Hd) and nucleotide diversity (π), and the codon usage frequency table of C. crepidioides were estimated using DnaSP 6.0 [9]. The nucleotide composition, amino acid composition and phylogenetic analyses were conducted in MEGA v11.0.13 [10]. Codon usage indices were calculated using CodonW as implemented on a public Galaxy server (https://galaxy.pasteur.fr/). Arlequin 3.5.2.2 was used to estimate the Analysis of Molecular Variance (AMOVA) within the populations [11].

Fig. 2. Phylogenetic tree constructed from the 22 C. crepidioides rbcL gene sequences. The low bootstrap values might be due to high similarity among the sequences.
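
The population diversity indices named under Data Analysis (S, h, Hd and π) can be illustrated with a short Python sketch. This is not a reimplementation of DnaSP: it assumes a gap-free alignment of equal-length sequences, uses the standard textbook definitions (Hd = n/(n-1) × (1 - Σp²); π as the mean number of pairwise differences per site), and runs on a hypothetical toy alignment rather than the deposited data. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def _check_alignment(seqs):
    """Require at least two sequences of equal length (gap-free assumed)."""
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")


def segregating_sites(seqs):
    """S: number of alignment columns showing more than one base."""
    _check_alignment(seqs)
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def haplotype_count(seqs):
    """h: number of distinct sequences."""
    _check_alignment(seqs)
    return len(set(seqs))


def haplotype_diversity(seqs):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    _check_alignment(seqs)
    n = len(seqs)
    freq_sq = sum((c / n) ** 2 for c in Counter(seqs).values())
    return n / (n - 1) * (1 - freq_sq)


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site."""
    _check_alignment(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


# Hypothetical toy alignment, not taken from the dataset.
toy = ["ATGTCACC", "ATGTCACC", "ATGTCATC", "ATGACATC"]
print("S  =", segregating_sites(toy))
print("h  =", haplotype_count(toy))
print("Hd =", round(haplotype_diversity(toy), 4))
print("pi =", round(nucleotide_diversity(toy), 4))
```

DnaSP applies its own rules for gaps and missing data, so values computed this way on a real alignment may differ slightly from those reported in the tables.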

Table 8
Population differentiation (AMOVA) of the C. crepidioides population studied using the rbcL sequences.

Ethics Statements
The field data presented in Table 1 were obtained via open field collection visits and did not require informed consent. No part of the data was obtained from any Social Media platform.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
RbcL Sequence Diversity in Crassocephalum crepidioides (Benth) S. Moore from Five Selected States in Nigeria (Original data) (GenBank NCBI).