Sequence polymorphism data of the hypervariable regions of mitochondrial DNA in the Yadav population of Haryana

Genetic variations among humans occur both within and among populations and range from single nucleotide changes to multiple-nucleotide variants. These multiple-nucleotide variants are useful for studying the relationships among individuals or various population groups. The study of human genetic variations can help scientists understand how different population groups are biologically related to one another. Sequence analysis of hypervariable regions of human mitochondrial DNA (mtDNA) has been successfully used for the genetic characterization of different population groups for forensic purposes. It is well established that different ethnic or population groups differ significantly in their mtDNA distributions. In the last decade, very little research has been conducted on mtDNA variations in the Indian population, although such data would be useful for elucidating the history of human population expansion across the world. Moreover, forensic studies on mtDNA variations in the Indian subcontinent are also scarce, particularly in the northern part of India. In this report, variations in the hypervariable regions of mtDNA were analyzed in the Yadav population of Haryana. Different molecular diversity indices were computed. Further, the obtained haplotypes were classified into different haplogroups and the phylogenetic relationship between different haplogroups was inferred.


The data is available with this article.

Value of the data
The present data is highly useful for the identification of individuals involved in mass disasters, missing person cases and criminal cases in the Yadav population of Haryana.
This data will help assess matches in mtDNA sequences in forensic casework in Haryana, and will be useful for population analyses based on specific sequence polymorphisms in the Yadav population of Haryana.
The data report will provide baseline information for genetic studies based on the control region of mtDNA for tracking families related to the Yadav population of Haryana.
This report is important for anthropological and evolutionary research, as well as for phylogenetic studies on the Yadav population of Haryana.
This report could also be used by evolutionary biologists to study genetic variations in order to understand the possible relationships of the Yadav population with other populations.
The data presented here can be used as reference material for future genetic studies on the Yadav population of Haryana.
The mtDNA haplogroups generated in this data report can be used for tracing the migration and ancestry of the Yadav population of Haryana.
The present data will contribute to the DNA database for the Yadav population of Haryana, which can be used for calculating the probability of matches based on mtDNA.

Table 1: Nucleotide position, primer name, sequence, length and melting temperature of the primers.

Blood samples and DNA extraction
Blood samples were collected from 66 maternally unrelated individuals of the Yadav population, drawn from nearly all districts of Haryana. From each individual, 2-5 ml of venous blood was drawn into a 5 ml EDTA vacutainer tube (Greiner Bio-One, USA). Each participant signed a consent form at the site of collection. DNA was extracted using the phenol-chloroform-isoamyl alcohol (PCI) method [1].

PCR amplification
The three hypervariable regions were amplified in two sets of PCRs. The primers F15900 and R00159 were used to amplify the HVI region, and the primers F00015 and R00599 were used to amplify the HVII and HVIII regions [2] (Table 1). The primers were synthesized by Integrated DNA Technologies (IDT, USA). Each PCR was carried out in a final volume of 25 µl (Table 2) on a SureCycler 8800 (Agilent Technologies, USA) (Table 3). Positive and negative controls were included to confirm that no contamination was present at any stage of the experiments. The amplified PCR product was run on a 1.6% agarose gel and visualized in the Gel Documentation System (Alpha Innotech). It was then cleaned with a GeneJET PCR Purification Kit (Thermo Fisher, USA) according to the manufacturer's guidelines to remove impurities before sequencing.
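Because mtDNA is circular, the HVI primer pair spans the origin of the rCRS numbering (position 16569 is followed by position 1). The sketch below shows how the span between two such coordinates can be computed. It assumes that the numeric part of each primer name marks the outer coordinate of the amplicon in rCRS numbering, which is a labeled assumption for illustration; the authoritative primer positions are those given in Table 1. The function name `circular_span` and the `primer_pairs` dictionary are hypothetical.

```python
RCRS_LENGTH = 16569  # length of the rCRS in base pairs


def circular_span(start, end, genome_length=RCRS_LENGTH):
    """Count bases from start to end (inclusive), walking forward on a circular genome."""
    if not (1 <= start <= genome_length and 1 <= end <= genome_length):
        raise ValueError("positions must lie between 1 and genome_length")
    if end >= start:
        # Ordinary case: the interval does not cross the numbering origin.
        return end - start + 1
    # Wrap-around case: start..genome_length, then 1..end.
    return (genome_length - start + 1) + end


# Assumed outer coordinates, read off the primer names (hypothetical interpretation).
primer_pairs = {
    "HVI (F15900/R00159)": (15900, 159),
    "HVII+HVIII (F00015/R00599)": (15, 599),
}

for label, (forward_pos, reverse_pos) in primer_pairs.items():
    print(label, circular_span(forward_pos, reverse_pos), "bp (approximate span)")
```

The values printed are only approximate spans under the stated assumption and should not be read as measured product sizes.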

Sequencing
The cleaned PCR product was sequenced by a commercial DNA sequencing service (Xcelris Labs Limited, Ahmedabad, India). Both strands were sequenced with the ABI BigDye Terminator Cycle Sequencing Kit on the ABI 3700 Genetic Analyzer (Applied Biosystems). All samples were sequenced with the same primers used for PCR amplification of the HVI, HVII and HVIII regions. An additional primer (16410R-GAGGATGGTGGTGGTCAA) was used in cases where slippage occurred because of the 'C' stretch in the HVI region.

Statistical analysis
A total of 66 mtDNA sequences were obtained, covering nucleotide positions 16021−16365 and 69−576. Sequence data files were edited and aligned with the revised Cambridge Reference Sequence (rCRS) [3] by using the Mega 7 software [4]. The results were cross-checked by manual alignment, using data analysis sheets. Sequences were interpreted according to published guidelines [5][6][7]. Heteroplasmic sites were coded with the IUPAC codes specified in the interpretation guidelines [8]. Any C-stretch length heteroplasmy observed in the HVI, HVII and HVIII region sequences was excluded from statistical analysis. Statistical analysis was first performed for each hypervariable segment separately, and then for the combined HVI+HVII+HVIII regions. Gene diversity was calculated according to Tajima [9]. Population pairwise differences were determined based on genetic distances [10]. Haplotype diversity, mean pairwise differences, nucleotide diversity, Harpending's raggedness index, mismatch distributions, Fu's Fs and Tajima's D test statistics were calculated using the Arlequin software, version 3.5.1.2 [11] (Table 4). Random match probability (RMP) was calculated according to Stoneking et al. [12] (Table 5). Haplogroup classification was performed using the HaploGrep 2 software [13] (Table 6). A phylogenetic tree of all the haplogroups was constructed using HaploGrep 2; all the classified samples were combined into a single rooted tree that includes all the polymorphisms relative to the rCRS (Fig. 1). GenBank accession numbers for the mtDNA polymorphisms identified in the Yadav population are provided in Table S1 (Supplementary table).
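To make some of the summary statistics concrete, the following minimal sketch computes random match probability as the sum of squared haplotype frequencies, haplotype diversity with the usual unbiased estimator h = n(1 − Σp²)/(n − 1), and the mean number of pairwise differences. It is not the Arlequin implementation used for this report. The function names and toy sequences are hypothetical, and the formulas are the standard textbook forms, assumed here to correspond to those in the cited references.

```python
from collections import Counter
from itertools import combinations


def haplotype_frequencies(haplotypes):
    """Relative frequency of each distinct haplotype in the sample."""
    n = len(haplotypes)
    return {hap: count / n for hap, count in Counter(haplotypes).items()}


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies."""
    return sum(p * p for p in haplotype_frequencies(haplotypes).values())


def haplotype_diversity(haplotypes):
    """Unbiased estimator: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are required")
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


def mean_pairwise_differences(sequences):
    """Average number of differing sites over all pairs of aligned, equal-length sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# Toy data for illustration only, not taken from the Yadav data set.
toy_sequences = ["ACGT", "ACGT", "ACGA", "TCGA"]
print("RMP:", random_match_probability(toy_sequences))
print("Haplotype diversity:", haplotype_diversity(toy_sequences))
print("Mean pairwise differences:", mean_pairwise_differences(toy_sequences))
```

Dividing the mean number of pairwise differences by the number of aligned sites gives a per-site nucleotide diversity under the same simplifying assumptions (no gaps, no ambiguous bases).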