Data on haplotype diversity in the hypervariable region I, II and III of mtDNA amongst the Brahmin population of Haryana

Human mitochondrial DNA (mtDNA) is routinely analysed for pathogenic mutations, evolutionary studies, estimation of divergence times within or between species, phylogenetic studies and identification of degraded remains. Data on various regions of human mtDNA have added enormously to the knowledge pool of population genetics as well as forensic genetics. The displacement loop (D-loop) in the control region of mtDNA is regarded as the most rapidly evolving part of the molecule because of the high level of variation it carries. The control region contains three hypervariable regions (HVI, HVII and HVIII), which tend to mutate 5–10 times faster than nuclear DNA. This high mutation rate is exploited in population genetic studies and human identity testing. In the present data, the potentially informative hypervariable regions of mtDNA, i.e. HVI (np 16024–16365), HVII (np 73–340) and HVIII (np 438–576), were analysed to understand the genetic diversity amongst the Brahmin population of Haryana. Blood samples were collected from maternally unrelated individuals from different districts of Haryana. An array of parameters comprising polymorphic sites, transitions, transversions, deletions, gene diversity, nucleotide diversity, pairwise differences, Tajima's D test, Fu's Fs test, mismatch observed variance and expected heterozygosity was estimated. Polymorphisms were identified by comparison with the rCRS and assigned to their respective haplogroups.



Value of the data
The data report provides baseline information for future evolutionary and genetic studies based on the control region of mtDNA in the Brahmin population of Haryana.
The data may help in finding new mutations or polymorphisms that could prove useful in personal identification.
The data are of special relevance to investigative agencies in particular and to society in general. In cases of mass disaster, government agencies commonly provide compensation to the family members of the deceased, and the present data can aid the identification process in such cases.
The present data will enhance the DNA database of the Brahmin population, which can be used for calculating match probabilities based on mtDNA.

Table 1 describes the primer pairs used for amplification of the extracted samples. Table 2 describes the PCR reaction mixture used for amplification of the HVI, HVII and HVIII regions. Table 3 summarizes the PCR cycling conditions adopted during the experiment. Table 4 summarizes the molecular diversity seen in the HVI region. Table 5 summarizes the molecular diversity seen in the HVII and HVIII regions. Table 6 summarizes the molecular diversity seen in the combined HVI + HVII + HVIII region. Table 7 summarizes the frequency distribution of mtDNA haplotypes in the Brahmin population. Table 8 summarizes the sequence polymorphisms and their respective haplogroups in the Brahmin population. Table 9 lists the GenBank accession numbers for the Brahmin population (Supplementary Table).

Sample collection and genomic DNA extraction
The present study was completed in three phases. The first phase comprised the collection of blood samples. The second phase involved the molecular biology procedures: DNA extraction, PCR amplification, PCR clean-up and sequencing. The last phase consisted of statistical analysis and interpretation of the data generated.
Blood samples were collected from maternally unrelated individuals belonging to the Brahmin ethnic group from nearly all districts of Haryana, following proper ethical guidelines. Total genomic DNA was extracted from the samples using the phenol-chloroform method [1]. The quality of the extracted DNA was checked on a 0.8% agarose gel and its quantity was measured on a NanoDrop (Thermo Scientific, USA).

PCR amplification
The three hypervariable regions, i.e. HVI (np 16024–16365), HVII (np 73–340) and HVIII (np 438–576), were amplified using both forward and reverse primers. The primer pairs described by Brandstatter et al. [5] were used for amplification and were synthesized at Integrated DNA Technologies (IDT), USA (Table 1). Two PCR reactions were set up for each sample: one amplified the HVI region alone and the other amplified the HVII and HVIII regions together. Positive and negative extraction controls and amplification controls, along with a reagent blank, were included to ensure that no contamination was present at any stage of the experiments. The PCR reaction was carried out in a final volume of 25 µl, with the composition given in Table 2. PCR was performed on a SureCycler 8800 (Agilent Technologies, USA) under the cycling conditions given in Table 3. After amplification, the product was run on a 1.6% agarose gel (Sisco Research Laboratory, India), with a GeneRuler 100 bp ladder (Thermo Scientific, USA) used to read the size of the amplified product. After electrophoresis, the gel was visualized under a Gel Documentation System (Alpha Innotech).
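The region boundaries quoted above can be expressed as a small lookup, which is handy when sorting observed variant positions by region. The following is a minimal sketch, not part of the original workflow; the names `HV_REGIONS`, `region_of` and `region_length` are hypothetical helpers introduced here for illustration, and positions are assumed to follow rCRS numbering.

```python
# Hypervariable region boundaries (rCRS nucleotide positions, inclusive),
# taken from the text. The helper names are illustrative assumptions.
HV_REGIONS = {
    "HVI": (16024, 16365),
    "HVII": (73, 340),
    "HVIII": (438, 576),
}


def region_of(position):
    """Return the name of the hypervariable region containing `position`,
    or None if the position lies outside all three regions."""
    for name, (start, end) in HV_REGIONS.items():
        if start <= position <= end:
            return name
    return None


def region_length(name):
    """Return the number of nucleotide positions spanned by a region."""
    start, end = HV_REGIONS[name]
    return end - start + 1
```

For example, `region_of(16223)` falls in HVI, while a position between HVII and HVIII such as 400 returns `None`.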

Post PCR cleanup
All amplified samples were purified before sequencing with Post PCR Clean-up Kits (Thermo Scientific, USA) to remove PCR inhibitors, primer-dimers and other impurities present in the template.

Sequencing
Sequencing was carried out at Xcelaris Genomic Labs using the ABI BigDye Terminator Cycle Sequencing Kit on an ABI 3700 Genetic Analyzer (Applied Biosystems). All samples were sequenced with the same primers used in PCR amplification of the HVI, HVII and HVIII regions. An additional primer (16410R-GAGGATGGTGGTGGTCAA) was used for hypervariable region I in samples where slippage due to the 'C' stretch was observed.

Statistical analysis
The HVI, HVII and HVIII chromatograms were interpreted according to published guidelines intended to improve data quality [6][7][8][9]. The sequences were aligned and compared with the revised Cambridge Reference Sequence (rCRS) [10] using MEGA 7 [3]. Heteroplasmic sites were coded with IUPAC codes, as recommended in the interpretation guidelines for mtDNA data analysis [8]. Diversity indices and differentiation tests were then computed. Gene diversity was calculated according to Tajima [11]. Population pairwise differences were calculated from genetic distances [12]. Nucleotide diversity, haplotype diversity, mean pairwise difference, number of haplotypes, mismatch distributions, Harpending's raggedness index, Tajima's D test and Fu's Fs statistic were calculated using Arlequin software version 3.5.1.2 [2], as shown in Tables 4 and 5. The random match probability (RMP) was calculated according to Stoneking et al. [13] (Table 6). Haplogroup classification and phylogenetic tree construction were performed using HaploGrep 2 software [4], as shown in Table 8 and Fig. 1.
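To make some of these indices concrete, the sketch below computes three of them on toy data. It is an illustration only, not the Arlequin implementation, and it rests on commonly used definitions that are assumed here rather than taken from the text: the random match probability as the sum of squared haplotype frequencies, gene (haplotype) diversity as n/(n − 1) × (1 − Σp²), and the mean pairwise difference as the average number of differing sites over all pairs of equal-length aligned sequences. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_frequencies(haplotypes):
    """Map each distinct haplotype to its relative frequency in the sample."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {hap: count / n for hap, count in counts.items()}


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies (assumed RMP definition)."""
    return sum(p * p for p in haplotype_frequencies(haplotypes).values())


def gene_diversity(haplotypes):
    """Unbiased diversity estimate: n / (n - 1) * (1 - sum of p_i squared)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are required")
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


def mean_pairwise_differences(sequences):
    """Average number of differing sites over all pairs of aligned,
    equal-length sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


# Toy data, not taken from the study.
toy_haplotypes = ["H1", "H1", "H2", "H3"]
toy_sequences = ["ACGT", "ACGA", "TCGA"]
rmp = random_match_probability(toy_haplotypes)
diversity = gene_diversity(toy_haplotypes)
mpd = mean_pairwise_differences(toy_sequences)
```

With these toy inputs the haplotype frequencies are 0.5, 0.25 and 0.25, so the RMP is 0.375 and the diversity is 4/3 × 0.625. Real analyses would also need to handle alignment gaps and heteroplasmic IUPAC codes, which this sketch ignores.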