Sequence polymorphism and haplogroup data of the hypervariable regions on mtDNA in Semoq Beri population

The Orang Asli are the aboriginal people of Peninsular Malaysia, who are recognized as indigenous to the country and still practise a traditional lifestyle. Molecular interest in the Orang Asli stems from the earliest prehistoric migration, which occurred approximately 200 kya and entered Peninsular Malaysia in stages about 50 kya. Three groups of Orang Asli are present in Peninsular Malaysia: Negrito (also known as Semang), Senoi and Proto Malays. According to available records, no research has been conducted on mtDNA variation in the Semoq Beri population, one of the tribes in the Senoi group. In this report, mtDNA variation was analysed in the Semoq Beri population of Hulu Terengganu as an initial effort to establish its genetic characterisation and to elucidate the history of Orang Asli expansion in Peninsular Malaysia. An array of mtDNA parameters was estimated, and the polymorphisms observed relative to the rCRS were inferred together with their respective haplogroups. The DNA sequences are registered in the NCBI with accession numbers KY853670-KY853753.


The mtDNA sequences are registered in the NCBI with accession numbers KY853670-KY853753 (Table S1).

Related research article
Zahidin [3]

Value of the data
Presently, there are 533 Semoq Beri, who are likely to be a threatened population in Hulu Terengganu owing to cultural assimilation and intermarriage [3][4][5][6].
The data provide baseline information, inferred from the mtDNA control region, for any future genetic and evolutionary studies.
The data will enhance the DNA database of the Semoq Beri population and help elucidate the history of Orang Asli expansion in Peninsular Malaysia.
The data allow other researchers focusing on this population to start genome-wide analysis.

Sample collection and genomic DNA extraction
All sequence data were generated from DNA samples collected with informed written consent, and the study was approved by the Universiti Sultan Zainal Abidin (UniSZA) Human Research Ethics Committee, Malaysia. Blood samples were collected from unrelated Semoq Beri individuals in Kampung Sungai Berua, Hulu Terengganu, Malaysia. Genomic DNA was extracted from the blood samples using the PureLink™ Genomic DNA Mini Kit (Invitrogen, USA) following the protocol provided by the manufacturer.
N: deleted base; ns: total number of sequences; n: total number of unbroken bases in the C series.

PCR amplification, DNA purification and sequencing
The isolated genomic DNA was amplified using a set of partial forward and reverse primers for HVI and HVII, respectively (Table 1) [7]. Negative, amplification and reagent blank controls were included to detect contamination at any stage of the laboratory work. PCR amplification was carried out in a final volume of 25 μl (Table S2) in an Arktik Thermal Cycler (Thermo Scientific, USA); the PCR profile is given in Table S3. The amplified PCR products were purified using the QIAquick Purification Kit (QIAGEN AG, Germany). The DNA products were visualized by 1% agarose gel electrophoresis to confirm the size of the amplified product. Sequencing was carried out at First Base Laboratories Sdn Bhd (Malaysia) using an ABI PRISM® 377 DNA Sequencer with the BigDye® Terminator 3.0 Cycle Sequencing Kit.

Statistical sequence analyses
The fluorescence-based nucleotide calls of the sequenced DNA segments were visualized and read using Sequencher 5.4 (https://genecodes.com). The sequences were matched and aligned with the revised Cambridge Reference Sequence (rCRS) [8,9] using ClustalW2 MUSCLE (Multiple Sequence Comparison by Log-Expectation) (https://www.ebi.ac.uk). The C-stretch of each sequence was checked and counted (Table 2). The nucleotide composition was calculated in MEGA 7 [1] (Table 3). Haplotype data in Arlequin format were generated using DnaSP 5.1 [2] (Table 4). Haplogroup classification was performed using online haplogroup software (https://dna.jameslick.com), whose haplogroup data are compatible with PhyloTree Build 17 [10]. The schematic diagrams were drawn based on [10] and [11] (Figs. 1 and 2). GenBank accession numbers and haplogroup identification for HVI and HVII of the Semoq Beri population are provided in Table S1.
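The analysis steps described above (nucleotide composition, C-stretch counting and listing differences from the reference) can be illustrated with a minimal Python sketch. This is not the software used in the study, which relied on Sequencher, MEGA 7 and DnaSP. The function names, the toy sequences and the notation for differences (for example 101T, 103.1C, 105d) are assumptions chosen for illustration only, and the input is assumed to be a pair of already aligned, equal-length strings with "-" marking gaps.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    counts = Counter(bases)
    total = len(bases)
    if total == 0:
        return {}
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def longest_c_run(seq):
    """Length of the longest unbroken run of C in seq (alignment gaps ignored)."""
    best = current = 0
    for base in seq.upper().replace("-", ""):
        current = current + 1 if base == "C" else 0
        best = max(best, current)
    return best


def list_differences(ref_aln, sample_aln, ref_start):
    """List differences between an aligned reference and an aligned sample.

    ref_start is the reference position of the first reference base.
    Hypothetical notation: substitution "101T", deletion "105d",
    insertion "103.1C" (numbered after the preceding reference base).
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    pos = ref_start - 1  # last reference position consumed so far
    ins = 0              # running index within the current insertion
    for r, s in zip(ref_aln.upper(), sample_aln.upper()):
        if r == "-":
            # Column absent from the reference: inserted base in the sample.
            if s != "-":
                ins += 1
                diffs.append(f"{pos}.{ins}{s}")
            continue
        pos += 1
        ins = 0
        if s == "-":
            diffs.append(f"{pos}d")
        elif s != r and s in "ACGT":
            diffs.append(f"{pos}{s}")
    return diffs


if __name__ == "__main__":
    # Toy aligned sequences, not real rCRS or Semoq Beri data.
    ref = "ACCT-CCG"
    sample = "ATCTCC-G"
    print(nucleotide_composition(sample))
    print(longest_c_run(sample))
    print(list_differences(ref, sample, ref_start=100))
```

In practice the differences listed for each sample would be passed to a haplogroup classifier; the sketch stops at producing the list because the classification itself depends on the PhyloTree build in use.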

Transparency document. Supporting information
Transparency data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2018.10.158.