Dataset on 21 autosomal and two sex determining short tandem repeat loci in the Kedayan population in Borneo, Malaysia

This data article provides population frequencies for 21 autosomal and two sex determining short tandem repeat (STR) loci in unrelated Kedayan individuals. It is related to the research paper entitled “Forensic parameters and ancestral fraction in the Kedayan population inferred using 21 autosomal STR loci” [1], in which the same data were subjected to ancestry and forensic analyses. We collected 200 blood samples from volunteer representatives (128 males and 72 females) of the Kedayan people residing in various parts of Borneo. All 23 STR loci were amplified simultaneously using the Globalfiler™ Express PCR Amplification Kit, and the amplicons were separated on an ABI 3500xl Genetic Analyzer. STR alleles at each locus were called using GeneMapper® ID-X Software v1.4, and several algorithms in Arlequin software version 3.5 were used to assess Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) between pairs of STR loci.

© 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Value of the data
• The datasets for the 23 STR loci deposited here are the first reported for the Kedayan [2][3][4].
• The STR dataset for the Kedayan can be used as a reference population in future genetic studies of other ethnic groups in Sabah and Sarawak [5][6][7][8].
• The forensic parameters computed for the Kedayan can be used as a standard to properly weight DNA profiles for this ethnic group and genetically related population groups [9][10][11].
• For comparison and validation purposes, the raw STR genotype data for the Kedayan can be reanalyzed using other algorithms or software [12][13].

Data description
In this article we provide detailed autosomal STR data for Kedayan individuals in Sabah and Sarawak, Malaysia, as reported in Hakim et al. [1]. The STR data were obtained by genotyping 21 autosomal STR loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, VWA, TPOX, D18S51, D5S818, FGA, D12S391, D1S1656, D2S441, D10S1248, D22S1045 and SE33) and two sex determining STR loci (Amelogenin and DYS391) in two hundred blood samples from unrelated, un-admixed and healthy volunteer representatives (128 males and 72 females) of the Kedayan people. Their STR profiles are shown in Table 1a and Table 1b, while Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) values between pairs of the 21 autosomal STR loci, estimated using Arlequin software version 3.5, are shown in Table 2 and Table 3.
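For readers who wish to reanalyze the genotype tables, per-locus allele frequencies and heterozygosities can be derived along the following lines. This is a minimal Python sketch under stated assumptions: the function names and the toy genotypes are hypothetical and are not taken from Table 1a or Table 1b, and the observed versus expected heterozygosity comparison shown here is only a descriptive summary, not the HWE test implemented in Arlequin.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for one locus.

    `genotypes` is a list of (allele1, allele2) pairs, one per individual.
    Alleles are kept as strings so that microvariants such as "9.3" survive.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())  # two alleles per individual
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """Heterozygosity expected under HWE: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical genotypes at a single locus (illustration only, not real data).
toy = [("9", "9.3"), ("6", "9"), ("9", "9"), ("7", "9.3")]
freqs = allele_frequencies(toy)
h_obs = observed_heterozygosity(toy)
h_exp = expected_heterozygosity(freqs)
```

With real data, the same functions would be applied locus by locus to the 200 genotype pairs.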

Ethics statement
Prior to this research, an ethics permit (cert no: USM/JEPeM/15100366) was obtained from the Human Ethical Committee, Universiti Sains Malaysia.

Sample collection
Two hundred blood samples were collected from unrelated, healthy representatives (128 males and 72 females) of the Kedayan. Sampling locations were Sipitang and Labuan Island of Sabah and Lawas in Sarawak. Participants were interviewed, and their written informed consent was obtained before blood sampling. The inclusion criteria were three generations of ancestry without admixture with other ethnicities, no history of illness, and no relatedness to other participants. Detailed descriptions of the materials, methods, experimental work and data analyses are given in Ref. [1].

Cell lysis and STR genotyping
Blood samples collected on FTA cards (Whatman, UK) were punched and treated with 3 μl of Prep-n-Go™ Lysis Buffer (Thermo Fisher Scientific, Inc., Waltham, MA, USA). The lysates were then mixed with the reaction mixture contained in the Globalfiler™ Express PCR Amplification Kit (Thermo Fisher Scientific, Inc., Waltham, MA, USA), and amplification was carried out on a GeneAmp® PCR System 9700 Thermal Cycler (Life Technologies, Foster City, CA). The thermal cycling parameters were as follows: initial denaturation at 95 °C for 1 min; 27 cycles of 94 °C for 3 s and 60 °C for 30 s; and a final extension at 60 °C for 8 min. For quality assurance, Control DNA 007 supplied with the commercial kit (Thermo Fisher Scientific, Inc., Waltham, MA, USA) was included in each PCR set-up. After PCR, the STR-specific amplicons, the allelic ladder and the 600 LIZ™ v2 internal size standard (Thermo Fisher Scientific, Inc., Waltham, MA, USA) were separated by capillary electrophoresis on an ABI 3500xl Genetic Analyzer (Thermo Fisher Scientific, Inc., Waltham, MA, USA), according to the manufacturer's guidelines. GeneMapper® ID-X software version 1.4 (Thermo Fisher Scientific, Inc., Waltham, MA, USA) was used to determine STR allele calls.
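The thermal cycling parameters above can also be written down as a small data structure, which makes the programme easy to check or transfer. The Python sketch below is illustrative only: the class, the constant names and the step labels (for example "anneal/extend" for the 60 °C step) are assumptions, and the computed duration covers hold times only, excluding instrument ramp times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # descriptive label (labels are assumptions, see lead-in)
    temp_c: float    # hold temperature in degrees Celsius
    seconds: int     # hold time in seconds


INITIAL = Step("initial denaturation", 95, 60)
CYCLE = (
    Step("denaturation", 94, 3),
    Step("anneal/extend", 60, 30),
)
N_CYCLES = 27
FINAL = Step("final extension", 60, 8 * 60)


def total_hold_seconds():
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


# 60 + 27 * (3 + 30) + 480 = 1431 s, i.e. just under 24 min of hold time.
total = total_hold_seconds()
```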

Data analysis
HWE and LD estimations were carried out using Arlequin software version 3.5, and both analyses were initially considered significant at p < 0.05 [14]. The significance threshold for the HWE tests was then adjusted to p < 0.002 using the Bonferroni correction [15], that is, by dividing the standard threshold (0.05) by the total number of tested loci (21). Similarly, the standard significance level for LD between pairs of STR loci (p < 0.05) was adjusted to p < 0.0002 (0.05/231, where 231 is the total number of combinations of STR loci) using the Bonferroni correction.
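The Bonferroni arithmetic described above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the numbers of tests (21 and 231) are taken directly from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


ALPHA = 0.05
hwe_threshold = bonferroni_threshold(ALPHA, 21)    # about 0.00238, reported as 0.002
ld_threshold = bonferroni_threshold(ALPHA, 231)    # about 0.000216, reported as 0.0002
```

For example, a locus with an HWE p-value of 0.01 would be significant at the uncorrected 0.05 level but not after correction, since `is_significant(0.01, ALPHA, 21)` evaluates to `False`.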